记录一次pdb恢复过程中遇到的大量bug

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次pdb恢复过程中遇到的大量bug

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

12C版本使用pdb的Oracle数据库,由于在创建index的过程中强制终止,导致业务大量阻塞,然后重启数据库几次之后直接crash,最后直接无法open成功,报ORA-00600 6856

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 22, block # 993741)
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10564: tablespace CWBASEMS01
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 76286
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [],[], [], [], [], []
2020-11-14T15:56:50.736722+08:00
Started redo scan
2020-11-14T15:56:50.977023+08:00
Completed redo scan
 read 82825 KB redo, 10769 data blocks need recovery
2020-11-14T15:56:51.147256+08:00
Started redo application at
 Thread 1: logseq 120309, block 48, offset 0
 Thread 2: logseq 74989, block 2, offset 16, scn 0x00000000f69e1f8d
2020-11-14T15:56:51.151007+08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 120309 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_1.262.1023806467
2020-11-14T15:56:51.153989+08:00
Recovery of Online Redo Log: Thread 2 Group 7 Seq 74989 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_7.274.1023806785
Errors in file /u01/app/oracle/……/xifenfei1/trace/xifenfei1_p00d_469777.trc(incident=10079552)(PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [], [], [], [], [], []
2020-11-14T15:56:52.089726+08:00
(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

这个错误比较明显是由于ORA-600 6856错误导致数据库在启动的时候无法进行实例恢复,出现这个错误原因是由于客户创建index的过程中强制终止引起Bug 17437634 – ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded (Doc ID 17437634.8),屏蔽该文件实例恢复,cdb启动成功,但是pdb无法正常open

SQL> alter session set container=pdb1;
 
Session altered.

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check:fno_system], [3], [], []
Process ID: 224476
Session ID: 13311 Serial number: 59525

这个错误比较明显是由于ORA-600 kcffo_online_pdb_check:fno_system,数据库未正常检测到pdb的system文件导致该问题,通过对pdb的system文件进行操作,让数据库识别到该文件,然后继续open库

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt],[3],[4138003978],[4289274940], [2]
Process ID: 224476
Session ID: 13311 Serial number: 59525

该错误是由于数据库在恢复的过程中推了scn,触发了oracle 某种bug导致该问题,通过一些操作之后,数据库可以open,尝试temp表空间增加临时数据文件报ORA-00600 [kcffo_add_tmpf-1] 错误(Bug 29379978 – ORA-00600 [kcffo_add_tmpf-1] when trying to add temp file (Doc ID 29379978.8)).由于该文件无法加入,数据库无法导出
20201115115951


最后没有办法换了思路直接bbed修改文件头,open cdb库,然后open pdb,顺利导出数据.这次的恢复中,深刻的体验到pdb在open过程中的各种bug,实在比较厌烦.

ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在做18c模拟故障测试中,经过自己一系列折腾,主要遭遇了ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误,这些都是pdb特有的,主要是由于一些bug引起,在非pdb环境中不太可能遇到.其实这也就是说明由于pdb机制的引入,使得后续的数据库异常恢复中会更加复杂.
18c数据库open ORA-00603 ORA-01092 ORA-00600报错

[oracle@ora11g tmp]$ ss
SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 20 21:18:09 2019
Version 18.3.0.0.0
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
         0
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [],
[], [], []
Process ID: 55775
Session ID: 135 Serial number: 49652

alert日志信息

Database Characterset is AL32UTF8
2019-04-20T18:20:54.256841+08:00
No Resource Manager plan active
2019-04-20T18:20:56.751241+08:00
replication_dependency_tracking turned off (no async multimaster replication found)
2019-04-20T18:20:57.862516+08:00
Starting background process AQPC
2019-04-20T18:20:58.341991+08:00
AQPC started with pid=45, OS id=55830
2019-04-20T18:21:01.476252+08:00
PDB$SEED(2):Autotune of undo retention is turned on.
2019-04-20T18:21:01.738732+08:00
Pdb PDB$SEED hit error 1157 during open read only (2) and will be closed.
2019-04-20T18:21:01.755310+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
PDB$SEED(2):JIT: pid 55775 requesting stop
PDB$SEED(2):Buffer Cache flush deferred for PDB 2
Could not open PDB$SEED error=1157
2019-04-20T18:21:01.887601+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
2019-04-20T18:21:03.385503+08:00
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc  (incident=66865) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:03.682428+08:00
PDB2(4):Autotune of undo retention is turned on.
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66865/orcl18c_p000_55808_i66865.trc
2019-04-20T18:21:12.863880+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:12.879921+08:00
Pdb PDB1 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:12.880506+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
PDB1(3):JIT: pid 55808 requesting stop
2019-04-20T18:21:12.915407+08:00
Dumping diagnostic data in directory=[cdmp_20190420182112], requested by (instance=1, osid=55808 (P000)), summary=[incident=66865].
2019-04-20T18:21:12.989890+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
2019-04-20T18:21:13.004575+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc  (incident=66873) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66873/orcl18c_p001_55810_i66873.trc
2019-04-20T18:21:17.218642+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.222057+08:00
Pdb PDB2 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:17.222236+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:17.260941+08:00
Dumping diagnostic data in directory=[cdmp_20190420182117], requested by (instance=1, osid=55810 (P001)), summary=[incident=66873].
2019-04-20T18:21:17.262023+08:00
PDB2(4):JIT: pid 55810 requesting stop
PDB2(4):Buffer Cache flush deferred for PDB 4
2019-04-20T18:21:17.352939+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.483695+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66857) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66857/orcl18c_ora_55775_i66857.trc
2019-04-20T18:21:22.612339+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:21:26.062635+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66858) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66858/orcl18c_ora_55775_i66858.trc
2019-04-20T18:21:26.506644+08:00
Dumping diagnostic data in directory=[cdmp_20190420182126], requested by (instance=1, osid=55775), summary=[incident=66857].
2019-04-20T18:21:30.119381+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:30.119505+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119629+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119719+08:00
Error 600 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66859) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66859/orcl18c_ora_55775_i66859.trc
2019-04-20T18:21:30.346811+08:00
Dumping diagnostic data in directory=[cdmp_20190420182130], requested by (instance=1, osid=55775), summary=[incident=66858].
2019-04-20T18:21:34.551418+08:00
opiodr aborting process unknown ospid (55775) as a result of ORA-603
2019-04-20T18:21:34.720885+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
2019-04-20T18:21:34.754766+08:00
USER (ospid: 55775): terminating the instance due to ORA error 600
2019-04-20T18:21:35.839992+08:00
Instance terminated by USER, pid = 55775

alert日志提示文件不存在,实际上文件是存在的

[root@ora11g ~]#
[root@ora11g ~]# ls -l /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf
-rw-r-----. 1 oracle oinstall 283123712 4月  13 23:59 /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf

主要错误是ORA-600 kcffo_online_pdb_check: fno_system 查询mos发现主要是由于数据库bug导致,查询mos发现不少bug
KCFFO_ONLINE_PDB_CHECK FNO_SYSTEM]


官方解释,主要是由于kcffo_online_pdb函数执行异常导致

Function kcffo_online_pdb_check  Check if it ok to online the files in a
pluggable database. An error is  signalled if it is not ok. This routine will
grab a file enqueue for the relevant files and save it in the NULL-terminated
array fenqsp. The caller must release these after kcffo_online_pdb() or if an
error occurs.

通过人工修改文件状态,绕过该错误,cdb数据库open成功,但是pdb依旧无法正常open

SQL> alter database open;
Database altered.
SQL>  alter session set container=PDB1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []

alert日志

PDB1(3):alter database open
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc  (incident=69185) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_69185/orcl18c_ora_56178_i69185.trc
2019-04-20T18:31:41.761479+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:31:42.097465+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Pdb PDB1 hit error 600 during open read write (1) and will be closed.
2019-04-20T18:31:42.097847+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
PDB1(3):JIT: pid 56178 requesting stop
2019-04-20T18:31:42.098808+08:00
Dumping diagnostic data in directory=[cdmp_20190420183142], requested by (instance=1, osid=56178), summary=[incident=69185].
2019-04-20T18:31:42.138818+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
PDB1(3):ORA-600 signalled during: alter database open...

主要错误是ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt,通过查询mos,依旧发现mos上有的主要可能的bug
kcvfdb_pdb_set_clean_scn cleanckpt


通过人工修改数据文件的checkpoint scn解决该问题,pdb open成功