系统故障oracle数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:系统故障oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于系统故障,导致操作系统进入,客户通过其他方式进入系统拷贝出来数据文件,redo,ctl等文件,安装版本相同的数据库,修改相关路径,启动数据库,但是启动报错,让我们给予技术支持.数据库open报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [2], [2313731576], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [2], [2313731575], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [2], [2313731573], [2],
[2313735660], [12583040], [], [], [], [], [], []
Process ID: 22446
Session ID: 577 Serial number: 1

alert日志报错

Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 22446): terminating the instance due to error 600

这个错误比较常见,特别是使用了_allow_resetlogs_corruption屏蔽一致性强制拉库的时候.解决该问题比较简单,修改数据库scn,然后open数据库成功,参考部分案例_allow_resetlogs_corruption

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jun 20 08:31:55 2020

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size                  2235208 bytes
Variable Size            2902459576 bytes
Database Buffers         1325400064 bytes
Redo Buffers               45686784 bytes
Database mounted.
SQL> alter database open;

Database altered.

尝试导出数据

[oracle@localhost ~]$ tail -f nohup.out 
. exporting foreign function library names for user XIFENFEI 
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user XIFENFEI 
About to export LIOVBJ2017's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export LIOVBJ2017's tables via Conventional Path ...
. . exporting table                           ABCD
                                                            1 rows exported
. . exporting table               TB_D_RECORD
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                      TB_DRIVER
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table               TB_XFF
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table              TB_XFF_TM_REL
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                    TB_LOCATION

由于file # 1, block # 290344坏块导致数据无法导出,通过dbv检查数据文件

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/system01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Sat Jun 20 08:43:49 2020

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
Page 290344 is influx - most likely media corrupt
Corrupt block relative dba: 0x00446e28 (file 1, block 290344)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680



DBVERIFY - Verification complete

Total Pages Examined         : 298240
Total Pages Processed (Data) : 257035
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 13457
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3598
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 24149
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 3221247831 (2.3221247831)

确认只有一个坏块,尝试通过bbed进行坏块修复

BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 290344
        BLOCK#          290344

BBED> map
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344                                Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdxle, 32 bytes                     @92      

 sb2 kd_off[231]                            @124     

 ub1 freespace[3026]                        @586     

 ub1 rowdata[4508]                          @3612    

 ub4 tailchk                                @8188    


BBED> verify
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344

Block 290344 is corrupt
Corrupt block relative dba: 0x00446e28 (file 0, block 290344)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED


BBED> set mode edit
        MODE            Edit

BBED> p kcbh
struct kcbh, 20 bytes                       @0       
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x00446e28
   ub4 bas_kcbh                             @8        0x89e2b718
   ub2 wrp_kcbh                             @12       0x0002
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe118
   ub2 spare3_kcbh                          @18       0x0000

BBED> p tailchk
ub4 tailchk                                 @8188     0x61980601

BBED> d /v
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344  Offsets: 8188 to 8191  Dba:0x00000000
-------------------------------------------------------
 01069861                            l ...a

 <16 bytes per line>

BBED> m /x 010618b7
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344           Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 010618b7 

 <32 bytes per line>

BBED> sum apply
Check value for File 0, Block 290344:
current = 0xe118, required = 0xe118

BBED> verify 
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED

继续尝试导出数据,遭遇ORA-08103,参考相关文章:
模拟普通ORA-08103并解决
模拟极端ORA-08103并解决
数据库启动ORA-08103故障恢复

EXP-00056: ORACLE error 8103 encountered
ORA-08103: object no longer exists

通过对其进行处理,恢复该记录之外的所有记录,客户创建新库导入数据,数据库恢复基本完成

12.2数据库启动报ORA-007445 lmebucp错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:12.2数据库启动报ORA-007445 lmebucp错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个客户找到我们,说他们是数据库启动之时报的错误和数据库不能open 报ORA-7445 lmebucp错类似,让我们对其进行恢复支持,通过分析确定客户数据库版本为12.2.0.1

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Productio
PL/SQL Release 12.2.0.1.0 - Production	
CORE 12.2.0.1.0 Production	
TNS for Linux: Version 12.2.0.1.0 - Production	
NLSRTL Version 12.2.0.1.0 - Production	

alert日志报错

--最初报错
2020-05-11T03:43:06.787164+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639048):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639048/orcl_m004_3253_i639048.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:14.250993+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639049):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639049/orcl_m004_3253_i639049.trc
2020-05-11T03:43:21.286310+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639050):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639050/orcl_m004_3253_i639050.trc
2020-05-11T03:43:28.059048+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:28.074681+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc:
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
2020-05-11T08:31:22.416087+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_16511.trc:
ORA-12012: 自动执行作业 141 出错
ORA-30732: 表中不包含用户可见的列
ORA-06512: 在 line 1


---关闭数据库之后重启报错
2020-05-11T09:42:43.234769+08:00
ALTER DATABASE OPEN
2020-05-11T09:42:44.353085+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2020-05-11T09:42:44.660388+08:00
TT00: Gap Manager starting (PID:31134)
2020-05-11T09:42:45.180876+08:00
Thread 1 opened at log sequence 78596
  Current log# 2 seq# 78596 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
2020-05-11T09:42:45.181357+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x0][PC:0x10F4C112, lmebucp()+34][flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_31125.trc  (incident=646445):
ORA-07445: exception encountered:core dump[lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_646445/orcl_ora_31125_i646445.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T09:42:46.070049+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2020-05-11T09:42:53.344955+08:00
Instance Critical Process (pid: 41, ospid: 31125) died unexpectedly
PMON (ospid: 31026): terminating the instance due to error 12752
2020-05-11T09:42:53.377209+08:00
System state dump requested by (instance=1, osid=31026 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_31046_20200511094253.trc
2020-05-11T09:42:56.690224+08:00
Instance terminated by PMON, pid = 31026

对启动过程做10046跟踪

PARSING IN CURSOR #139821999713040 len=65 dep=1 uid=0 oct=3 lid=0 
tim=2200910467 hv=1762642493 ad='1bfd79130'sqlid='aps3qh1nhzkjx'
select line#, sql_text from bootstrap$ where obj# not in (:1, :2)
END OF STMT
PARSE #139821999713040:c=378,e=378,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=2200910467
BINDS #139821999713040:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe6c8  bln=22  avl=02  flg=05
  value=59
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe698  bln=24  avl=06  flg=05
  value=4294967295
EXEC #139821999713040:c=219,e=658,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=2200911218
WAIT #139821999713040: nam='db file sequential read' ela= 9 file#=1 block#=520 blocks=1 obj#=59 tim=2200911300
WAIT #139821999713040: nam='db file scattered read' ela= 19 file#=1 block#=521 blocks=3 obj#=59 tim=2200911559
FETCH #139821999713040:c=404,e=404,p=4,cr=5,cu=0,mis=0,r=0,dep=1,og=4,plh=867914364,tim=2200911646
STAT #139821999713040 id=1 cnt=0 pid=0 pos=1 obj=59 op='TABLE ACCESS FULL BOOTSTRAP$ (cr=5 pr=4 pw=0 str=1 time=406 us)'

*** 2020-05-11T12:25:30.977832+08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x10F4C112, lmebucp()+34] [flags: 0x0, count: 1]
2020-05-11T12:25:31.026914+08:00
Incident 718445 created, dump file:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_718445/orcl_ora_14324_i718445.trc
ORA-07445: exception encountered:core dump [lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]

根据该报错,可以大概定位数据库重启之后报ORA-07445 lmebucp()+34错误不能正常启动是由于bootstrap$表异常导致.

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400208

BBED> set block 523
        BLOCK#          523

BBED> map
 File: F:\temp\12.2\1\system01.dbf (0)
 Block: 523                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0

 struct ktbbh, 48 bytes                     @20

 struct kdbh, 14 bytes                      @68

 struct kdbt[1], 4 bytes                    @82

 sb2 kdbr[20]                               @86

 ub1 freespace[1015]                        @126

 ub1 rowdata[7047]                          @1141

 ub4 tailchk                                @8188

BBED> p *kdbr[1]
rowdata[6431]
-------------
ub1 rowdata[6431]                           @7572     0x3c

BBED> x /rnnc
rowdata[6431]                               @7572
-------------
flag@7572: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@7573: 0x01
cols@7574:    0

通过bbed定位rootdba,然后dump相关block,随机找一条记录,确认bootstrap表无后效记录.但是该数据库在重启之前已经报了ORA-600 kkpo_rcinfo_defstg:delseg和ORA-30732错误,很可能还有其他的基表异常.通过先修复bootstrap$记录,然后根据该表中记录分析其他相关表记录,最终确定tab$中记录也异常,通过bbed 批量循环修复方法,对其进行恢复,open数据库,可以验证数据没有问题.至此该问题解决,但是没有找出来故障原因(是人为破坏【直接人工删除】,还是某种工具带入恶意脚本导致,类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案,亦或者是数据库安装介质带有恶意程序,类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案)

在数据库恢复遭遇ORA-07445 kgegpa错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:在数据库恢复遭遇ORA-07445 kgegpa错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户恢复请求,数据库启动报ORA-600 2662错误

Fri Apr 24 19:52:58 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 15491509441794
Resetting resetlogs activation ID 1460987657 (0x5714e709)
Fri Apr 24 19:52:59 2020
Setting recovery target incarnation to 3
Fri Apr 24 19:52:59 2020
Assigning activation ID 1566342598 (0x5d5c7dc6)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 24 19:52:59 2020
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc  (incident=8561):
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_8561\orcl_ora_3860_i8561.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3860): terminating the instance due to error 600
Instance terminated by USER, pid = 3860
ORA-1092 signalled during: alter database open resetlogs...

这个错误比较常见,通过对数据库scn进行调整,顺利规避该错误,继续启动报如下错误

SQL> startup mount pfile='d:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 5884
会话 ID: 66 序列号: 3
Fri Apr 24 20:57:49 2020
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x898ADE43] [PC:0x9287D88, kgegpa()+38]
Dump file d:\app\administrator\diag\rdbms\orcl\orcl\trace\alert_orcl.log
Fri Apr 24 20:57:49 2020
ORACLE V11.2.0.1.0 - 64bit Production vsnsta=0
vsnsql=16 vsnxtr=3
Windows NT Version V6.1  
CPU                 : 16 - type 8664, 16 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:21429M/32767M, Ph+PgF:54255M/65533M 
Fri Apr 24 20:57:49 2020
Errors in file 
ORA-07445: caught exception [ACCESS_VIOLATION] at [kgegpa()+38] [0x0000000009287D88]
Fri Apr 24 20:57:52 2020
PMON (ospid: 2496): terminating the instance due to error 397
Instance terminated by PMON, pid = 2496

这里的主要错误是由于ORA-07445 kgegpa,根据以前恢复经验,该问题很可能和undo有关,对undo进行处理之后启动库

SQL> startup mount pfile='d:/pfile.txt' ;
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

SMON: enabling tx recovery
Database Characterset is ZHS16GBK
SMON: Restarting fast_start parallel rollback
Fri Apr 24 21:01:28 2020
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p000_4360.trc  (incident=13377):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13377\orcl_p000_4360_i13377.trc
Stopping background process MMNL
Doing block recovery for file 3 block 296
Resuming block recovery (PMON) for file 3 block 296
Block recovery from logseq 3, block 25 to scn 15491947056761
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.25.16, scn 3607.20090
Doing block recovery for file 6 block 165592
Resuming block recovery (PMON) for file 6 block 165592
Block recovery from logseq 3, block 33 to scn 15491947056769
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.58.16, scn 3607.20098
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13321):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13321\orcl_smon_4912_i13321.trc
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Stopping background process MMON
Fri Apr 24 21:01:29 2020
Trace dumping is performing id=[cdmp_20200424210129]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13322):
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13322\orcl_smon_4912_i13322.trc
ORACLE Instance orcl (pid = 14) - Error 600 encountered while recovering transaction (12, 30).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc:
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Completed: alter database open upgrade
Fri Apr 24 21:01:30 2020
MMON started with pid=16, OS id=4980 
Fri Apr 24 21:01:31 2020
Sweep [inc][13322]: completed
Corrupt block relative dba: 0x00c395ee (file 3, block 234990)
Fractured block found during buffer read
Data in bad block:
 type: 2 format: 2 rdba: 0x00c395ee
 last change scn: 0x0e16.e5ead38b seq: 0x2b flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xdb720232
 check value in block header: 0xebe2
 computed block checksum: 0xb60b
Reading datafile'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'for corruption at rdba: 0x00c395ee (file 3,block 234990)
Reread (file 3, block 234990) found same corrupt data
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 234990, RDBA = 12817902
         OBJN = 0, OBJD = -1, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m001_4852.trc  (incident=13641):
ORA-01578: ORACLE data block corrupted (file # 3, block # 234990)
ORA-01110: data file 3: 'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13641\orcl_m001_4852_i13641.trc
SQL> create undo tablespace undotbs2 datafile 
2   'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\undo_xff02.dbf' size 128M autoextend on;

表空间已创建。

SQL> drop tablespace undotbs1 including contents and datafiles;

表空间已删除。

SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> create spfile from pfile='d:/pfile.txt';

文件已创建。

SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

数据库启动之后继续报出来的ORA-600 4198和ORA-600 4137以及undo坏块均证明是由于undo异常引起的问题,通过重建新undo,数据库open正常,安排客户进行数据导出导入到新库

Oracle 19c故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19c故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户找到我们,他们的oracle 19c数据库由于异常断电,导致启动异常,经过一系列恢复之后,依旧无法解决问题,请求我们给予支持.通过我们的Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check),获取数据库当前信息如下:
数据库版本为19C并且安装了19.5.0.0.191015 (30125133)补丁
20200310220453
20200310220748


数据库使用pdb
20200310220610

数据库启动成功后,一会就crash掉

2020-03-10T01:44:41.018032+08:00
Pluggable database RACBAK opened read write
2020-03-10T01:44:41.018996+08:00
Pluggable database RAC opened read write
2020-03-10T01:44:51.244050+08:00
Completed: ALTER PLUGGABLE DATABASE ALL OPEN
Starting background process CJQ0
Completed: ALTER DATABASE OPEN
2020-03-10T01:44:51.317085+08:00
CJQ0 started with pid=224, OS id=32581 
2020-03-10T01:44:56.067043+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j001_32588.trc  (incident=1095281) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095281/XFF_j001_32588_i1095281.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.073112+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.079228+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j002_32590.trc  (incident=1095289) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [2633], [2638], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095289/XFF_j002_32590_i1095289.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:44:56.085068+08:00
RAC(4):*****************************************************************
RAC(4):An internal routine has requested a dump of selected redo.
RAC(4):This usually happens following a specific internal error, when
RAC(4):analysis of the redo logs will help Oracle Support with the
RAC(4):diagnosis.
RAC(4):It is recommended that you retain all the redo logs generated (by
RAC(4):all the instances) during the past 12 hours, in case additional
RAC(4):redo dumps are required to help with the diagnosis.
RAC(4):*****************************************************************
2020-03-10T01:44:56.115765+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_j004_32594.trc  (incident=1095305) (PDBNAME=RAC):
ORA-00600: internal error code, arguments: [4193], [63532], [63537], [], [], [], [], [], [], [], [], []
RAC(4):Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1095305/XFF_j004_32594_i1095305.trc
RAC(4):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-03-10T01:46:48.202213+08:00
RAC(4):Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
RAC(4):  Mem# 0: /opt/oracle/oradata/XFF/redo02.log
RAC(4):Block recovery completed at rba 0.0.0, scn 0x0000000d3675e48e
RAC(4):DDE: Problem Key 'ORA 600 [4193]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2020-03-10T01:46:48.384040+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc:
ORA-00600: internal error code, arguments: [4193], [27733], [27754], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_clmn_31741.trc  (incident=1093505) (PDBNAME=CDB$ROOT):
ORA-501 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/diag/rdbms/XFF/XFF/incident/incdir_1093505/XFF_clmn_31741_i1093505.trc
2020-03-10T01:46:49.264624+08:00
USER (ospid: 31741): terminating the instance due to ORA error 501
2020-03-10T01:46:49.280664+08:00
System state dump requested by (instance=1, osid=31741 (CLMN)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc
2020-03-10T01:46:53.156926+08:00
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157103+08:00
Errors in file /opt/oracle/diag/rdbms/XFF/XFF/trace/XFF_diag_31759.trc:
ORA-00501: CLMN process terminated with error
2020-03-10T01:46:53.157211+08:00
Dumping diagnostic data in directory=[cdmp_20200310014649], requested by (instance=1, osid=31741 (CLMN)), 
summary=[abnormal instance termination].

通过报错信息判断,数据库open之后(特别是pdb 4 open之后),开始报ORA-600 4193错误.然后由于CLMN进程异常,最后数据库crash.对于这类故障,因为使用的pdb,而且是由于pdb的undo异常导致数据库启动之后crash,可以通过对于pdb进行特殊处理,从而实现数据库启动之后不再crash.

Oracle 极端恢复支持

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 极端恢复支持

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

所有类似dul工具(包括原厂和三方工具)可以完成的Oracle数据库恢复工作,我们均可通过服务提供恢复支持,而且是恢复数据成功之后收费,恢复不成功不收费

  • 不需要运行Oracle数据库软件,直接读取数据库文件解析数据。
  • 支持ASM,能够直接从ASM磁盘中导出数据,即使相关的磁盘组不能成功mount
  • 支持从ASM中直接抽取出数据文件和其他任意存储在ASM中的文件(包括控制文件、日志文件和归档日志等),即使相关的磁盘组不能成功mount
  • 在ASM磁盘损坏严重的情况下,可以扫描ASM磁盘,提取出没有被覆盖的数据文件,然后再提取和恢复数据。
  • 支持的Oracle数据库版本包括7,8i,9i,10g,11g,12c,18c,19c
  • 支持多种平台的数据库,包括AIX、LINUX、HPUX、SOLARIS、WINDOWS等。能够在一个平台上导出其他平台的数据,比如在Windows 32位系统上,使用AIX系统上的数据文件导出数据。
  • 支持的数据类型包括:NUMBER, CHAR, VARCHAR2, NCHAR,NVARHCAR2, LONG, DATE, RAW, LONG RAW, BLOB, CLOB, TIMESTAMP (9i+) , BINARY FLOAT, BINARY DOUBLE (10g+),XMLTYPE
  • 全面支持LOB字段:
    • 支持CLOB、NCLOB和BLOB
    • CLOB支持Big Endian和Little Endian字节序
    • 支持LOB分区,子分区
    • 支持同一个表中,不同LOB列使用不同CHUNK SIZE的情况
    • CLOB数据可以导出到与其他列相同的文件中,或存储到单独的文件
    • LOB列在没有SYSTEM表空间的情况下仍然能够导出
    • LOB列在相关的lob index损坏的情况下依然能够导出
    • 能够恢复Oracle 11g及以上版本的SecureFile LOB(目前不支持去重和加密的SecureFile LOB)
  • 支持各种表,包括普通的HEAP表,IOT表和聚簇(CLUSTER)表
  • 支持IOT表:
    • 支持普通IOT表的导出
    • 支持压缩IOT表的导出
    • 支持IOT表溢出段
    • 支持IOT表分区(包括子分区)
    • 只能在有SYSTEM表空间时才能导出IOT表
  • 支持压缩表
  • 支持表被truncate后的数据恢复
  • 支持表被drop后的数据恢复
  • 在有SYSTEM表空间的情况下,自动获取数据字典信息
  • 支持在没有SYSTEM表空间和数据字典损坏的情况下恢复数据,在没有数据字典可用时,能够自动判断数据的类型
  • 支持10g及以上的大文件(BigFile)表空间
  • 全面支持64位系统,支持超过4G大小的数据文件。
  • 支持复制操作系统命令不能复制的坏文件
  • 支持同一个库中不同块大小的数据文件。
  • 支持多种字符集之间的转换,能够正确的转换CLOB、NCLOB、NVARCHAR2列类型的数据到指定的字符集。
  • 自动检测数据文件的表空间号和文件号
  • 导出的数据格式包括纯文本、exp dmp和expdp dmp文件三种。以纯文本导出时,能够自动生成建表的SQL语句和SQL*Loader导入所需的control文件
  • 模拟Oracle的dump块功能,能够dump数据文件中的数据块
  • 支持DESC表,以显示表的列定义
  • 支持列出表的分区和子分区
  • 支持对误删除数据的恢复,即使被删除数据的表中有LOB列,即使被删除数据的表中对应row directory中所有记录的offset都已经完全被Oracle清除
  • 支持表创建语句,存储过程,视图,函数,包,索引,约束等非数据恢复

ORA-00600: internal error code, arguments: [2252], [3987]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600: internal error code, arguments: [2252], [3987]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库版本10.2.0.4,启动成功之后立马crash,让我们协助解决
version


Thu Jul  4 13:03:10 2019
Completed: ALTER DATABASE OPEN
Thu Jul  4 13:03:10 2019
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
Errors in file /oracle/app/oracle/admin/orcl/bdump/orcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
RECO: terminating instance due to error 476
Instance terminated by RECO, pid = 22268

根据以往经验记录一次ORA-00600[2252]故障解决,很可能是scn过大引起.通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查scn相关信息
scn


从ORA-600 2252错误信息看,由于scn可能超过该数据库的天花板理论上而导致该问题,而reco进程主要是由于分布式事务引起,通过和客户确认,该库有通过dblink去访问11204版本oracle,而从2019年6月23日之后scn的算法发生了一些改变(SCN Compatibility问题汇总-2019年6月23日),导致数据库可以支持更大的scn,从而当低版本需要进行分布式事务操作之时,可能导致数据库异常.

处理方案:通过临时屏蔽分布式事务,让数据库临时正常工作;长期解决方案需要把数据库版本升级,避免scn引起相关问题

ORA-00470: LGWR process terminated with error

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00470: LGWR process terminated with error

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户win 10.2.0.1数据库由于断电之后无法正常启动,报ORA-00470错误

SQL> startup mount;
ORACLE 例程已经启动。
Total System Global Area  293601280 bytes
Fixed Size                  1248600 bytes
Variable Size              92275368 bytes
Database Buffers          192937984 bytes
Redo Buffers                7139328 bytes
数据库装载完毕。
SQL> recoer database;
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00470: LGWR 进程因错误而终止

查看alert日志发现报ORA-600 kcrfwfl_nab错误,导致后台进程异常

Mon May 13 10:38:52 2019
alter database open
Mon May 13 10:38:52 2019
Thread 1 opened at log sequence 1699
  Current log# 3 seq# 1699 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORACLE\REDO03.LOG
Successful open of redo thread 1
Mon May 13 10:38:52 2019
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 13 10:38:52 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []
Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []
Mon May 13 10:38:53 2019
LGWR: terminating instance due to error 470
Mon May 13 10:38:53 2019
ORA-470 signalled during: alter database open...
Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_pmon_2928.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_reco_3224.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_smon_3064.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_ckpt_3640.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_dbw0_1976.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_mman_2356.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_psp0_3876.trc:
ORA-00470: LGWR process terminated with error
Mon May 13 10:38:54 2019
Instance terminated by LGWR, pid = 2308

尝试启动数据库到upgrade模式,报ORA-600 2758错误

SQL> alter database open upgrade;
alter database open upgrade
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [],[], []

对应alert日志报错

Mon May 13 10:45:50 2019
Completed redo application
Mon May 13 10:45:50 2019
Completed crash recovery at
 Thread 1: logseq 1699, block 4294967295, scn 56807170
 0 data blocks read, 0 data blocks written, 0 redo blocks read
Mon May 13 10:45:50 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []
Mon May 13 10:45:51 2019
Aborting crash recovery due to error 600
Mon May 13 10:45:51 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []
ORA-600 signalled during: alter database open upgrade...

通过上述相关报错分析,以及ORA-600 kcrfwfl_nab和ORA-600 2758报错的相关资料查询,确定是由于redo和ctl损坏导致,通过强制拉库,恢复成功

SQL> recover database;
完成介质恢复。
SQL> alter database open resetlogs;
数据库已更改。

存储双活系统逻辑损坏数据库抢救恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:存储双活系统逻辑损坏数据库抢救恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

计划休假的前一夜晚上节点朋友求救电话,说xx医院核心his系统的Oracle数据库很多表报ORA-8103错误,业务无法正常办理.
ora-8103


通过dbv检查文件发现连续坏块
2


根据以往经验数据库出现类似这样的错误,很可能是底层问题,查看系统日志发现大量磁盘错误
3


该报错时间和应用反馈系统异常时间基本上匹配,初步怀疑是硬件或者os异常导致.因为客户数据库大量表表ORA-8103,而且有文件出现连片被置空,无法准确定位数据库损坏情况(置空值数据库级别的物理损坏,ora-8103是逻辑错误在表不被访问的情况下无法检查出来),考虑分析客户的硬件环境,备份容灾情况,分析选择最佳方案.
通过和客户沟通以及检查数据库的相关情况发现信息如下:
1)存储使用的是xx厂商的双活方案,这种存储级解决方案对于该故障来说没用,因为是lun的逻辑级别损坏,损坏数据同时同步到两套存储上.
2)数据库库容灾使用的是某厂家的cdp同步容灾,客户对cdp库进行分析,发现数据同步异常,基本上该方案也无法使用
3)数据库的备份情况:由于存放数据库备份的存储电池异常和有坏盘导致存储写io效率非常低,客户在3天之前停止掉了文件系统中的rman备份;有tsm的带库备份,结果检查发现竟无一次备份成功.
故障进一步扩大
针对客户情况,确定是节点2有明显异常,准备停掉节点2的数据库和集群,然后看下在节点1上是否有改善,结果发现把节点2的crs停掉之后,节点1的库直接crash,通过分析发现asm disk有一块盘磁盘前几M表直接置空(应该是在关闭crs之前就已经异常,只是因为磁盘头部分数据没有相关操作,因此没有触发相关问题),当一个节点关闭会去写磁盘头信息,asm发现异常直接dismount 节点1的磁盘组了,从而使得节点1的库异常.
4


5


现在的情况:
1)现在的asm 磁盘组异常(其中一个磁盘头前几M损坏),也就是说在原库基础上直接修复的概率基本上没有可能
2)cdp数据异常,不可用
3)在数据库相关服务器中找到一份4天之前的一次全备
恢复思路:
1.客户准备新空间,直接把4天之前的备份还原到本地文件系统中
2.通过底层工具对于有磁盘损坏的asm磁盘组进行分析,尝试恢复归档日志和redo(尽可能做到最大限度恢复数据)
3.通过备份还原4天之前的备份结合我们恢复的归档日志和redo尝试完全恢复数据
4.问题风险,就算归档日志和redo从损坏的asm 磁盘组中恢复出来,但是也有可能损坏,导致后面无法恢复到最新数据(造成数据丢失)
实际操作:
1. 由于客户在昨天晚上故障之后增加了一些undo数据文件,使得无法正常全库restore database(因为ctl中数据文件信息比备份集中多)
2. 后续由于10204 rac还原到单机出现ORA-600 kgeade_is_1错误
6


3. 数据库恢复完成之后,出现sqlplus 操作数据库正常,plsql dev和应用访问数据库报ora-27092的问题
7


最后运气不错,经过一系列努力,数据库open成功,应用也正常访问,最初生产环境中损坏的表现在查询也不再报ORA-8103,dbv检查异常文件也ok
8


9


再次提醒各位朋友:
1)你的数据库备份是否正常,建议定期做故障演练
2)选择合适数据库的容灾方案,建议定期检查或者演练
3)存储双活可以解决硬件故障问题,但是还要有适当的解决方案来规避存储逻辑错误风险.

数据库open过程遭遇ORA-1555对应sql语句补充

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:数据库open过程遭遇ORA-1555对应sql语句补充

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在2015年的在数据库open过程中常遇到ORA-01555汇总文章中写过oracle open过程中可能会遇到ORA-01555错误,对应的sql语句.最近的恢复中又遇到两个新的,对其进行补充
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1

Thu May 09 02:10:27 2019
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: bqbdby3c400p7, SCN: 0x0000.3e785fc7):
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 15929): terminating the instance due to error 704
Instance terminated by USER, pid = 15929
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (15929) as a result of ORA-1092
Thu May 09 02:10:28 2019
ORA-1092 : opitsk aborting process

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

NSA2 started with pid=41, OS id=32571518
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, Query Duration=0 sec, SCN: 0x0005.e4bea784):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from
    obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 18939904): terminating the instance due to error 704
Instance terminated by USER, pid = 18939904
ORA-1092 signalled during: alter database open RESETLOGS...
opiodr aborting process unknown ospid (18939904) as a result of ORA-1092

ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在做18c模拟故障测试中,经过自己一系列折腾,主要遭遇了ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误,这些都是pdb特有的,主要是由于一些bug引起,在非pdb环境中不太可能遇到.其实这也就是说明由于pdb机制的引入,使得后续的数据库异常恢复中会更加复杂.
18c数据库open ORA-00603 ORA-01092 ORA-00600报错

[oracle@ora11g tmp]$ ss
SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 20 21:18:09 2019
Version 18.3.0.0.0
Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
         0
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [],
[], [], []
Process ID: 55775
Session ID: 135 Serial number: 49652

alert日志信息

Database Characterset is AL32UTF8
2019-04-20T18:20:54.256841+08:00
No Resource Manager plan active
2019-04-20T18:20:56.751241+08:00
replication_dependency_tracking turned off (no async multimaster replication found)
2019-04-20T18:20:57.862516+08:00
Starting background process AQPC
2019-04-20T18:20:58.341991+08:00
AQPC started with pid=45, OS id=55830
2019-04-20T18:21:01.476252+08:00
PDB$SEED(2):Autotune of undo retention is turned on.
2019-04-20T18:21:01.738732+08:00
Pdb PDB$SEED hit error 1157 during open read only (2) and will be closed.
2019-04-20T18:21:01.755310+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
PDB$SEED(2):JIT: pid 55775 requesting stop
PDB$SEED(2):Buffer Cache flush deferred for PDB 2
Could not open PDB$SEED error=1157
2019-04-20T18:21:01.887601+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
2019-04-20T18:21:03.385503+08:00
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc  (incident=66865) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:03.682428+08:00
PDB2(4):Autotune of undo retention is turned on.
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66865/orcl18c_p000_55808_i66865.trc
2019-04-20T18:21:12.863880+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:12.879921+08:00
Pdb PDB1 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:12.880506+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
PDB1(3):JIT: pid 55808 requesting stop
2019-04-20T18:21:12.915407+08:00
Dumping diagnostic data in directory=[cdmp_20190420182112], requested by (instance=1, osid=55808 (P000)), summary=[incident=66865].
2019-04-20T18:21:12.989890+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
2019-04-20T18:21:13.004575+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc  (incident=66873) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66873/orcl18c_p001_55810_i66873.trc
2019-04-20T18:21:17.218642+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.222057+08:00
Pdb PDB2 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:17.222236+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:17.260941+08:00
Dumping diagnostic data in directory=[cdmp_20190420182117], requested by (instance=1, osid=55810 (P001)), summary=[incident=66873].
2019-04-20T18:21:17.262023+08:00
PDB2(4):JIT: pid 55810 requesting stop
PDB2(4):Buffer Cache flush deferred for PDB 4
2019-04-20T18:21:17.352939+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.483695+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66857) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66857/orcl18c_ora_55775_i66857.trc
2019-04-20T18:21:22.612339+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:21:26.062635+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66858) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66858/orcl18c_ora_55775_i66858.trc
2019-04-20T18:21:26.506644+08:00
Dumping diagnostic data in directory=[cdmp_20190420182126], requested by (instance=1, osid=55775), summary=[incident=66857].
2019-04-20T18:21:30.119381+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:30.119505+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119629+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119719+08:00
Error 600 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66859) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66859/orcl18c_ora_55775_i66859.trc
2019-04-20T18:21:30.346811+08:00
Dumping diagnostic data in directory=[cdmp_20190420182130], requested by (instance=1, osid=55775), summary=[incident=66858].
2019-04-20T18:21:34.551418+08:00
opiodr aborting process unknown ospid (55775) as a result of ORA-603
2019-04-20T18:21:34.720885+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
2019-04-20T18:21:34.754766+08:00
USER (ospid: 55775): terminating the instance due to ORA error 600
2019-04-20T18:21:35.839992+08:00
Instance terminated by USER, pid = 55775

alert日志提示文件不存在,实际上文件是存在的

[root@ora11g ~]#
[root@ora11g ~]# ls -l /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf
-rw-r-----. 1 oracle oinstall 283123712 4月  13 23:59 /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf

主要错误是ORA-600 kcffo_online_pdb_check: fno_system 查询mos发现主要是由于数据库bug导致,查询mos发现不少bug
KCFFO_ONLINE_PDB_CHECK FNO_SYSTEM]


官方解释,主要是由于kcffo_online_pdb函数执行异常导致

Function kcffo_online_pdb_check  Check if it ok to online the files in a
pluggable database. An error is  signalled if it is not ok. This routine will
grab a file enqueue for the relevant files and save it in the NULL-terminated
array fenqsp. The caller must release these after kcffo_online_pdb() or if an
error occurs.

通过人工修改文件状态,绕过该错误,cdb数据库open成功,但是pdb依旧无法正常open

SQL> alter database open;
Database altered.
SQL>  alter session set container=PDB1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []

alert日志

PDB1(3):alter database open
PDB1(3):Autotune of undo retention is turned on.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc  (incident=69185) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_69185/orcl18c_ora_56178_i69185.trc
2019-04-20T18:31:41.761479+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:31:42.097465+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Pdb PDB1 hit error 600 during open read write (1) and will be closed.
2019-04-20T18:31:42.097847+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
PDB1(3):JIT: pid 56178 requesting stop
2019-04-20T18:31:42.098808+08:00
Dumping diagnostic data in directory=[cdmp_20190420183142], requested by (instance=1, osid=56178), summary=[incident=69185].
2019-04-20T18:31:42.138818+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
PDB1(3):ORA-600 signalled during: alter database open...

主要错误是ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt,通过查询mos,依旧发现mos上有的主要可能的bug
kcvfdb_pdb_set_clean_scn cleanckpt


通过人工修改数据文件的checkpoint scn解决该问题,pdb open成功