对恢复案例:因对工作调整不满,链家一员工删除公司 9 TB数据:被判7年事件有感

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:对恢复案例:因对工作调整不满,链家一员工删除公司 9 TB数据:被判7年事件有感

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

从业生涯中恢复过各种删库的场景(有恶意,有无意),有rm数据库,rm数据文件,drop 数据库,dd asm磁盘头,drop table,truncate table,delete table等等.
两年前的一次给链家恢复EBS数据库案例,数据库服务器被人恶意rm 删除了/u01,/u02,rman备份目录等文件(还有ebs服务器相关程序也被删除,由另外一家公司对其进行处理),导致oracle集群异常,无可用备份恢复,通过对asm磁盘组直接恢复,实现数据0丢失.最近有了法院的终审判决:
北京市海淀区人民法院认为,被告人韩冰违反国家规定,对计算机信息系统中存储的数据和应用程序进行删除,造成计算机信息系统不能正常运行,后果特别严重,其行为已构成破坏计算机信息系统罪,依法应予惩处.海淀法院判决:被告人韩冰犯破坏计算机信息系统罪,判处有期徒刑七年。
由衷的感慨一个40岁的老dba,为了一时的赌气冲动的做出了错误的事情,需要使用自己的7年青春去弥补,实在不值得.
1000


具体的关于判决的描述,参考:因对工作调整不满,链家一员工删除公司 9 TB数据:被判7年
我们做dba的,把握了企业的数据命脉,是很多企业最核心的资产,切莫因为一些工作/生活中的赌气去破坏数据库,引起系统无法正常使用,甚至导致数据丢失,给企业带来损失的同时,也让自己失去自由.
对于我们维护的数据库:切莫冲动!!!切莫冲动!!!切莫冲动!!!

ORA-27303: failure occurred at: skgpwinit6

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-27303: failure occurred at: skgpwinit6

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在客户联系我们,数据库在运行过程中突然报ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-27303: failure occurred at: skgpwinit6错误
20201231173708


通过Startup Instance Failed with ORA-27140 ORA-27300 ORA-27301 ORA-27302 and ORA-27303on skgpwinit6 (Doc ID 1274030.1)分析,确认可能是由于$ORACLE_HOME/bin/oracle文件权限异常导致,通过分析确认是有系统被chmod修改了权限导致数据库异常.把oracle全下修改为chmod 6751 oracle后,数据库启动正常

dblink会话引起library cache lock

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:dblink会话引起library cache lock

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户反馈,系统最近几天晚上都有卡顿
alert日志里面报如下错误
20201213180505


查看对应的trace文件
20201213180555

确定在收集统计信息的时候报ORA-04021错误.
查看这段时间ash报告
20201213180855

大量的library cache lock等待,而且引起等待的是类似语句,主要都是集中在一张表上,和收集统计信息报错的trace表匹配.
正好当天有对该表进行增加分区维护hang住
20201213181100

查询等待事件和阻塞情况

SQL> select distinct sid,a.BLOCKING_SESSION_STATUS,a.BLOCKING_INSTANCE,a.BLOCKING_SESSION
  2  ,event from gv$session a where sid=7399;

       SID BLOCKING_SE BLOCKING_INSTANCE BLOCKING_SESSION
---------- ----------- ----------------- ----------------
EVENT
----------------------------------------------------------------
      7399 VALID                       3             3593
library cache lock

SQL>  select a.INST_ID,a.sid,a.paddr,a.sql_id,a.event,a.MACHINE,a.PROGRAM 
   2 ,status from gv$session a where a.sid=3593;

   INST_ID        SID PADDR            SQL_ID
---------- ---------- ---------------- -------------
EVENT
----------------------------------------------------------------
MACHINE
----------------------------------------------------------------
PROGRAM                                          STATUS
------------------------------------------------ --------
         3       3593 0000001F91E80670 grxhz2vpmsrc6
SQL*Net message from dblink
WORKGROUP\XG
PlatformSyn.exe                                  ACTIVE

由于对应的sql_id 无法找到sql语句,不过根据等待事件基本上确认是调用一个dblink导致该问题,通过查询该回话,发现该回话一致处于active状态,但是一致无任何变化,实在可能处于僵死状态,对其进行kill之后,增加分区正常,收集统计信息正常.

再次遇到ORA-600 kokasgi1故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:再次遇到ORA-600 kokasgi1故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友数据库无法启动,报错ORA-00600: 内部错误代码, 参数: [kokasgi1]错误:

Sun Nov 29 14:23:15 2020
Thread 1 advanced to log sequence 28572 (thread open)
Thread 1 opened at log sequence 28572
  Current log# 3 seq# 28572 mem# 0: /u01/app/oracle/oradata/xifenfei/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Nov 29 14:23:15 2020
SMON: enabling cache recovery
[3505] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:135724 end:135754 diff:30 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
ORA-00600: 内部错误代码, 参数: [kokasgi1], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3505.trc:
ORA-00600: 内部错误代码, 参数: [kokasgi1], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3505.trc:
ORA-00600: 内部错误代码, 参数: [kokasgi1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3505): terminating the instance due to error 600
Instance terminated by USER, pid = 3505
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (3505) as a result of ORA-1092

对于这个错误,以前处理过一些类似故障,参考:
ORA-600 kokasgi1故障恢复
重命名sys用户引起数据库启动报ORA-01092 ORA-00600 kokasgi1错误
通过技术手段实现数据库open,查询user$信息
20201129175609


确认sys和system被分别被修改为sysa和systema,由于这个调整导致数据库无法正常启动.把sysa修改为sys,systema修改为system,数据库正常启动,业务测试正常.
20201129175803

sysaux表空间不足—WRH$_ACTIVE_SESSION_HISTORY

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:sysaux表空间不足—WRH$_ACTIVE_SESSION_HISTORY

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个老生常谈的问题遇到oracle sysaux表空间不足,通过awrinfo 分析,初步判断很可能是ash相关表没有正常清理分区导致
20201128205515


进一步分析确认异常对象
20201128205618

20201128205704

通过分析基本上可以确认是由于WRH$_ACTIVE_SESSION_HISTORY没有创建新分析,数据未被及时清理.根据官方WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged (Doc ID 387914.1)处理建议
alter session set “_swrf_test_action” = 72;
增加新分区,过一段时间之后,自动清理WRH$_ACTIVE_SESSION_HISTORY表历史数据.
20201128210106

对于sysaux表空间不足的情况,还遇到过类似情况:
sysaux中HEATMAP 对象较大
WRI$_ADV_OBJECTS表过大,导致SYSAUX表空间不足

模拟19c数据库root pdb undo异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:模拟19c数据库root pdb undo异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于19c在pdb情况下三种常见故障进行了模拟测试:
模拟19c数据库redo异常恢复
模拟19c数据库pdb undo异常恢复
模拟19c数据库root pdb undo异常恢复
模拟oracle 19c cdb模式下root pdb中undo丢失故障恢复
会话1,pdb中插入大量数据,未提交

SQL> alter session set container=pdb; 

Session altered.

SQL> alter database open;

Database altered.

SQL> create user xff identified by oracle default tablespace users;
grant dba to xff;
conn xff/oracle@127.0.0.1/pdb
create table t_xifenfei as select * from dba_objects;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;

User created.

SQL> 
Grant succeeded.

SQL> Connected.
SQL> 

Table created.

SQL> 
72351 rows created.

SQL> 
144702 rows created.

SQL> 
289404 rows created.

SQL> 
578808 rows created.

SQL> 

1157616 rows created.

SQL> SQL> SQL> 

会话2中root pdb模拟事务

[oracle@localhost ~]$ ss

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 16 16:56:01 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0

SQL> 
SQL> 
SQL> conn system/oracle
Connected.
SQL> create table t_xifenfei tablespace users as select * from dba_objects;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;

Table created.

SQL> 
72380 rows created.

SQL> 
144760 rows created.

SQL> 
289520 rows created.

SQL> 
579040 rows created.

SQL> 

1158080 rows created.

SQL> SQL> 

会话3 abort库并删除root pdb中undo文件

[oracle@localhost ~]$ ss

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 16 16:56:55 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0

SQL> shutdown abort;
ORACLE instance shut down.
SQL> 

[oracle@localhost oradata]$ cd ORA19C
[oracle@localhost ORA19C]$ ls
control01.ctl  control02.ctl  pdb  pdbseed  redo01.log  redo02.log  redo03.log  sysaux01.dbf
system01.dbf  temp01.dbf  undotbs01.dbf  users01.dbf
[oracle@localhost ORA19C]$ rm -rf undotbs01.dbf 

启动数据库报ORA-01157 ORA-01110错误

SQL> alter database datafile 4 offline drop;

Database altered.

SQL> alter database open;

Database altered.

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            MOUNTED
SQL> alter session set container=pdb;

Session altered.

SQL> alter database open;

Database altered.

SQL> conn / as sysdba
Connected.
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO
 
SQL> SQL> select tablespace_name,segment_name,status from dba_rollback_segs;

TABLESPACE_NAME                SEGMENT_NAME                   STATUS
------------------------------ ------------------------------ --------------------------------
SYSTEM                         SYSTEM                         ONLINE
UNDOTBS1                       _SYSSMU1_1261223759$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU2_27624015$             NEEDS RECOVERY
UNDOTBS1                       _SYSSMU3_2421748942$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU4_625702278$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU5_2101348960$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU6_813816332$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU7_2329891355$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU8_399776867$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU9_1692468413$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU10_930580995$           NEEDS RECOVERY

本次测试比较幸运,虽然undo段状态为NEEDS RECOVERY,但是数据库直接open成功.实际生产情况,可能比这个要复杂很多

模拟19c数据库pdb undo异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:模拟19c数据库pdb undo异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于19c在pdb情况下三种常见故障进行了模拟测试:
模拟19c数据库redo异常恢复
模拟19c数据库pdb undo异常恢复
模拟19c数据库root pdb undo异常恢复
测试在有事务的情况下,删除pdb中的undo数据库异常情况测试
会话1在root pdb中删除表记录,不提交

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO
SQL> delete from system.t_xifenfei;

2316160 rows deleted.

会话2在pdb中删除表记录,不提交

SQL> conn xff/oracle@127.0.0.1/pdb
Connected.
SQL> delete from xff.t_xifenfei;

72351 rows deleted.

会话3 直接abort库

SQL> shutdown abort;
ORACLE instance shut down.

删除pdb中undo文件

[oracle@localhost pdb]$ rm -rf undotbs01.dbf 
[oracle@localhost pdb]$ 

启动数据库

SQL> startup
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 11 - see DBWR trace file
ORA-01110: data file 11: '/u01/app/oracle/oradata/ORA19C/pdb/undotbs01.dbf'

offline异常文件,再open库

SQL> alter database datafile 11 offline drop;
alter database datafile 11 offline drop
*
ERROR at line 1:
ORA-01516: nonexistent log file, data file, or temporary file "11" in the current container

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 PDB                            MOUNTED
SQL> alter session set container=pdb;

Session altered.

SQL> alter database datafile 11 offline drop;

Database altered.

SQL> conn / as sysdba
Connected.

SQL> startup 
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
Database opened.

open pdb

SQL> alter session set container=pdb;

Session altered.

SQL> 
SQL> alter database open;

Database altered.

SQL> conn / as sysdba
Connected.
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO

测试中库open比较简单,后续只要对异常undo进行处理即可

SQL> create undo tablespace undotbs2 datafile '/u01/app/oracle/oradata/ORA19C/pdb/undotbs02.dbf' size 128M autoextend on;

Tablespace created.

SQL> alter system set undo_tablespace=undotbs2;

System altered.

SQL> select tablespace_name,segment_name,status from dba_rollback_segs;

TABLESPACE_NAME                SEGMENT_NAME                   STATUS
------------------------------ ------------------------------ --------------------------------
SYSTEM                         SYSTEM                         ONLINE
UNDOTBS1                       _SYSSMU1_3588498444$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU2_2971032042$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU3_3657342154$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU4_811969446$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU5_3018429039$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU6_442110264$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU7_2728255665$           NEEDS RECOVERY
UNDOTBS1                       _SYSSMU8_801938064$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU9_647420285$            NEEDS RECOVERY
UNDOTBS1                       _SYSSMU10_2262159254$          NEEDS RECOVERY

SQL>  drop tablespace undotbs1 including contents and datafiles;

Tablespace dropped.

在测试中,undo有事务的情况下,数据库可以正常open,而且运行了一段时间未crash,在这个方面确实比11g及其以前版本有很大改进.当然由于测试环境本身比较单一,可能实际生产中会此类故障处理比较复杂

模拟19c数据库redo异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:模拟19c数据库redo异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于19c在pdb情况下三种常见故障进行了模拟测试:
模拟19c数据库redo异常恢复
模拟19c数据库pdb undo异常恢复
模拟19c数据库root pdb undo异常恢复
模拟oracle 19c数据库redo丢失的恢复操作,模拟数据库有事务,在没有提交的情况下redo丢失故障

[oracle@localhost oradata]$ ss

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 16 16:11:16 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0

SQL> conn xff/oracle@127.0.0.1/pdb
Connected.
SQL> create table t_xifenfei as select * from dba_objects;

Table created.

SQL> insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei;

72351 rows created.

SQL> 
144702 rows created.

SQL> 
289404 rows created.

SQL> 
578808 rows created.

SQL> 

1157616 rows created.

另外一个会话kill数据库并且删除redo

[root@localhost ~]# ps -ef|grep pmon
oracle    38500      1  0 16:08 ?        00:00:00 ora_pmon_ora19c
root      39030  39009  0 16:11 pts/2    00:00:00 grep --color=auto pmon
[root@localhost ~]# kill -9 38500
[root@localhost ~]# ps -ef|grep pmon
root      39042  39009  0 16:11 pts/2    00:00:00 grep --color=auto pmon
[root@localhost ~]# ls -l /u01/app/oracle/oradata/ORA19C/redo*.log
ls: cannot access /u01/app/oracle/oradata/ORA19C/redo*.log: No such file or directory

启动数据库报错ORA-00313 ORA-00312 ORA-27037

SQL> startup 
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/oradata/ORA19C/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7

因为redo全部丢失只能尝试强制拉库

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 4999609088 bytes
Fixed Size                  9145088 bytes
Variable Size             905969664 bytes
Database Buffers         4076863488 bytes
Redo Buffers                7630848 bytes
Database mounted.
SQL>  recover database until cancel;
ORA-00279: change 2335666 generated at 11/16/2020 16:08:42 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/product/19.2/db_1/dbs/arch1_12_1056620100.dbf
ORA-00280: change 2335666 for thread 1 is in sequence #12


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/ORA19C/system01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 39588
Session ID: 9 Serial number: 32012

数据库报ORA-600 kcbzib_kcrsds_1错误是由于在强制拉库过程中文件异常导致,通过对异常文件进行处理数据库open成功

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> 

SQL> alter session set container=pdb;

Session altered.

SQL> alter database open;

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB                            READ WRITE NO

这个是模拟redo丢失或者损坏故障,在实际的生产故障中可能要比这个复杂很多.

259461356@qq.com加密数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:259461356@qq.com加密数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一朋友医院客户的oracle数据库所在服务器被加密,导致数据库无法正常工作
20201103130938


使用工具对加密文件分析
20201103131324

通过分析该文件可以实现业务数据的绝大部分恢复,因为该库有大量xml字段,常规方法无法处理,选择open数据库,并且导致dmp给客户
20201103132248

如果您遇到此类加密情况,Oracle/SQL SERVER/MySQL数据文件被加密无法恢复,请联系我们解决,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

数据库打开遭遇ORA-08103故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库打开遭遇ORA-08103故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-08103
朋友反馈,客户的一套sap历史数据系统,无备份,由于硬件故障导致,数据库启动报ORA-08103: object no longer exists错误,无法正常启动

SMON: enabling tx recovery
Mon Oct 12 09:51:56 2020
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 32
Starting background process QMNC
QMNC started with pid=36, OS id=14942376
Mon Oct 12 09:51:58 2020
Errors in file /oracle/XFF/saptrace/usertrace/xff_ora_15204508.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-08103: object no longer exists
Mon Oct 12 09:51:58 2020
ORA-604 signalled during: ALTER DATABASE OPEN...
Mon Oct 12 09:51:58 2020
Errors in file /oracle/XFF/saptrace/background/xff_smon_8978452.trc:
ORA-08103: object no longer exists
Mon Oct 12 09:52:50 2020
Errors in file /oracle/XFF/saptrace/background/xff_reco_8781998.trc:
ORA-08103: object no longer exists
Mon Oct 12 09:52:50 2020
RECO: terminating instance due to error 8103
Instance terminated by RECO, pid = 8781998

dbv检查system文件
20201016220751


由于客户这个库遭遇过硬件,不少文件中都有一些空块(zero block)
10046分析错误信息

PARSING IN CURSOR #14 len=210 dep=2 uid=0 oct=3 lid=0 tim=191031051961382 hv=864012087 ad='cabb8a10'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, 
distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #14:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,tim=191031051961380
BINDS #14:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=110919308  bln=22  avl=02  flg=05
  value=96
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1109192d8  bln=24  avl=02  flg=05
  value=1
EXEC #14:c=0,e=128,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,tim=191031051961618
FETCH #14:c=0,e=40,p=1,cr=1,cu=0,mis=0,r=0,dep=2,og=3,tim=191031051961676
ORA-00604: error occurred at recursive SQL level 2
ORA-08103: object no longer exists

比较明显是由于hist_head$异常导致,而且报ORA-08103错误和dbv检查system文件有一些空块错误匹配。这个错误比较简单,有几种处理方法:
1. 直接把空块标记为坏块,然后启动库
2. 在数据库启动过程跳过该sql,启动数据库,然后对这个表进行处理
该库进行恢复之后,正常打开数据库
20201016221301


由于其他文件还有不少空块,在数据库进行导出的时候,还有部分表报ORA-08103和ORA-01555[含lob对象表导出]错误,对其进行特殊导出处理,实现该sap系统业务数据的整体恢复.