又一例asm格式化文件系统恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一例asm格式化文件系统恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一个客户把win rac中的asm disk给格式化为ntfs了(data磁盘组由三个500G的磁盘组成,被格式化掉前面两个还剩下一个),而且格式化之后,还进行了一系列恢复(比如修复磁盘头,又进行分区等一些磁盘操作),导致恢复难度增加,也增加了一些数据覆盖
asm alert日志报错

Thu Aug 23 11:20:14 2018
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Process state recorded in trace file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_ora_2260.trc
Thu Aug 23 11:20:28 2018
Errors in file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_lgwr_3820.trc:
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:26 disk_offset(bytes):27566080 io_size:4096 operation:Write type:synchronous
	 result:I/O error process_id:3820
NOTE: unable to write any mirror side for diskgroup DATA
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process 3268:3820 initiating offline of disk 1.4042301899 (DATA_0001) with mask 0x7e in group 2
WARNING: Disk DATA_0001 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 2, dsk = 1/0xf0f0a1cb, mode = 0x15
kfdp_updateDsk(): 22
Thu Aug 23 11:20:28 2018
kfdp_updateDskBg(): 22
ERROR: too many offline disks in PST (grp 2)
WARNING: Disk DATA_0001 in mode 0x7f offline aborted

数据库alert日志报错

WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:11992
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in
group [2.1859146063] from disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260
in group 2 on disk 1 allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Thu Aug 23 11:20:13 2018
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:841 disk_offset(bytes):882532352 io_size:131072 operation:Write type:asynchronous
	 result:I/O error process_id:3224
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 240 logical extent 0 of file 259 in group 2 on disk 1
allocation unit 841 KCF: read, write or open error, block=0x7853 online=1
        file=4 '+DATA/orcl/datafile/users.259.944422883'
        error=15081 txt: ''
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:3224
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group [2.1859146063] from
disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group 2 on disk 1
allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00204: 读取控制文件时出错 (块 41, # 块 1)
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
DBW1 (ospid: 3224): terminating the instance due to error 204

由于客户进行了一系列恢复恢复操作导致查看磁盘都不全

D:\>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
---这里还有一个磁盘没有正常显示
ORCLDISKDATA10                   \Device\Harddisk4\Partition1           511997M--客户尝试修复的磁盘
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

通过主机层面激活卷,删除分区等一系列操作,然后通过kfed构造磁盘头,让这些磁盘在os层面可以正常显示

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
------需要处理的磁盘------
ORCLDISKDATA0                    \Device\Harddisk3\Partition1           511997M
ORCLDISKDATA1                    \Device\Harddisk4\Partition1           511997M
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
-----------------------
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

由于asm磁盘组内部目录au被彻底损坏,导致无法通过asm直接拷贝出来数据,通过底层扫描,按照au恢复出来相关数据,由于格式化ntfs和后续的误操作导致部分数据au被覆盖.其余数据均恢复,抢救了绝大部分数据.
数据文件恢复参考:asm disk header 彻底损坏恢复
另外有一次win平台类似恢复经历:asm disk格式化为ntfs恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Automatic datafile offline due to write error on

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Automatic datafile offline due to write error on

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储突然掉线导致数据文件无法访问,导致部分数据文件被自动offline

Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0xe93b8 online=1
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x24eb65 online=1
        file=25 'F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA'
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 25: F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x22b0a1 online=1
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 28: F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x138def online=1
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
        file=30 'F:\ORACLE\ORADATA\ORCL\QYSCZH17.ORA'
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
O/S-Error: (OS 2) 系统找不到指定的文件。'
……
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0xade96 online=1
        file=9 'F:\ORACLE\ORADATA\ORCL\QYSCZH2'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 9: F:\ORACLE\ORADATA\ORCL\QYSCZH2
Thu May 17 14:49:28 2018
KCF: read, write or open error, block=0x378c66 online=1
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0x35f6de online=1
……

存储掉线是悲剧的起点,按理说数据库是归档模式,存储恢复之后,继续recover datafile,然后online应该问题不大,但是由于客户没有及时处理这个问题(也许业务实时性要求不高,可能挂几个小时也没人知道),导致第二个悲剧发生,删除归档的定时任务把数据库的归档日志给删除了.导致后面存储挂载上来之后,数据文件也无法正常online成功

Tue May 22 16:28:13 2018
ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'
Media Recovery Start
Serial Media Recovery started
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'  ...
Tue May 22 16:28:42 2018
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测发现结果如下:
recover


遭遇这种情况,常规方法无法恢复,考虑使用bbed或者其他方法强制online文件,由于存储突然掉线,这样恢复的库可能后续还有大量工作需要处理,最常见的可能有表和index不一致,表的segment header信息和extent实际信息不匹配等

truncate IDL_UB1$恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:truncate IDL_UB1$恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

世界之大无奇不有,已经记不清这是第几个客户咨询IDL_UB1$ 被truncate之后导致数据库无法启动的case了.idl_ub1$表是用来存储PL/SQL的代码单元的,包括DIANA等,IDL在这里代表Interface Definition Language. 在数据库的启动过程中通过10046跟踪可以知道,有类似:select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#的查询语句,由于该表被truncate之后,导致数据库启动无法绕过该sql,而hang住无法完全open成功.下午闲着没事通过模拟,可以对该故障实现正常open,并且导出数据
模拟业务数据

create user xff identified by oracle;
grant dba to xff;
conn xff/oracle
create table t_xifenfei as select * from dba_objects;
create index i_xifenfei on t_xifenfei(object_id);
create view v_xifenfei as select * from t_xifenfei;
create or replace procedure proc1(
para1 varchar2,
para2 out varchar2,
para3 in out varchar2
) as
v_name varchar2(20);
begin
 v_name :='xifenfei';
 para3 := v_name;
dbms_output.put_line('para3:'||para3);
end;
/
alter system checkpoint;

创建xff账户,并且创建有代表性的表,索引,存储过程,视图等.

模拟truncate IDL_UB1$表

SQL> conn / as sysdba
Connected.
SQL> truncate table IDL_UB1$;
truncate table IDL_UB1$
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 31325
Session ID: 177 Serial number: 7

重启数据库

SQL> conn / as sysdba
Connected.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL>
SQL>
SQL> startup
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.

数据库在mount之后,一直处于hang住状态,查看alert日志

Sun May 20 17:02:34 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_31325.trc  (incident=21781):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21781/test_ora_31325_i21781.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:37 2018
Sweep [inc][21781]: completed
Sweep [inc2][21781]: completed
Sun May 20 17:02:37 2018
Dumping diagnostic data in directory=[cdmp_20180520170237], requested by (instance=1, osid=31325), summary=[incident=21781].
Sun May 20 17:02:55 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_m000_31373.trc  (incident=21821):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21821/test_m000_31373_i21821.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:56 2018
Dumping diagnostic data in directory=[cdmp_20180520170256], requested by (instance=1, osid=31373 (M000)), summary=[incident=21821].

这类问题比较明显,正常方法无法打开,通过工具分析system文件,发现虽然truncate IDL_UB1$操作报错了,但是IDL_UB1$和对应的index I_IDL_UB11 obj#,dataobj#均已经改变,而且相关对象的segment header也变化为新dataobj#(truncate之后的),也就是说truncate在数据库中的主要更改操作已经完成.现在在缺少记录的情况下,数据库执行如下sql无法获取记录,从而无法open

PARSING IN CURSOR #140342551421712 len=132 dep=2 uid=0 oct=3 lid=0 tim=1526843464635335 hv=4260389146 ad='21af73218' sqlid='cvn54b7yz0s8u'
select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#
END OF STMT
PARSE #140342551421712:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635335
BINDS #140342551421712:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d80  bln=22  avl=03  flg=05
  value=1310
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d50  bln=24  avl=01  flg=05
  value=0
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d20  bln=24  avl=06  flg=05
  value=184549376
EXEC #140342551421712:c=0,e=76,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635449
FETCH #140342551421712:c=0,e=4,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635462
STAT #140342551421712 id=1 cnt=0 pid=0 pos=1 obj=225 op='TABLE ACCESS BY INDEX ROWID IDL_UB1$ (cr=1 pr=0 pw=0 time=4 us cost=3 size=44 card=2)'
STAT #140342551421712 id=2 cnt=0 pid=1 pos=1 obj=236 op='INDEX RANGE SCAN I_IDL_UB11 (cr=1 pr=0 pw=0 time=4 us cost=2 size=0 card=2)'
CLOSE #140342551421712:c=0,e=2,dep=2,type=0,tim=1526843464635496

通过对数据库采用技术欺骗手段,让数据库启动相关sql能够获取到记录(和正常查询的相同),从而实现数据库正常open

SQL> startup mount
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.
SQL> alter database open;
Database altered.

open成功之后,后台报大量的ORA-08103: object no longer exists,通过分析是由于truncate IDL_UB1$没有完全成功,导致出现该错误.解决方法就是对Oracle数据字典进行人工更新,把没有完成的truncate操作在数据库中给予完成.
导出数据
exp导出数据成功
exp-data


但是expdp无法执行

[oracle@bogon oradata]$ expdp '"/ as sysdba"' schemas=xff file=1.dmp
Export: Release 11.2.0.4.0 - Production on Sun May 20 17:10:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
UDE-31623: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3326
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4551
ORA-06512: at line 1

暂时未去研究对这个表进行重建,使用exp导出然后再imp导入是比较理想的办法

恶意删除bootstrap$导致数据库无法正常启动

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:恶意删除bootstrap$导致数据库无法正常启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户10.2.0.5的数据库关闭之后,无法正常启动报ORA-00704 ORA-00702错误.

Fri May 18 22:42:26  2018
ALTER DATABASE OPEN
Fri May 18 22:42:27  2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Fri May 18 22:42:27  2018
Started redo scan
Fri May 18 22:42:27  2018
Completed redo scan
 1 redo blocks read, 0 data blocks need recovery
Fri May 18 22:42:27  2018
Started redo application at
 Thread 1: logseq 2, block 2, scn 8448162573
Fri May 18 22:42:27  2018
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\DATABASE\xifenfei\REDO02.LOG
Fri May 18 22:42:27  2018
Completed redo application
Fri May 18 22:42:27  2018
Completed crash recovery at
 Thread 1: logseq 2, block 3, scn 8448182575
 0 data blocks read, 0 data blocks written, 1 redo blocks read
Fri May 18 22:42:28  2018
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=3188
ARC1 started with pid=24, OS id=3168
ARC2 started with pid=25, OS id=996
ARC3 started with pid=26, OS id=432
ARC4 started with pid=27, OS id=3728
Fri May 18 22:42:28  2018
ARC0: Archival started
ARC1: Archival started
ARC5 started with pid=28, OS id=2876
Fri May 18 22:42:28  2018
ARC2: Archival started
ARC3: Archival started
ARC4: Archival started
ARC5: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Fri May 18 22:42:28  2018
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: D:\DATABASE\xifenfei\REDO03.LOG
Successful open of redo thread 1
Fri May 18 22:42:28  2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 18 22:42:28  2018
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri May 18 22:42:28  2018
ARC2: Becoming the heartbeat ARCH
Fri May 18 22:42:28  2018
SMON: enabling cache recovery
Fri May 18 22:42:28  2018
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_3148.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
Fri May 18 22:42:28  2018
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 3148
ORA-1092 signalled during: ALTER DATABASE OPEN...

根据以前恢复经验ORA-00702: bootstrap verison ” inconsistent with version ’8.0.0.0.0′,很可能是由于bootstrap$表异常了.
通过dbv检查system文件确认没有坏块
dbv-system


通过bbed分析,确认记录被删除
把数据文件拷贝到本地,通过bbed进行分析,确认记录丢失

BBED> map
 File: d:/system01.dbf (0)
 Block: 379                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[24]                               @86
 ub1 freespace[1158]                        @134
 ub1 rowdata[6896]                          @1292
 ub4 tailchk                                @8188
BBED> p *kdbr[0]
rowdata[6875]
-------------
ub1 rowdata[6875]                           @8167     0x3c
BBED> x /rnnc
rowdata[6875]                               @8167
-------------
flag@8167: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@8168: 0x01
cols@8169:    0

故障原因跟踪
有人在数据库中注入了恶意脚本,导致数据库删除了bootstrap$中数据,关闭之后无法正常启动
delete-bootstrap$


处理方法
通过oracle bbed 修复数据字典,正常启动数据库

Oracle 数据文件大小为0kb或者文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 数据文件大小为0kb或者文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友恢复请求,由于rose频繁切换导致文件系统部分数据文件变化为0kb和文件丢失.
故障现象
部分数据文件变化为0kb和文件丢失.
file_lost
file_size_0


这里比较明显,数据库的users03变为了0kb和users04丢失.数据库alert日志报错信息如下:

Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-27047: ??????????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 8 - ??? DBWR ????
ORA-01110: ???? 8: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS04.DBF'
ORA-27041: ??????
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_12040.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-1157 signalled during: alter database open...
Fri May 04 09:35:10 2018
Checker run found 2 new persistent data failures

alert日志的报错也比较明显,users03是文件超过了大小(大小为0kb,读取之后肯定超过大小),users04提示无法打开文件(文件在文件系统层面已经丢失).现在问题比较明显由于文件系统故障导致文件大小为0和丢失

碎片扫描恢复
常规的方法肯定无法恢复,比较好的方法只能是底层碎片扫描重组,结合多种扫描工具,最后发现一个做底层恢复的朋友的工具效果不错,扫描结果如下
file_scan


通过工具分析坏块情况

C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.7_10.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.7_10.ora
………………
页 382565 标记为损坏
Corrupt block relative dba: 0x01c5d665 (file 7, block 382565)
Completely zero block found during dbv:
页 382566 标记为损坏
Corrupt block relative dba: 0x01c5d666 (file 7, block 382566)
Completely zero block found during dbv:
页 382567 标记为损坏
Corrupt block relative dba: 0x01c5d667 (file 7, block 382567)
Completely zero block found during dbv:
DBVERIFY - 验证完成
检查的页总数: 1374720
处理的页总数 (数据): 27582
失败的页总数 (数据): 0
处理的页总数 (索引): 20114
失败的页总数 (索引): 0
处理的页总数 (其他): 1319752
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 7271
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271996 (0.228271996)
C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY - 验证完成
检查的页总数: 1136896
处理的页总数 (数据): 36639
失败的页总数 (数据): 0
处理的页总数 (索引): 57038
失败的页总数 (索引): 0
处理的页总数 (其他): 1043218
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271997 (0.228271997)
C:\Users\Administrator>

scan_resulte


这里通过分析恢复的两个文件总的block数量2511618,其中连续损坏7271个block损坏,由于出现问题之后,数据库被offline这两个文件继续启动运行了几个小时,导致少量block被覆盖,恢复软件直接置空.后续的恢复比较顺利,正常open数据库,然后处理坏块对象(正好不是业务核心表的lob字段,所有部分丢失影响不是非常大).

温馨提醒:
1. 数据文件和备份不要放在同一个阵列上,更不能是同一个分区(卷)上
2. 出现此类问题之后,应当理解停止对该分区的任何写操作,方式丢失或者大小为0KB的文件被覆盖.
如果需要专业ORACLE数据库恢复技术支持,请联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

asm磁盘组操作不当导致数据文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘组操作不当导致数据文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到数据库恢复case,客户是要更换存储,在数据库mount状态把使用omf方式存储数据的asm 磁盘组通过rman copy到新的通过别名方式存储的新的asm 磁盘组的存储中,但是由于操作人员粗心,copy语句中部分目标磁盘组的数据文件别名重复了,最后执行rename file之后,导致部分数据文件彻底丢失.我们通过底层碎片扫描(参考:asm disk header 彻底损坏恢复)对于该用户的数据实现完全恢复.
因为整个过程重现比较麻烦,这里测试从一个data磁盘组中有一个omf方式存储的含有两个数据文件的表空间,通过rman copy 把这个表空间的两个文件拷贝到datanew磁盘组中,但是由于粗心把两个数据文件的别名写成一样,结果导致该表空间的一个数据文件彻底丢失的测试.

创建测试表空间
在datanew磁盘组中创建omf方式管理的xifenfei表空间,含有两个数据文件,file#分别为14和15

SQL> create tablespace xifenfei datafile '+DATA' SIZE 128m;
Tablespace created.
SQL> ALTER TABLESPACE XIFENFEI ADD DATAFILE '+DATA' SIZE 128m AUTOEXTEND ON;
Tablespace altered.
SQL> SELECT FILE_NAME,FILE_ID FROM  DBA_DATA_FILES WHERE TABLESPACE_NAME='XIFENFEI';
FILE_NAME
--------------------------------------------------------------------------------
   FILE_ID
----------
+DATA/XFF/DATAFILE/xifenfei.276.961143809
        14
+DATA/XFF/DATAFILE/xifenfei.277.961143825
        15

rman copy datafile 14
通过rman copy把datafile 14拷贝到data磁盘组中,目标端为别名方式存储

RMAN> copy datafile 14 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00014 name=+DATA/XFF/DATAFILE/xifenfei.276.961143809
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082643 RECID=4 STAMP=961144006
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 27-NOV-17
[grid@localhost ~]$ asmcmd
ASMCMD> cd datanew
ASMCMD> ls
XFF/
xifenfei.dbf
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD>

这里通过asmcmd的ls命令,可以看到虽然我们存储的为datanew磁盘组的别名文件,实际上是link到asm的omf方式的文件(本质上asm中的文件都是omf方式存储,只是在使用的时候体现asm的客户端程序方式不一样,是直接asm中的omf方式,还是asm中的别名).

rman copy datafile 15
通过rman copy把datafile 15 拷贝到和datafile 14别名一样的文件了

RMAN> copy datafile 15 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00015 name=+DATA/XFF/DATAFILE/xifenfei.277.961143825
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082731 RECID=5 STAMP=961144053
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 27-NOV-17
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD> cd xff
ASMCMD> ls
DATAFILE/
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI.256.961144003
ASMCMD>

这里可以看出来,在data磁盘组中,file 14被file 15覆盖掉了

rename file
把data磁盘组中的数据文件rename 到datanew磁盘组中

SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf';
Database altered.
SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf';
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01523: cannot rename data file to '+data/xifenfei.dbf' - file already part of database

这里我们可以看到,file 14 rename 成功,但是file 15 rename失败,因为在数据库中,已经有了别名的文件(数据文件的路径)

omf自动删除文件
查看原磁盘组datanew中,发现datafile 14被自动删除

ASMCMD> pwd
+DATA/XFF/DATAFILE
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSAUX.257.942061433
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSTEM.256.942061393
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    UNDOTBS1.258.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    USERS.259.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    XIFENFEI.277.961143825
ASMCMD>

alert日志证实数据文件被删除

2017-11-27T09:05:03.054741-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:03.114947-05:00
NOTE: Under CF enqueue, no dependency request for disk group DATANEW
Deleted Oracle managed file +DATA/XFF/DATAFILE/xifenfei.276.961143809
Completed: alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:21.471474-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+data/xifenfei.dbf'
ORA-1511 signalled during:alter database rename file
      '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to'+datanew/xifenfei.dbf'

这里可以证实,数据文件的omf方式管理,在数据文件执行rename file的时候,会自动删除掉老的数据文件.这里悲剧已经发生,由于rman copy 覆盖了datanew磁盘组中的datafile 14,rename file又导致data磁盘组中的datafile 14被自动删除,从而使得datafile 14这个数据文件在两个磁盘组中都丢失.从常规角度来说,如果没有合适的备份该文件无法恢复.如果遭遇到oracle asm中数据文件丢失或者部分覆盖,请保护现场,联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com最大限度抢救您的数据

mount: wrong fs type, bad option, bad superblock恢复Oracle

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:mount: wrong fs type, bad option, bad superblock恢复Oracle

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我们,说对lv进行收缩操作之后,导致文件系统无法mount,提示超级块损坏.
尝试mount失败

[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

系统日志报错

Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): ext4_check_descriptors: Block bitmap for group 1 not in group (block 0)!
Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): group descriptors corrupted!

主要操作
cz


这里可以看出来,明显操作错误,在没有收缩文件系统之前,直接收缩的lv.
list_history


这里可以看出来,在发生故障之后,又做了很多操作,包括fsck和testdisk等,最终无法恢复,请求我们协助处理。

找出来备份超级块

[root@GZGSDB dul]# dumpe2fs /dev/vg_gzgsdb/lv_home |grep superblock
dumpe2fs 1.41.12 (17-May-2010)
ext2fs_read_bb_inode: A block group is missing an inode table
  Primary superblock at 0, Group descriptors at 1-52
  Backup superblock at 32768, Group descriptors at 32769-32820
  Backup superblock at 98304, Group descriptors at 98305-98356
  Backup superblock at 163840, Group descriptors at 163841-163892
  Backup superblock at 229376, Group descriptors at 229377-229428
  Backup superblock at 294912, Group descriptors at 294913-294964
  Backup superblock at 819200, Group descriptors at 819201-819252
  Backup superblock at 884736, Group descriptors at 884737-884788
  Backup superblock at 1605632, Group descriptors at 1605633-1605684
  Backup superblock at 2654208, Group descriptors at 2654209-2654260
  Backup superblock at 4096000, Group descriptors at 4096001-4096052
  Backup superblock at 7962624, Group descriptors at 7962625-7962676
  Backup superblock at 11239424, Group descriptors at 11239425-11239476
  Backup superblock at 20480000, Group descriptors at 20480001-20480052
  Backup superblock at 23887872, Group descriptors at 23887873-23887924
  Backup superblock at 71663616, Group descriptors at 71663617-71663668
  Backup superblock at 78675968, Group descriptors at 78675969-78676020
  Backup superblock at 102400000, Group descriptors at 102400001-102400052
  Backup superblock at 214990848, Group descriptors at 214990849-214990900

使用fsck修复

[root@GZGSDB data]# fsck -y /dev/vg_gzgsdb/lv_home
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: The ext2 superblock is corrupt when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/vg_gzgsdb-lv_home
Filesystem mounted or opened exclusively by another program?
--指定超级块恢复
[root@GZGSDB data]# fsck -y -b 102400000 /dev/vg_gzgsdb/lv_home
…………
Illegal block #0 (1296647770) in inode 354315.  CLEARED.
Illegal block #1 (1398362886) in inode 354315.  CLEARED.
Illegal block #3 (453538936) in inode 354315.  CLEARED.
Illegal block #5 (808333361) in inode 354315.  CLEARED.
Illegal block #6 (775434798) in inode 354315.  CLEARED.
Illegal block #8 (1180306180) in inode 354315.  CLEARED.
Illegal block #9 (1413893971) in inode 354315.  CLEARED.
Illegal block #10 (1229347423) in inode 354315.  CLEARED.
Illegal block #11 (1498613325) in inode 354315.  CLEARED.
Illegal indirect block (1296389203) in inode 354315.  CLEARED.
Inode 354315 is too big.  Truncate? yes
Block #1074301965 (69632) causes directory to be too big.  CLEARED.
Warning... fsck.ext4 for device /dev/mapper/vg_gzgsdb-lv_home exited with signal 11.
[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

至此基本上可以判断,直接修复该文件系统,让其正常mount的概率很小.直接通过lun层面来恢复.

通过工具解析lun
disk_size


比较明显,客户经过一系列操作现在的现象是lv 1.03T,文件系统只有821G,明显和客户给我反馈的操作不符(应该是lv 821G),证明客户做了大量操作,已经导致文件系统损坏
inode_recovery


由于文件系统严重异常,工具获取到的文件都是没有名称,直接从inode里面读取的数据.获取到这些数据之后,然后结合oracle的特性,判断出来对应的文件号关系(这里有大量文件重复);另外有个别文件通过inode恢复丢失,通过底层碎片重组进行恢复出来文件,然后恢复出来其中数据(asm disk header 彻底损坏恢复).这个客户比较幸运,system文件在另外一个分区中,不然工作量会大很多.
再次提醒:对lv操作一定要谨慎,特别是lvreduce操作,另外出现发生误操作之后,应该第一时间保护现场,而不是百度着去乱操作,可能导致故障更加悲剧

警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

继续上篇的tab$被清空(ORA-600 16703故障解析—tab$表被清空),导致数据库启动异常的case
ORA-600 16703报错
ora-600-16703


数据库日志分析
数据库open成功同时报ORA-7445 kokeicbegin和ORA-600 kzrini:!uprofile错误
ora-600-kzrini-uprofile


再次启动数据库直接报ORA-600 16703错误
ora-600-16703


这里有一个其他现象,就是数据库在open成功的同时(同一秒内),开始报异常.重启之后直接报
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
根据10046分析结果

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_176395/orcl_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

以及以往恢复经验和mos,基本上可以确定是由于tab$和obj$记录不匹配导致该问题.而且是obj#=20的记录在tab$和obj$中不匹配.

分析tab$和obj$记录

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
 
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 
 Strictly Oracle Internal Use Only
 
 
Found db_id = 1334610369
Found db_name = orcl
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

这里可以明确表示tab$无记录,obj$有记录,从而启动的过程出现ORA-600 16703错误可以很好解释.由于数据库启动成功和报错几乎同时进行,以及上面看到的tab$记录不存在的情况,初步怀疑是有startup触发器清空tab$表记录
工具分析触发器表trigger$
startup-trigger


这里果然看到一个after startup on database的触发器,名字为DBMS_SUPPORT_DBMONITOR,而它调用的是DBMS_SUPPORT_DBMONITORP存储.

工具分析pl/sql表source$
DBMS_SUPPORT_DBMONITOR


对wraped加密的程序进行解密
DBMS_SUPPORT_DBMONITOR-unwraped


这里可以明确的看到DBMS_SUPPORT_DBMONITORP存储过程备份tab$表到orachk中然后delete tab$表,现在已经明确是由于DBMS_SUPPORT_DBMONITOR触发器在数据库重启之后开始执行调用DBMS_SUPPORT_DBMONITORP程序,如果判断数据库创建时间大于等于300天,便干掉tab$表,实现数据库破坏.

查找DBMS_SUPPORT_DBMONITOR等程序源头
分析相关程序创建时间,通过obj$表的ctime和name来判断
DBMS_SUPPORT_DBMONITOR-ctime
bootstrap-ctime


这里可以看出来DBMS_SUPPORT_DBMONITOR和DBMS_SUPPORT_DBMONITORP的创建时间基本上和数据库核心对象的创建时间相差无几,我们可以大概排除掉,是由于pl sql等工具连接数据库导致该发问题(类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案),很可能是在dbca创建库的过程中就已经带有了DBMS_SUPPORT_DBMONITOR等程序,如果这样那很可能是由于数据库的安装介质被破坏导致该问题.

分析DBMS_SUPPORT_DBMONITOR来源
prvtsupp
20170711001626


这里已经很清晰,由于prvtsupp.plb被人注入了恶意脚本从而使得数据库被创建了DBMS_SUPPORT_DBMONITOR的触发器和DBMS_SUPPORT_DBMONITORP的存储过程,从而实现了对数据库的而易破坏.

确定破坏文件prvtsupp.plb来源于介质
jar


这里比较明显,文件就是来源database\stage\Components\oracle.rdbms.dbscripts\11.2.0.4.0\1\DataFiles\filegroup2.jar\rdbms\admin\prvtsupp.plb文件被修改导致
md5


通过md5码对比,可以确定是有人对软件的安装介质进行了破坏,从而实现了恶意代码的注入,从而实现了数据库300天之后重启之后无法正常启动而是出现类似ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []的错误.

温馨提示
各位一定要从官方途径下载oracle安装介质,如果是从其他互联网途径下载一定要验证md5,确保文件没有被人恶意篡改,造成无可挽回的损坏.如果真的不幸遇到这类问题,请保护现场联系我们
Tel:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
我们可以通过使用bbed对tab$表数据数据进行恢复实现数据库正常启动,实现数据0丢失,最大限度抢救您的数据和减少业务恢复时间

ORA-600 16703故障解析—tab$表被清空

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 16703故障解析—tab$表被清空

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近连续遇到两次数据库启动报ORA-600 16703错误,而导致数据库无法正常启动的问题
ORA-600 16703报错
ora-600-16703


Completed: ALTER DATABASE RECOVER  database
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 993752, block 2, scn 14872376881763
Recovery of Online Redo Log: Thread 1 Group 6 Seq 993752 Reading mem 0
  Mem# 0: /oracle/oradata/xifenfei/redo06.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 993752, block 3, scn 14872376901765
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Tue Jul 04 23:13:29 2017
Thread 1 advanced to log sequence 993753 (thread open)
Thread 1 opened at log sequence 993753
  Current log# 7 seq# 993753 mem# 0: /oracle/oradata/xifenfei/redo07.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jul 04 23:13:29 2017
SMON: enabling cache recovery
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc  (incident=163595):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_163595/xifenfei_ora_15886_i163595.trc
Tue Jul 04 23:13:30 2017
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 15886): terminating the instance due to error 704
Instance terminated by USER, pid = 15886
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (15886) as a result of ORA-1092

这里报错比较明显ORA-600 16703,而且是在启动时bootstrap$中的对象出现该问题.
10046分析启动过程

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_176395/xifenfei_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

报错信息明显,由于select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1无法正常执行.从而出现了ORA-600 16703的错误,更加直接一点的解释就是obj#=20的对象在tab$中找不到记录,从而出现此类报错.和官方解释ORA-600 TAB$和obj$不匹配一致.
分析system文件
通过dul等工具分析system文件发现tab$表记录为空

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Found db_id = 1334610369
Found db_name = xifenfei
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

发现在obj$中有创建表ORACHKBEC66CBE055000000000001(ORACHK+16进制24位)的一个表名字
orachk


该表用途通过分析数据库日志发现
create table ORACHKBEC66CBE055000000000001 tablespace system as select * from sys.tab$;
也就是说,这个orachk的表是用来备份tab$的,然后进一步发现有delete from tab$.至此基本上分析清楚,tab$表备份到ORACHK表中,然后delete tab$表数据.实现数据库破坏以及难以恢复的效果.有点类似plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案的破坏案例
我们可以通过使用bbed对tab$表数据数据进行恢复实现数据库正常启动,实现数据0丢失,最大限度抢救您的数据和减少业务恢复时间
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
源头已经分析清楚,是有人篡改了oracle安装介质,导致数据库运行300天之后出现该问题,具体见:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

file$ 删除记录恢复(delete file$ recovery)

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:file$ 删除记录恢复(delete file$ recovery)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遭遇几次有人因为对oracle不太理解,由于各种情况下,删除了file$中的部分记录,从而使得该文件之后的文件都丢失,使得数据库出现各种异常情况。这里演示了可以重启数据库的情况下两种常见的删除file$中记录的恢复

创建表空间和表

SQL> startup
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> create tablespace tbs_delete_file datafile '/home/oracle/oradata/xifenfei/file01.dbf' size 128M;
Tablespace created.
SQL> alter tablespace tbs_delete_file add datafile '/home/oracle/oradata/xifenfei/file02.dbf' size 128M;
Tablespace altered.
SQL> alter tablespace tbs_delete_file add datafile '/home/oracle/oradata/xifenfei/file03.dbf' size 128M;
Tablespace altered.
SQL> create table t_xifenfei tablespace tbs_delete_file
  2  as  select * from dba_objects;
Table created.
SQL> insert into t_xifenfei select * from dba_objects;
71895 rows created.
SQL> /
71895 rows created.
SQL> /
71895 rows created.
SQL> /
71895 rows created.
SQL> commit;
Commit complete.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

删除file$中记录

SQL> select ts#,file# from file$;
       TS#      FILE#
---------- ----------
         0          1
         1          2
         2          3
         4          4
         6          5
         6          6
         6          7
7 rows selected.
[oracle@localhost ~]$ ss
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jun 2 23:30:57 2017
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> delete from file$ where file#=5;
1 row deleted.
SQL> commit;
Commit complete.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

不重建控制文件,重启数据库

[oracle@localhost tmp]$ ss
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jun 2 23:46:33 2017
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         6
         7
6 rows selected.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

数据库启动正常,而且文件也未从控制文件中删除,而且记录查询正常,考虑通过逻辑方式迁移数据。

测试重建控制文件

SQL> startup nomount
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TEST" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
  4    5    6    7  LOGFILE
  8    GROUP 1 '/home/oracle/oradata/xifenfei/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/oradata/xifenfei/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/home/oracle/oradata/xifenfei/redo03.log'  SIZE 50M BLOCKSIZE 512
  9   10   11  DATAFILE
 12    '/home/oracle/oradata/xifenfei/system01.dbf',
  '/home/oracle/oradata/xifenfei/sysaux01.dbf',
  '/home/oracle/oradata/xifenfei/undotbs01.dbf',
 13   14   15    '/home/oracle/oradata/xifenfei/users01.dbf',
  '/home/oracle/oradata/xifenfei/file01.dbf',
 16   17    '/home/oracle/oradata/xifenfei/file02.dbf',
  '/home/oracle/oradata/xifenfei/file03.dbf'
CHARACTER SET AL32UTF8
 18   19   20  ;
Control file created.
SQL> alter database open;
Database altered.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         6
         7
6 rows selected.
SQL> select count(*) from t_xifenfei;
select count(*) from t_xifenfei
                     *
ERROR at line 1:
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [],[], [], [], [], []

删除file$中记录,然后重启库之后,大于删除的file#之后的数据文件全部丢失.

alert日志报错

Fri Jun 02 23:49:42 2017
alter database open
Fri Jun 02 23:49:42 2017
Thread 1 advanced to log sequence 9 (thread open)
Thread 1 opened at log sequence 9
  Current log# 3 seq# 9 mem# 0: /home/oracle/oradata/xifenfei/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 02 23:49:42 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #5 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 5: '/home/oracle/oradata/xifenfei/file01.dbf'
File #6 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 6: '/home/oracle/oradata/xifenfei/file02.dbf'
File #7 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 7: '/home/oracle/oradata/xifenfei/file03.dbf'
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 02 23:49:43 2017
QMNC started with pid=20, OS id=11886
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open
Fri Jun 02 23:49:44 2017
db_recovery_file_dest_size of 3882 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Jun 02 23:49:44 2017
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_m001_11890.trc  (incident=84344):
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/test/test/incident/incdir_84344/test_m001_11890_i84344.trc
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_m001_11890.trc:
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [], [], [], [], [], []
Fri Jun 02 23:49:45 2017
Trace dumping is performing id=[cdmp_20170602234945]
Fri Jun 02 23:49:46 2017
Starting background process CJQ0
Fri Jun 02 23:49:46 2017
CJQ0 started with pid=21, OS id=11902

这里报错明显,由于file$.file#=5被删除,重建控制文件之后,file#在5之后的数据文件全部从控制文件中删除,类似提示File #5 in the controlfile not found in data dictionary. Removing file from controlfile.

插入file$记录恢复

--通过dump文件头和相关block,找出来对应记录
插入被file$删除记录
SQL> insert into file$ values(5,2,16384,6,5,0,0,0,993135,null,20971522,null,null,null);
1 row created.
SQL> commit;
Commit complete.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
--重启数据库,创建控制文件
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TEST" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
  4    5    6      MAXLOGHISTORY 292
  7  LOGFILE
  GROUP 1 '/home/oracle/oradata/xifenfei/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/oradata/xifenfei/redo02.log'  SIZE 50M BLOCKSIZE 512,
  8    9   10    GROUP 3 '/home/oracle/oradata/xifenfei/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
 11   12    '/home/oracle/oradata/xifenfei/system01.dbf',
  '/home/oracle/oradata/xifenfei/sysaux01.dbf',
 13   14    '/home/oracle/oradata/xifenfei/undotbs01.dbf',
 15    '/home/oracle/oradata/xifenfei/users01.dbf',
  '/home/oracle/oradata/xifenfei/file01.dbf',
  '/home/oracle/oradata/xifenfei/file02.dbf',
 16   17   18    '/home/oracle/oradata/xifenfei/file03.dbf'
 19  CHARACTER SET AL32UTF8
; 20
Control file created.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/home/oracle/oradata/xifenfei/file01.dbf'
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

alert日志正常,未提示删除控制文件中数据文件

alter database open
Fri Jun 02 23:56:52 2017
Thread 1 advanced to log sequence 10 (thread open)
Thread 1 opened at log sequence 10
  Current log# 1 seq# 10 mem# 0: /home/oracle/oradata/xifenfei/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 02 23:56:52 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 02 23:56:53 2017
QMNC started with pid=20, OS id=12127
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open

通过插入删除记录,重建控制文件,数据库恢复正常,而且相关记录也可以查询。

结论总结
1. 如果删除file$中的记录,而且控制文件未重建,数据库可以正常启动,而且可以查询数据
2. 如果删除file$中的记录,而且控制文件被重建,在数据库启动过程中,从被删除文件之后的所有文件记录从控制文件中删除(类似:File N in the controlfile not found in data dictionary. Removing file from controlfile.).可以在数据库open之后,插入被删除的记录,重建控制文件,数据会被恢复回来.如果数据库无法启动,需要通过其他方式进行恢复被删除记录