oracle 8.1.6因断电无法启动恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:oracle 8.1.6因断电无法启动恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个oralce 8.1.6的数据库恢复请求,由于断电之后,系统无法正常恢复,通过尝试recover datafile报错如下

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8i Enterprise Edition Release 8.1.6.0.0 - Production
PL/SQL Release 8.1.6.0.0 - Production
CORE	8.1.6.0.0	Production
TNS for 32-bit Windows: Version 8.1.6.0.0 - Production
NLSRTL Version 3.4.1.0.0 - Production
Tue Oct 16 14:17:01 2018
Media Recovery Datafile: 1
Media Recovery Start
Media Recovery Log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 146597 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Tue Oct 16 14:17:02 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA03040.TRC:
ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [30547], [6101], [], [], [], []
Tue Oct 16 14:17:03 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA03040.TRC:
ORA-01578: ORACLE data block corrupted (file # 1, block # 30547)
ORA-01110: data file 1: 'D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [30547], [6101], [], [], [], []
Tue Oct 16 14:17:03 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA03040.TRC:
ORA-00314: log 2 of thread 1, expected sequence# 146598 doesn't match 146415
ORA-00312: online log 2 thread 1: 'D:\ORACLE\ORADATA\ORCL\REDO02.LOG'

这里可以获取到两个信息:1)system文件可能有坏块,导致数据库recover的时候报ORA-600 kcoapl_blkchk错误,2)数据库的redo可能异常了,由于8i数据库默认redo 10m,在业务繁忙时候切换较为频繁,而文件系统的cache导致redo信息比较老,而数据文件需要redo比较新,从而无法正常恢复成功。比较明显对于当前这样的情况只能是屏蔽数据一致性,强制拉库

ARC0: media recovery disabled
Tue Oct 16 14:17:39 2018
SMON: enabling cache recovery
Tue Oct 16 14:17:39 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA02256.TRC:
ORA-00600: internal error code, arguments: [2662], [1], [1712082681], [1], [1712107587], [8388610], [], []
Tue Oct 16 14:17:41 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA02256.TRC:
ORA-00600: internal error code, arguments: [2662], [1], [1712082682], [1], [1712107587], [8388610], [], []
ORA-00600: internal error code, arguments: [2662], [1], [1712082681], [1], [1712107587], [8388610], [], []
Tue Oct 16 14:17:43 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA02256.TRC:
ORA-00600: internal error code, arguments: [2662], [1], [1712082682], [1], [1712107587], [8388610], [], []
ORA-00600: internal error code, arguments: [2662], [1], [1712082681], [1], [1712107587], [8388610], [], []
Tue Oct 16 14:17:45 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA02256.TRC:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [1], [1712082682], [1], [1712107587], [8388610], [], []
ORA-00600: internal error code, arguments: [2662], [1], [1712082681], [1], [1712107587], [8388610], [], []

ORA-600 2662这个错误比较常见,直接推数据库scn,启动库

Tue Oct 16 16:09:11 2018
Errors in file D:\Oracle\admin\orcl\bdump\orclSMON.TRC:
ORA-00600: internal error code, arguments: [4193], [29469], [29477], [], [], [], [], []
Recovery of Online Redo Log: Thread 1 Group 3 Seq 1 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO03.LOG
Tue Oct 16 16:09:12 2018
Errors in file D:\Oracle\admin\orcl\bdump\orclSMON.TRC:
ORA-01595: error freeing extent (3) of rollback segment (2))
ORA-00600: internal error code, arguments: [4193], [29469], [29477], [], [], [], [], []
Tue Oct 16 16:09:13 2018
Completed: ALTER DATABASE OPEN
Tue Oct 16 16:12:06 2018
CREATE ROLLBACK SEGMENT nrbs1 TABLESPACE rbs
Tue Oct 16 16:12:06 2018
Errors in file D:\Oracle\admin\orcl\udump\ORA02252.TRC:
ORA-00600: internal error code, arguments: [4194], [79], [38], [], [], [], [], []

错误比较明显ORA-600 4194,而且已经告知是由于rollback segment 2异常,通过屏蔽回滚段,open数据库,删除老回滚段,创建新回滚段(8i无undo自动管理)

SQL> startup
ORACLE instance started.
Total System Global Area 1549432076 bytes
Fixed Size                    70924 bytes
Variable Size             500707328 bytes
Database Buffers         1048576000 bytes
Redo Buffers                  77824 bytes
Database mounted.
Database opened.
SQL> drop rollback segment "RBS1";
Rollback segment dropped.
SQL> drop rollback segment "RBS2";
Rollback segment dropped.
SQL> drop rollback segment "RBS3";
Rollback segment dropped.
SQL> drop rollback segment "RBS4";
Rollback segment dropped.
SQL> drop rollback segment "RBS5";
Rollback segment dropped.
SQL> drop rollback segment "RBS6";
Rollback segment dropped.
SQL> CREATE ROLLBACK SEGMENT nrbs1 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs2 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs3 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs4 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs5 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs6 TABLESPACE rbs;
Rollback segment created.
SQL> CREATE ROLLBACK SEGMENT nrbs7 TABLESPACE rbs;
Rollback segment created.

客户安排导出导入,至此该库恢复完成

由于bootstrap$异常导致数据库启动报ORA-03113 ORA-07445 lmebucp

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:由于bootstrap$异常导致数据库启动报ORA-03113 ORA-07445 lmebucp

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库无法正常启动,报ORA-03113

SQL> startup
ORACLE 例程已经启动。
Total System Global Area 5016387584 bytes
Fixed Size                  2011136 bytes
Variable Size             905969664 bytes
Database Buffers         4093640704 bytes
Redo Buffers               14766080 bytes
数据库装载完毕。
ORA-03113: 通信通道的文件结束

alert日志报错ORA-07445 lmebucp

Mon Aug 27 15:31:37 2018
Thread 1 advanced to log sequence 21691
Thread 1 opened at log sequence 21691
  Current log# 2 seq# 21691 mem# 0: /data/oracle/orcl/redo02.log
Successful open of redo thread 1
Mon Aug 27 15:31:37 2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 27 15:31:37 2018
SMON: enabling cache recovery
Mon Aug 27 15:31:37 2018
Errors in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_5827.trc:
ORA-07445: exception encountered: core dump [lmebucp()+24] [SIGSEGV]
[Address not mapped to object] [0x000000000] [] []

跟踪启动10046 trace

WAIT #1: nam='instance state change' ela= 822 layer=2 value=1 waited=1 obj#=-1 tim=1499370211971345
WAIT #1: nam='db file sequential read' ela= 29 file#=1 block#=257 blocks=1 obj#=-1 tim=1499370211971896
=====================
PARSING IN CURSOR #2 len=188 dep=1 uid=0 oct=1 lid=0 tim=1499370211972625 hv=2809067040 ad='b5fe2d00'
create table bootstrap$ ( line#         number not null,   obj#
number not null,   sql_text   varchar2(4000) not null)
storage (initial 50K objno 41 extents (file 1 block 257))
END OF STMT
PARSE #2:c=0,e=598,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211972621
BINDS #2:
EXEC #2:c=1000,e=195,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1499370211972873
=====================
PARSING IN CURSOR #2 len=55 dep=1 uid=0 oct=3 lid=0 tim=1499370211973429 hv=2111436465 ad='b7bd0530'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #2:c=0,e=472,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211973426
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2b8c5d50a4d0  bln=22  avl=02  flg=05
  value=41
EXEC #2:c=1000,e=838,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1499370211974375
WAIT #2: nam='db file sequential read' ela= 27 file#=1 block#=257 blocks=1 obj#=-1 tim=1499370211974522
WAIT #2: nam='db file sequential read' ela= 21 file#=1 block#=258 blocks=1 obj#=-1 tim=1499370211974855
FETCH #2:c=1000,e=479,p=2,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1499370211974908
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object),
 addr: 0x0, PC: [0x348772c, lmebucp()+24]
*** 2018-08-27 15:31:37.074
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [lmebucp()+24] [SIGSEGV]
[Address not mapped to object] [0x000000000] [] []
Current SQL statement for this session:
alter database open
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
Cannot find symbol
Cannot find symbol
Cannot find symbol
ksedst()+31          call     ksedst1()            000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksedmp()+610         call     ksedst()             000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ssexhd()+630         call     ksedmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
<0x336800eca0>       call     ssexhd()             00000000B ? 2B8C5D238D70 ?
                                                   2B8C5D238C40 ? 000000000 ?
                                                   000000000 ? 000000001 ?
--------------------- Binary Stack Dump ---------------------

通过这里发现,数据库启动执行select line#, sql_text from bootstrap$ where obj# != :1然后报ORA-07445 lmebucp错误。这样的错误比较诡异,一般可能是由于bootstrap异常导致,但是这里再往上跟踪发现 create bootstrap$表指定的记录为file 1 block 257,根据经验知道数据库的bootstrap$表记录一般是377 或者520比较常见.通过工具对于file 1进行分析

DUL> dump datafile 1 block 257
Block Header:
block type=0x10 (data segment header block (unlimited extents))
block format=0xa2 (oracle 10)
block rdba=0x00400101 (file#=1, block#=257)
scn=0x0000.0000007e, seq=1, tail=0x007e1001
block checksum value=0xe75c=59228, flag=4
Data Segment Header:
  Extent Control Header
  -------------------------------------------------------------
  Extent Header:: extents: 1  blocks: 7
                  last map: 0x00000000  #maps: 0  offset: 4128
      Highwater:: 0x00400103  (rfile#=1,block#=259)
                  ext#: 0  blk#: 1   ext size:7
      #blocks in seg. hdr's freelists: 0
      #blocks below: 1
      mapblk: 0x00000000   offset: 0
      Map Header:: next: 0x00000000   #extents: 1  obj#: 41  flag: 0x40000000
  Extent Control Header
  -------------------------------------------------------------
   0x00400102  length: 7
  nfl = 1, nfb = 1, typ = 2, nxf = 0, ccnt = 0
  SEG LST:: flg:UNUSED lhd: 0x00000000 ltl: 0x00000000

发现异常比较明显,block 257为data_object_id=41,也就是
41|41|CREATE UNIQUE INDEX I_FILE1 ON FILE$(FILE#) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE ( INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 41 EXTENTS (FILE 1 BLOCK 257))
这里看数据库的引导异常或者bootstrap$表中记录异常.通过修复bootstrap相关内容,数据库完美启动

又一例asm格式化文件系统恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一例asm格式化文件系统恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一个客户把win rac中的asm disk给格式化为ntfs了(data磁盘组由三个500G的磁盘组成,被格式化掉前面两个还剩下一个),而且格式化之后,还进行了一系列恢复(比如修复磁盘头,又进行分区等一些磁盘操作),导致恢复难度增加,也增加了一些数据覆盖
asm alert日志报错

Thu Aug 23 11:20:14 2018
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Process state recorded in trace file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_ora_2260.trc
Thu Aug 23 11:20:28 2018
Errors in file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_lgwr_3820.trc:
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:26 disk_offset(bytes):27566080 io_size:4096 operation:Write type:synchronous
	 result:I/O error process_id:3820
NOTE: unable to write any mirror side for diskgroup DATA
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process 3268:3820 initiating offline of disk 1.4042301899 (DATA_0001) with mask 0x7e in group 2
WARNING: Disk DATA_0001 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 2, dsk = 1/0xf0f0a1cb, mode = 0x15
kfdp_updateDsk(): 22
Thu Aug 23 11:20:28 2018
kfdp_updateDskBg(): 22
ERROR: too many offline disks in PST (grp 2)
WARNING: Disk DATA_0001 in mode 0x7f offline aborted

数据库alert日志报错

WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:11992
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in
group [2.1859146063] from disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260
in group 2 on disk 1 allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Thu Aug 23 11:20:13 2018
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:841 disk_offset(bytes):882532352 io_size:131072 operation:Write type:asynchronous
	 result:I/O error process_id:3224
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 240 logical extent 0 of file 259 in group 2 on disk 1
allocation unit 841 KCF: read, write or open error, block=0x7853 online=1
        file=4 '+DATA/orcl/datafile/users.259.944422883'
        error=15081 txt: ''
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:3224
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group [2.1859146063] from
disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group 2 on disk 1
allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00204: 读取控制文件时出错 (块 41, # 块 1)
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
DBW1 (ospid: 3224): terminating the instance due to error 204

由于客户进行了一系列恢复恢复操作导致查看磁盘都不全

D:\>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
---这里还有一个磁盘没有正常显示
ORCLDISKDATA10                   \Device\Harddisk4\Partition1           511997M--客户尝试修复的磁盘
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

通过主机层面激活卷,删除分区等一系列操作,然后通过kfed构造磁盘头,让这些磁盘在os层面可以正常显示

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
------需要处理的磁盘------
ORCLDISKDATA0                    \Device\Harddisk3\Partition1           511997M
ORCLDISKDATA1                    \Device\Harddisk4\Partition1           511997M
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
-----------------------
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

由于asm磁盘组内部目录au被彻底损坏,导致无法通过asm直接拷贝出来数据,通过底层扫描,按照au恢复出来相关数据,由于格式化ntfs和后续的误操作导致部分数据au被覆盖.其余数据均恢复,抢救了绝大部分数据.
数据文件恢复参考:asm disk header 彻底损坏恢复
另外有一次win平台类似恢复经历:asm disk格式化为ntfs恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Automatic datafile offline due to write error on

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Automatic datafile offline due to write error on

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储突然掉线导致数据文件无法访问,导致部分数据文件被自动offline

Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0xe93b8 online=1
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x24eb65 online=1
        file=25 'F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA'
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 25: F:\ORACLE\ORADATA\ORCL\QYSCZH12.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x22b0a1 online=1
        file=28 'F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
Automatic datafile offline due to write error on
file 28: F:\ORACLE\ORADATA\ORCL\QYSCZH15.ORA
Thu May 17 14:49:03 2018
KCF: read, write or open error, block=0x138def online=1
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
        file=30 'F:\ORACLE\ORADATA\ORCL\QYSCZH17.ORA'
        file=11 'F:\ORACLE\ORADATA\ORCL\QYSCZH4'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 2) 系统找不到指定的文件。'
O/S-Error: (OS 2) 系统找不到指定的文件。'
……
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0xade96 online=1
        file=9 'F:\ORACLE\ORADATA\ORCL\QYSCZH2'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 9: F:\ORACLE\ORADATA\ORCL\QYSCZH2
Thu May 17 14:49:28 2018
KCF: read, write or open error, block=0x378c66 online=1
        file=15 'F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA'
        error=27072 txt: 'OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。'
Automatic datafile offline due to write error on
file 15: F:\ORACLE\ORADATA\ORCL\QYSCZH6.ORA
KCF: read, write or open error, block=0x35f6de online=1
……

存储掉线是悲剧的起点,按理说数据库是归档模式,存储恢复之后,继续recover datafile,然后online应该问题不大,但是由于客户没有及时处理这个问题(也许业务实时性要求不高,可能挂几个小时也没人知道),导致第二个悲剧发生,删除归档的定时任务把数据库的归档日志给删除了.导致后面存储挂载上来之后,数据文件也无法正常online成功

Tue May 22 16:28:13 2018
ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'
Media Recovery Start
Serial Media Recovery started
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\QYSCZH'  ...
Tue May 22 16:28:42 2018
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
Errors with log D:\ORALCE\ADMINISTRATOR\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2018_05_22\O1_MF_1_267346_%U_.ARC
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测发现结果如下:
recover


遭遇这种情况,常规方法无法恢复,考虑使用bbed或者其他方法强制online文件,由于存储突然掉线,这样恢复的库可能后续还有大量工作需要处理,最常见的可能有表和index不一致,表的segment header信息和extent实际信息不匹配等

truncate IDL_UB1$恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:truncate IDL_UB1$恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

世界之大无奇不有,已经记不清这是第几个客户咨询IDL_UB1$ 被truncate之后导致数据库无法启动的case了.idl_ub1$表是用来存储PL/SQL的代码单元的,包括DIANA等,IDL在这里代表Interface Definition Language. 在数据库的启动过程中通过10046跟踪可以知道,有类似:select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#的查询语句,由于该表被truncate之后,导致数据库启动无法绕过该sql,而hang住无法完全open成功.下午闲着没事通过模拟,可以对该故障实现正常open,并且导出数据
模拟业务数据

create user xff identified by oracle;
grant dba to xff;
conn xff/oracle
create table t_xifenfei as select * from dba_objects;
create index i_xifenfei on t_xifenfei(object_id);
create view v_xifenfei as select * from t_xifenfei;
create or replace procedure proc1(
para1 varchar2,
para2 out varchar2,
para3 in out varchar2
) as
v_name varchar2(20);
begin
 v_name :='xifenfei';
 para3 := v_name;
dbms_output.put_line('para3:'||para3);
end;
/
alter system checkpoint;

创建xff账户,并且创建有代表性的表,索引,存储过程,视图等.

模拟truncate IDL_UB1$表

SQL> conn / as sysdba
Connected.
SQL> truncate table IDL_UB1$;
truncate table IDL_UB1$
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 31325
Session ID: 177 Serial number: 7

重启数据库

SQL> conn / as sysdba
Connected.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL>
SQL>
SQL> startup
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.

数据库在mount之后,一直处于hang住状态,查看alert日志

Sun May 20 17:02:34 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_31325.trc  (incident=21781):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21781/test_ora_31325_i21781.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:37 2018
Sweep [inc][21781]: completed
Sweep [inc2][21781]: completed
Sun May 20 17:02:37 2018
Dumping diagnostic data in directory=[cdmp_20180520170237], requested by (instance=1, osid=31325), summary=[incident=21781].
Sun May 20 17:02:55 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_m000_31373.trc  (incident=21821):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21821/test_m000_31373_i21821.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:56 2018
Dumping diagnostic data in directory=[cdmp_20180520170256], requested by (instance=1, osid=31373 (M000)), summary=[incident=21821].

这类问题比较明显,正常方法无法打开,通过工具分析system文件,发现虽然truncate IDL_UB1$操作报错了,但是IDL_UB1$和对应的index I_IDL_UB11 obj#,dataobj#均已经改变,而且相关对象的segment header也变化为新dataobj#(truncate之后的),也就是说truncate在数据库中的主要更改操作已经完成.现在在缺少记录的情况下,数据库执行如下sql无法获取记录,从而无法open

PARSING IN CURSOR #140342551421712 len=132 dep=2 uid=0 oct=3 lid=0 tim=1526843464635335 hv=4260389146 ad='21af73218' sqlid='cvn54b7yz0s8u'
select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#
END OF STMT
PARSE #140342551421712:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635335
BINDS #140342551421712:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d80  bln=22  avl=03  flg=05
  value=1310
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d50  bln=24  avl=01  flg=05
  value=0
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d20  bln=24  avl=06  flg=05
  value=184549376
EXEC #140342551421712:c=0,e=76,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635449
FETCH #140342551421712:c=0,e=4,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635462
STAT #140342551421712 id=1 cnt=0 pid=0 pos=1 obj=225 op='TABLE ACCESS BY INDEX ROWID IDL_UB1$ (cr=1 pr=0 pw=0 time=4 us cost=3 size=44 card=2)'
STAT #140342551421712 id=2 cnt=0 pid=1 pos=1 obj=236 op='INDEX RANGE SCAN I_IDL_UB11 (cr=1 pr=0 pw=0 time=4 us cost=2 size=0 card=2)'
CLOSE #140342551421712:c=0,e=2,dep=2,type=0,tim=1526843464635496

通过对数据库采用技术欺骗手段,让数据库启动相关sql能够获取到记录(和正常查询的相同),从而实现数据库正常open

SQL> startup mount
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.
SQL> alter database open;
Database altered.

open成功之后,后台报大量的ORA-08103: object no longer exists,通过分析是由于truncate IDL_UB1$没有完全成功,导致出现该错误.解决方法就是对Oracle数据字典进行人工更新,把没有完成的truncate操作在数据库中给予完成.
导出数据
exp导出数据成功
exp-data


但是expdp无法执行

[oracle@bogon oradata]$ expdp '"/ as sysdba"' schemas=xff file=1.dmp
Export: Release 11.2.0.4.0 - Production on Sun May 20 17:10:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
UDE-31623: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3326
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4551
ORA-06512: at line 1

暂时未去研究对这个表进行重建,使用exp导出然后再imp导入是比较理想的办法

恶意删除bootstrap$导致数据库无法正常启动

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:恶意删除bootstrap$导致数据库无法正常启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户10.2.0.5的数据库关闭之后,无法正常启动报ORA-00704 ORA-00702错误.

Fri May 18 22:42:26  2018
ALTER DATABASE OPEN
Fri May 18 22:42:27  2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Fri May 18 22:42:27  2018
Started redo scan
Fri May 18 22:42:27  2018
Completed redo scan
 1 redo blocks read, 0 data blocks need recovery
Fri May 18 22:42:27  2018
Started redo application at
 Thread 1: logseq 2, block 2, scn 8448162573
Fri May 18 22:42:27  2018
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\DATABASE\xifenfei\REDO02.LOG
Fri May 18 22:42:27  2018
Completed redo application
Fri May 18 22:42:27  2018
Completed crash recovery at
 Thread 1: logseq 2, block 3, scn 8448182575
 0 data blocks read, 0 data blocks written, 1 redo blocks read
Fri May 18 22:42:28  2018
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=3188
ARC1 started with pid=24, OS id=3168
ARC2 started with pid=25, OS id=996
ARC3 started with pid=26, OS id=432
ARC4 started with pid=27, OS id=3728
Fri May 18 22:42:28  2018
ARC0: Archival started
ARC1: Archival started
ARC5 started with pid=28, OS id=2876
Fri May 18 22:42:28  2018
ARC2: Archival started
ARC3: Archival started
ARC4: Archival started
ARC5: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Fri May 18 22:42:28  2018
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: D:\DATABASE\xifenfei\REDO03.LOG
Successful open of redo thread 1
Fri May 18 22:42:28  2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 18 22:42:28  2018
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri May 18 22:42:28  2018
ARC2: Becoming the heartbeat ARCH
Fri May 18 22:42:28  2018
SMON: enabling cache recovery
Fri May 18 22:42:28  2018
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_3148.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
Fri May 18 22:42:28  2018
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 3148
ORA-1092 signalled during: ALTER DATABASE OPEN...

根据以前恢复经验ORA-00702: bootstrap verison ” inconsistent with version ’8.0.0.0.0′,很可能是由于bootstrap$表异常了.
通过dbv检查system文件确认没有坏块
dbv-system


通过bbed分析,确认记录被删除
把数据文件拷贝到本地,通过bbed进行分析,确认记录丢失

BBED> map
 File: d:/system01.dbf (0)
 Block: 379                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[24]                               @86
 ub1 freespace[1158]                        @134
 ub1 rowdata[6896]                          @1292
 ub4 tailchk                                @8188
BBED> p *kdbr[0]
rowdata[6875]
-------------
ub1 rowdata[6875]                           @8167     0x3c
BBED> x /rnnc
rowdata[6875]                               @8167
-------------
flag@8167: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@8168: 0x01
cols@8169:    0

故障原因跟踪
有人在数据库中注入了恶意脚本,导致数据库删除了bootstrap$中数据,关闭之后无法正常启动
delete-bootstrap$


处理方法
通过oracle bbed 修复数据字典,正常启动数据库

Oracle 数据文件大小为0kb或者文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 数据文件大小为0kb或者文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友恢复请求,由于rose频繁切换导致文件系统部分数据文件变化为0kb和文件丢失.
故障现象
部分数据文件变化为0kb和文件丢失.
file_lost
file_size_0


这里比较明显,数据库的users03变为了0kb和users04丢失.数据库alert日志报错信息如下:

Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-27047: ??????????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 8 - ??? DBWR ????
ORA-01110: ???? 8: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS04.DBF'
ORA-27041: ??????
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_12040.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-1157 signalled during: alter database open...
Fri May 04 09:35:10 2018
Checker run found 2 new persistent data failures

alert日志的报错也比较明显,users03是文件超过了大小(大小为0kb,读取之后肯定超过大小),users04提示无法打开文件(文件在文件系统层面已经丢失).现在问题比较明显由于文件系统故障导致文件大小为0和丢失

碎片扫描恢复
常规的方法肯定无法恢复,比较好的方法只能是底层碎片扫描重组,结合多种扫描工具,最后发现一个做底层恢复的朋友的工具效果不错,扫描结果如下
file_scan


通过工具分析坏块情况

C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.7_10.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.7_10.ora
………………
页 382565 标记为损坏
Corrupt block relative dba: 0x01c5d665 (file 7, block 382565)
Completely zero block found during dbv:
页 382566 标记为损坏
Corrupt block relative dba: 0x01c5d666 (file 7, block 382566)
Completely zero block found during dbv:
页 382567 标记为损坏
Corrupt block relative dba: 0x01c5d667 (file 7, block 382567)
Completely zero block found during dbv:
DBVERIFY - 验证完成
检查的页总数: 1374720
处理的页总数 (数据): 27582
失败的页总数 (数据): 0
处理的页总数 (索引): 20114
失败的页总数 (索引): 0
处理的页总数 (其他): 1319752
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 7271
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271996 (0.228271996)
C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY - 验证完成
检查的页总数: 1136896
处理的页总数 (数据): 36639
失败的页总数 (数据): 0
处理的页总数 (索引): 57038
失败的页总数 (索引): 0
处理的页总数 (其他): 1043218
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271997 (0.228271997)
C:\Users\Administrator>

scan_resulte


这里通过分析恢复的两个文件总的block数量2511618,其中连续损坏7271个block损坏,由于出现问题之后,数据库被offline这两个文件继续启动运行了几个小时,导致少量block被覆盖,恢复软件直接置空.后续的恢复比较顺利,正常open数据库,然后处理坏块对象(正好不是业务核心表的lob字段,所有部分丢失影响不是非常大).

温馨提醒:
1. 数据文件和备份不要放在同一个阵列上,更不能是同一个分区(卷)上
2. 出现此类问题之后,应当理解停止对该分区的任何写操作,方式丢失或者大小为0KB的文件被覆盖.
如果需要专业ORACLE数据库恢复技术支持,请联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

asm磁盘组操作不当导致数据文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘组操作不当导致数据文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到数据库恢复case,客户是要更换存储,在数据库mount状态把使用omf方式存储数据的asm 磁盘组通过rman copy到新的通过别名方式存储的新的asm 磁盘组的存储中,但是由于操作人员粗心,copy语句中部分目标磁盘组的数据文件别名重复了,最后执行rename file之后,导致部分数据文件彻底丢失.我们通过底层碎片扫描(参考:asm disk header 彻底损坏恢复)对于该用户的数据实现完全恢复.
因为整个过程重现比较麻烦,这里测试从一个data磁盘组中有一个omf方式存储的含有两个数据文件的表空间,通过rman copy 把这个表空间的两个文件拷贝到datanew磁盘组中,但是由于粗心把两个数据文件的别名写成一样,结果导致该表空间的一个数据文件彻底丢失的测试.

创建测试表空间
在datanew磁盘组中创建omf方式管理的xifenfei表空间,含有两个数据文件,file#分别为14和15

SQL> create tablespace xifenfei datafile '+DATA' SIZE 128m;
Tablespace created.
SQL> ALTER TABLESPACE XIFENFEI ADD DATAFILE '+DATA' SIZE 128m AUTOEXTEND ON;
Tablespace altered.
SQL> SELECT FILE_NAME,FILE_ID FROM  DBA_DATA_FILES WHERE TABLESPACE_NAME='XIFENFEI';
FILE_NAME
--------------------------------------------------------------------------------
   FILE_ID
----------
+DATA/XFF/DATAFILE/xifenfei.276.961143809
        14
+DATA/XFF/DATAFILE/xifenfei.277.961143825
        15

rman copy datafile 14
通过rman copy把datafile 14拷贝到data磁盘组中,目标端为别名方式存储

RMAN> copy datafile 14 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00014 name=+DATA/XFF/DATAFILE/xifenfei.276.961143809
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082643 RECID=4 STAMP=961144006
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 27-NOV-17
[grid@localhost ~]$ asmcmd
ASMCMD> cd datanew
ASMCMD> ls
XFF/
xifenfei.dbf
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD>

这里通过asmcmd的ls命令,可以看到虽然我们存储的为datanew磁盘组的别名文件,实际上是link到asm的omf方式的文件(本质上asm中的文件都是omf方式存储,只是在使用的时候体现asm的客户端程序方式不一样,是直接asm中的omf方式,还是asm中的别名).

rman copy datafile 15
通过rman copy把datafile 15 拷贝到和datafile 14别名一样的文件了

RMAN> copy datafile 15 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00015 name=+DATA/XFF/DATAFILE/xifenfei.277.961143825
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082731 RECID=5 STAMP=961144053
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 27-NOV-17
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD> cd xff
ASMCMD> ls
DATAFILE/
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI.256.961144003
ASMCMD>

这里可以看出来,在data磁盘组中,file 14被file 15覆盖掉了

rename file
把data磁盘组中的数据文件rename 到datanew磁盘组中

SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf';
Database altered.
SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf';
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01523: cannot rename data file to '+data/xifenfei.dbf' - file already part of database

这里我们可以看到,file 14 rename 成功,但是file 15 rename失败,因为在数据库中,已经有了别名的文件(数据文件的路径)

omf自动删除文件
查看原磁盘组datanew中,发现datafile 14被自动删除

ASMCMD> pwd
+DATA/XFF/DATAFILE
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSAUX.257.942061433
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSTEM.256.942061393
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    UNDOTBS1.258.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    USERS.259.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    XIFENFEI.277.961143825
ASMCMD>

alert日志证实数据文件被删除

2017-11-27T09:05:03.054741-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:03.114947-05:00
NOTE: Under CF enqueue, no dependency request for disk group DATANEW
Deleted Oracle managed file +DATA/XFF/DATAFILE/xifenfei.276.961143809
Completed: alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:21.471474-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+data/xifenfei.dbf'
ORA-1511 signalled during:alter database rename file
      '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to'+datanew/xifenfei.dbf'

这里可以证实,数据文件的omf方式管理,在数据文件执行rename file的时候,会自动删除掉老的数据文件.这里悲剧已经发生,由于rman copy 覆盖了datanew磁盘组中的datafile 14,rename file又导致data磁盘组中的datafile 14被自动删除,从而使得datafile 14这个数据文件在两个磁盘组中都丢失.从常规角度来说,如果没有合适的备份该文件无法恢复.如果遭遇到oracle asm中数据文件丢失或者部分覆盖,请保护现场,联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com最大限度抢救您的数据

mount: wrong fs type, bad option, bad superblock恢复Oracle

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:mount: wrong fs type, bad option, bad superblock恢复Oracle

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我们,说对lv进行收缩操作之后,导致文件系统无法mount,提示超级块损坏.
尝试mount失败

[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

系统日志报错

Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): ext4_check_descriptors: Block bitmap for group 1 not in group (block 0)!
Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): group descriptors corrupted!

主要操作
cz


这里可以看出来,明显操作错误,在没有收缩文件系统之前,直接收缩的lv.
list_history


这里可以看出来,在发生故障之后,又做了很多操作,包括fsck和testdisk等,最终无法恢复,请求我们协助处理。

找出来备份超级块

[root@GZGSDB dul]# dumpe2fs /dev/vg_gzgsdb/lv_home |grep superblock
dumpe2fs 1.41.12 (17-May-2010)
ext2fs_read_bb_inode: A block group is missing an inode table
  Primary superblock at 0, Group descriptors at 1-52
  Backup superblock at 32768, Group descriptors at 32769-32820
  Backup superblock at 98304, Group descriptors at 98305-98356
  Backup superblock at 163840, Group descriptors at 163841-163892
  Backup superblock at 229376, Group descriptors at 229377-229428
  Backup superblock at 294912, Group descriptors at 294913-294964
  Backup superblock at 819200, Group descriptors at 819201-819252
  Backup superblock at 884736, Group descriptors at 884737-884788
  Backup superblock at 1605632, Group descriptors at 1605633-1605684
  Backup superblock at 2654208, Group descriptors at 2654209-2654260
  Backup superblock at 4096000, Group descriptors at 4096001-4096052
  Backup superblock at 7962624, Group descriptors at 7962625-7962676
  Backup superblock at 11239424, Group descriptors at 11239425-11239476
  Backup superblock at 20480000, Group descriptors at 20480001-20480052
  Backup superblock at 23887872, Group descriptors at 23887873-23887924
  Backup superblock at 71663616, Group descriptors at 71663617-71663668
  Backup superblock at 78675968, Group descriptors at 78675969-78676020
  Backup superblock at 102400000, Group descriptors at 102400001-102400052
  Backup superblock at 214990848, Group descriptors at 214990849-214990900

使用fsck修复

[root@GZGSDB data]# fsck -y /dev/vg_gzgsdb/lv_home
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: The ext2 superblock is corrupt when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/vg_gzgsdb-lv_home
Filesystem mounted or opened exclusively by another program?
--指定超级块恢复
[root@GZGSDB data]# fsck -y -b 102400000 /dev/vg_gzgsdb/lv_home
…………
Illegal block #0 (1296647770) in inode 354315.  CLEARED.
Illegal block #1 (1398362886) in inode 354315.  CLEARED.
Illegal block #3 (453538936) in inode 354315.  CLEARED.
Illegal block #5 (808333361) in inode 354315.  CLEARED.
Illegal block #6 (775434798) in inode 354315.  CLEARED.
Illegal block #8 (1180306180) in inode 354315.  CLEARED.
Illegal block #9 (1413893971) in inode 354315.  CLEARED.
Illegal block #10 (1229347423) in inode 354315.  CLEARED.
Illegal block #11 (1498613325) in inode 354315.  CLEARED.
Illegal indirect block (1296389203) in inode 354315.  CLEARED.
Inode 354315 is too big.  Truncate? yes
Block #1074301965 (69632) causes directory to be too big.  CLEARED.
Warning... fsck.ext4 for device /dev/mapper/vg_gzgsdb-lv_home exited with signal 11.
[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

至此基本上可以判断,直接修复该文件系统,让其正常mount的概率很小.直接通过lun层面来恢复.

通过工具解析lun
disk_size


比较明显,客户经过一系列操作现在的现象是lv 1.03T,文件系统只有821G,明显和客户给我反馈的操作不符(应该是lv 821G),证明客户做了大量操作,已经导致文件系统损坏
inode_recovery


由于文件系统严重异常,工具获取到的文件都是没有名称,直接从inode里面读取的数据.获取到这些数据之后,然后结合oracle的特性,判断出来对应的文件号关系(这里有大量文件重复);另外有个别文件通过inode恢复丢失,通过底层碎片重组进行恢复出来文件,然后恢复出来其中数据(asm disk header 彻底损坏恢复).这个客户比较幸运,system文件在另外一个分区中,不然工作量会大很多.
再次提醒:对lv操作一定要谨慎,特别是lvreduce操作,另外出现发生误操作之后,应该第一时间保护现场,而不是百度着去乱操作,可能导致故障更加悲剧

警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

继续上篇的tab$被清空(ORA-600 16703故障解析—tab$表被清空),导致数据库启动异常的case
ORA-600 16703报错
ora-600-16703


数据库日志分析
数据库open成功同时报ORA-7445 kokeicbegin和ORA-600 kzrini:!uprofile错误
ora-600-kzrini-uprofile


再次启动数据库直接报ORA-600 16703错误
ora-600-16703


这里有一个其他现象,就是数据库在open成功的同时(同一秒内),开始报异常.重启之后直接报
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
根据10046分析结果

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_176395/orcl_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

以及以往恢复经验和mos,基本上可以确定是由于tab$和obj$记录不匹配导致该问题.而且是obj#=20的记录在tab$和obj$中不匹配.

分析tab$和obj$记录

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
 
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 
 Strictly Oracle Internal Use Only
 
 
Found db_id = 1334610369
Found db_name = orcl
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

这里可以明确表示tab$无记录,obj$有记录,从而启动的过程出现ORA-600 16703错误可以很好解释.由于数据库启动成功和报错几乎同时进行,以及上面看到的tab$记录不存在的情况,初步怀疑是有startup触发器清空tab$表记录
工具分析触发器表trigger$
startup-trigger


这里果然看到一个after startup on database的触发器,名字为DBMS_SUPPORT_DBMONITOR,而它调用的是DBMS_SUPPORT_DBMONITORP存储.

工具分析pl/sql表source$
DBMS_SUPPORT_DBMONITOR


对wraped加密的程序进行解密
DBMS_SUPPORT_DBMONITOR-unwraped


这里可以明确的看到DBMS_SUPPORT_DBMONITORP存储过程备份tab$表到orachk中然后delete tab$表,现在已经明确是由于DBMS_SUPPORT_DBMONITOR触发器在数据库重启之后开始执行调用DBMS_SUPPORT_DBMONITORP程序,如果判断数据库创建时间大于等于300天,便干掉tab$表,实现数据库破坏.

查找DBMS_SUPPORT_DBMONITOR等程序源头
分析相关程序创建时间,通过obj$表的ctime和name来判断
DBMS_SUPPORT_DBMONITOR-ctime
bootstrap-ctime


这里可以看出来DBMS_SUPPORT_DBMONITOR和DBMS_SUPPORT_DBMONITORP的创建时间基本上和数据库核心对象的创建时间相差无几,我们可以大概排除掉,是由于pl sql等工具连接数据库导致该发问题(类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案),很可能是在dbca创建库的过程中就已经带有了DBMS_SUPPORT_DBMONITOR等程序,如果这样那很可能是由于数据库的安装介质被破坏导致该问题.

分析DBMS_SUPPORT_DBMONITOR来源
prvtsupp
20170711001626


这里已经很清晰,由于prvtsupp.plb被人注入了恶意脚本从而使得数据库被创建了DBMS_SUPPORT_DBMONITOR的触发器和DBMS_SUPPORT_DBMONITORP的存储过程,从而实现了对数据库的而易破坏.

确定破坏文件prvtsupp.plb来源于介质
jar


这里比较明显,文件就是来源database\stage\Components\oracle.rdbms.dbscripts\11.2.0.4.0\1\DataFiles\filegroup2.jar\rdbms\admin\prvtsupp.plb文件被修改导致
md5


通过md5码对比,可以确定是有人对软件的安装介质进行了破坏,从而实现了恶意代码的注入,从而实现了数据库300天之后重启之后无法正常启动而是出现类似ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []的错误.

温馨提示
各位一定要从官方途径下载oracle安装介质,如果是从其他互联网途径下载一定要验证md5,确保文件没有被人恶意篡改,造成无可挽回的损坏.如果真的不幸遇到这类问题,请保护现场联系我们
Tel:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
我们可以通过使用bbed对tab$表数据数据进行恢复实现数据库正常启动,实现数据0丢失,最大限度抢救您的数据和减少业务恢复时间