联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
通过分析文件发现该加密主要是对头尾部分block按照每16byte中8个byte置空和部分加密
通过进行恢复,对损坏部分进行跳过,剩余数据直接导入数据库,通过show=y测试数据可以正常入库,实现了dmp文件表数据的完美恢复
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
通过分析文件发现该加密主要是对头尾部分block按照每16byte中8个byte置空和部分加密
通过进行恢复,对损坏部分进行跳过,剩余数据直接导入数据库,通过show=y测试数据可以正常入库,实现了dmp文件表数据的完美恢复
联系:手机/微信(+86 17813235971) QQ(107644445)
标题:再一起asm disk被格式化成ext3文件系统故障恢复
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
国庆节前夕接到朋友求救电话asm disk被格式化成ext3格式了,具体操作如下
由于/被挂载新格式化的控盘,导致asm磁盘组访问其他盘报错
Sun Sep 29 18:15:02 2019 Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc: ORA-15025: could not open disk "/dev/asmdisk/sdh" ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory Additional information: 3 Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc: ORA-15025: could not open disk "/dev/asmdisk/sdh" ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory Additional information: 3 WARNING: cache failed reading from group=1(DATA) fn=9 blk=0 count=1 from disk= 5 (DATA_0005) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596 Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc: ORA-15025: could not open disk "/dev/asmdisk/sdh" ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory Additional information: 3 ORA-15080: synchronous I/O operation to a disk failed WARNING: cache succeeded reading from group=1(DATA) fn=9 blk=0 count=1 from disk= 7 (DATA_0007) kfkist=0x20 status=0x01 osderr=0x0 file=kfc.c line=11637 Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc: ORA-15025: could not open disk "/dev/asmdisk/sdh" ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory Additional information: 3 WARNING: PST-initiated drop of 1 disk(s) in group 1(.2380027701))
重启系统之后,重试mount 磁盘组
GMON dismounting group 1 at 18 for pid 31, osid 44279 NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0004 in mode 0x1 marked for de-assignment NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment NOTE: Disk in mode 0x7f marked for de-assignment NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0010 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0011 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0012 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment NOTE: Disk DATA_0014 in mode 0x7f marked for de-assignment ERROR: diskgroup DATA was not mounted ORA-15032: not all alterations performed ORA-15040: diskgroup is incomplete ORA-15042: ASM disk "8" is missing from group number "1" ERROR: ALTER DISKGROUP DATA MOUNT /* asm agent *//* {1:2587:2} */
由于sdb(asm disk 8)被格式化,导致data磁盘组无法正常mount.这个客户运气比较好data 磁盘组是normal模式,但是由于mount到/,导致disk 4被强制drop,因此无法mount成功,但是通过一系列处理数据实现完美恢复,0数据丢失
如果磁盘组是外部冗余,请参考:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有客户和我们反馈,他们数据库现在使用异常,查询dba_data_files报ORA-600 ktfbhget-4
*** 2019-09-21 13:56:42.944 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [ktfbhget-4], [384], [16], [], [], [], [], [] Current SQL statement for this session: select tablespace_name from dba_data_files ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedmp+0148 bl ksedst 1028F23AC ? ksfdmp+0018 bl 01FD7F4C kgerinv+00e8 bl _ptrgl kgesinv+0020 bl kgerinv 000000000 ? 000000000 ? FFFFFFFFFFF8920 ? 000400002 ? 000000000 ? ksesin+005c bl kgesinv 7000001637868E8 ? 110359FD8 ? 000000100 ? 000007FFF ? 000007FFF ? ktfbhget+047c bl ksesin 1029B1D18 ? 200000002 ? 000000000 ? 000000180 ? 000000000 ? 000000010 ? 000000008 ? 000000000 ? ktfbhcf+03c0 bl ktfbhget FFFFFFFFFFF8DA0 ? 1103589D8 ? 201035A57C ? 1C610005A40 ? qerfxFetch+0c94 bl 01FD7AC4 qerjoFetch+037c bl _ptrgl rwsfcd+0060 bl _ptrgl qersoFetch+0108 bl _ptrgl qerjoFetch+037c bl _ptrgl qerjoFetch+037c bl _ptrgl rwsfcd+0060 bl _ptrgl qeruaFetch+0140 bl 01FD7AC4 qervwFetch+008c bl 01FD7AC4 opifch2+0c68 bl 01FD7AC4 opiall0+1244 bl opifch2 7000001626B2F28 ? 100000001 ? FFFFFFFFFFFA3D8 ? kpoal8+0a68 bl opiall0 5E1000C818 ? 2200000014 ? FFFFFFFFFFFAAD8 ? 000000000 ? FFFFFFFFFFFAA28 ? 11028C2C8 ? 080000000 ? 4000000007FFF ? opiodr+08e8 bl _ptrgl ttcpip+0c54 bl _ptrgl opitsk+0c28 bl ttcpip 11000C818 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? opiino+0798 bl opitsk 000000000 ? 000000000 ? opiodr+08e8 bl _ptrgl opidrv+032c bl opiodr 3C00000018 ? 4101F5E80 ? FFFFFFFFFFFF4C0 ? 0A000FD98 ? sou2o+0028 bl opidrv 3C0C000000 ? 400000000 ? FFFFFFFFFFFF4C0 ? main+0138 bl 01FD758C __start+0098 bl main 000000000 ? 000000000 ?
通过询问知道由于file#=17最初由于裸设备权限异常,导致无法访问,执行了delete from file$ where file#=17,然后遭遇异常又人工插入了一条file#=17的记录到file$中,但是由于不知道具体值,可能是由于部分值插入错误引起现在问题.
通过分析file$表定义
create table file$ /* file table */ ( file# number not null, /* file identifier number */ status$ number not null, /* status (see KTS.H): */ /* 1 = INVALID, 2 = AVAILABLE */ blocks number not null, /* size of file in blocks */ /* zero for bitmapped tablespaces */ ts# number, /* tablespace that owns file */ relfile# number, /* relative file number */ maxextend number, /* maximum file size */ inc number, /* increment amount */ crscnwrp number, /* creation SCN wrap */ crscnbas number, /* creation SCN base */ ownerinstance varchar("M_IDEN"), /* Owner instance name */ spare1 number, /* tablespace-relative DBA of space file header */ /* NULL for dictionary-mapped tablespaces */ spare2 number, spare3 varchar2(1000), spare4 date )
通过dump文件头(文件创建大小/SCN等),结合一些计算和经验值,获取到相关值重新插入正确值,一切恢复正常
delete from file$ where file#=17 and ts#=384; insert into sys.file$ (FILE#, STATUS$, BLOCKS, TS#, RELFILE#, MAXEXTEND, INC, CRSCNWRP, CRSCNBAS, OWNERINSTANCE, SPARE1, SPARE2, SPARE3, SPARE4) values (17,2,256000,384,17,0,0,3087,1631091037,null,71303170, null, null, null); commit;
再次提醒,file$中记录请勿人工修改,以前写过相关casefile$ 删除记录恢复(delete file$ recovery)
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有一个客户恢复请求,6个节点11.2.0.3 RAC,非归档模式,数据量近200T
由于存储掉电导致数据库6个节点全部宕机,恢复硬件之后,数据库无法正常启动,报错如下:
SQL> recover database; ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for thread 4 ORA-00289: suggestion : /u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf ORA-00280: change 318472018583 for thread 4 is in sequence #322810 Wed Aug 28 11:19:55 2019 ALTER DATABASE RECOVER DATABAE Media Recovery Start Serial Media Recovery started Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0 Mem# 0: +REDO/xff/log2.ora Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0 Mem# 0: +REDO/xff/log3.ora Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0 Mem# 0: +REDO/xff/log6.ora ORA-279 signalled during: ALTER DATABASE RECOVER database ... Wed Aug 28 11:21:31 2019 ALTER DATABASE RECOVER CANCEL Media Recovery Canceled Completed: ALTER DATABASE RECOVER CANCEL
数据库恢复需要thread 4 sequence #322810,查询redo信息
redo已经被覆盖,数据库无法通过正常途径恢复实现数据库open,尝试屏蔽一致性强制拉库操作后
Wed Aug 28 12:40:15 2019 SMON: enabling tx recovery Database Characterset is ZHS16GBK Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc (incident=244209): ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Wed Aug 28 12:40:16 2019 ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47). Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc: ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], [] Wed Aug 28 12:40:20 2019 Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1] Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc (incident=245017): ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object] Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Wed Aug 28 12:40:20 2019 Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc (incident=244305): ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details.
提示undo异常,屏蔽回滚段之后,数据库正常打开没有任何报错信息
Wed Aug 28 12:57:15 2019 SMON: enabling cache recovery Instance recovery: looking for dead threads Instance recovery: lock domain invalid but no dead threads [57676] Successfully onlined Undo Tablespace 22. Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK Wed Aug 28 12:57:17 2019 minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7 minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 No Resource Manager plan active Starting background process GTX0 Wed Aug 28 12:57:18 2019 GTX0 started with pid=45, OS id=57777 Starting background process RCBG Wed Aug 28 12:57:18 2019 RCBG started with pid=46, OS id=57779 replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Wed Aug 28 12:57:19 2019 QMNC started with pid=47, OS id=57788 Completed: ALTER DATABASE OPEN
后续涉及创建新undo,删除老undo并处理一些类似,基本上恢复正常
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
又一起win rac的asm disk被格式化为ntfs,导致数据库异常恢复的请求,客户描述有一个500G的data磁盘组(只有一个磁盘,被误操作进行了格式化).格式化asm disk之后
asm的alert日志报错
Tue Jul 09 17:09:45 2019 NOTE: ASM client orcl1:ORCL disconnected unexpectedly. NOTE: check client alert log. NOTE: Trace records dumped in trace file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_ora_5376.trc Tue Jul 09 17:10:19 2019 Errors in file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_lgwr_1448.trc: ORA-27070: async read/write failed OSD-04008: WriteFile() 失败, 无法写入文件 O/S-Error: (OS 21) 设备未就绪。 WARNING: Write Failed. group:1 disk:0 AU:15 offset:876544 size:4096 NOTE: unable to write any mirror side for diskgroup DATA NOTE: cache initiating offline of disk 0 group DATA NOTE: process _lgwr_+asm1 (3764:1448) initiating offline of disk 0.4042281525 (DATA_0000) with mask 0x7e in group 1 Tue Jul 09 17:10:19 2019 WARNING: Disk 0 (DATA_0000) in group 1 mode 0x7f is now being offlined WARNING: Disk 0 (DATA_0000) in group 1 in mode 0x7f is now being taken offline on ASM inst 1 NOTE: initiating PST update: grp = 1, dsk = 0/0xf0f05235, mask = 0x6a, op = clear GMON updating disk modes for group 1 at 10 for pid 15, osid 1448 ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy. ERROR: too many offline disks in PST (grp 1) WARNING: Disk 0 (DATA_0000) in group 1 mode 0x7f offline is being aborted WARNING: Offline of disk 0 (DATA_0000) in group 1 and mode 0x7f failed on ASM inst 1 Tue Jul 09 17:10:20 2019 NOTE: halting all I/Os to diskgroup 1 (DATA) NOTE: unable to offline disks after getting write error for diskgroup DATA Tue Jul 09 17:10:20 2019 NOTE: cache dismounting (not clean) group 1/0xBDB0A2C0 (DATA) NOTE: disk 0 had IO error NOTE: messaging CKPT to quiesce pins Windows thread id: 520528, image: ORACLE.EXE (B000) Tue Jul 09 17:10:20 2019 NOTE: Deferred communication with ASM instance Errors in file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_ora_6140.trc: ORA-15130: diskgroup "DATA" is being dismounted NOTE: deferred map free for map id 4 NOTE: LGWR doing non-clean dismount of group 1 (DATA) NOTE: LGWR sync ABA=38.3028 last written ABA 38.3029
数据库的alert日志报错
Tue Jul 09 17:09:12 2019 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc: ORA-27072: 文件 I/O 错误 WARNING: IO Failed. group:1 disk(number.incarnation):0.0xf0f05235 disk_path:\\.\ORCLDISKDATA0 AU:1305 disk_offset(bytes):1368456704 io_size:512 operation:Write type:asynchronous result:I/O error process_id:5396 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc: ORA-15080: 与磁盘的同步 I/O 操作失败 WARNING: failed to write mirror side 1 of virtual extent 23 logical extent 0 of file 261 in group 1 on disk 0 allocation unit 1305 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc: ORA-00345: 重做日志写入块 47231 计数 1 出错 ORA-00312: 联机日志 1 线程 1: '+DATA/orcl/onlinelog/group_1.261.909498607' ORA-15081: 无法将 I/O 操作提交到磁盘 ORA-15081: 无法将 I/O 操作提交到磁盘 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc: ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 21) 设备未就绪。
这个客户比较幸运,asm disk被格式化之后,没有进行任何写操作,理解对现场进行了保护,没有任何的二次破坏.因为处理过多起类似故障(oracle asm disk格式化恢复—格式化为ntfs文件系统,oracle asm disk格式化恢复—格式化为ext4文件系统又一例asm格式化文件系统恢复),有一定的经验,一般三种方案恢复数据文件:
1)运气好直接通过kfed进行修复asm disk,然后mount起来,然后把数据文件拷贝到文件系统中
2)通过相关工具把asm disk中相关的数据文件拷贝到文件系统中
3)如果损坏的严重,通过底层碎片,把相关数据文件恢复到文件系统中
拷贝完成数据文件之后,然后根据文件的情况有几种可能性:
1)直接open数据库,处理可能的其他坏块
2)通过一些方法强制拉库,然后对其进行导出导入新库
3)通过工具直接恢复表数据,甚至恢复部分核心数据
这次的恢复运气不错,对asm disk进行一系列修复之后,asm 磁盘组mount成功
C:\Users\Administrator>asmtool -list NTFS \Device\Harddisk0\Partition1 80000M NTFS \Device\Harddisk0\Partition2 491133M ORCLDISKOCR0 \Device\Harddisk1\Partition1 34132M ORCLDISKOCR1 \Device\Harddisk1\Partition2 34132M ORCLDISKOCR2 \Device\Harddisk1\Partition3 34133M ORCLDISKDATA0 \Device\Harddisk2\Partition1 511997M ORCLDISKFRA0 \Device\Harddisk3\Partition2 1225000M NTFS \Device\Harddisk3\Partition3 1225800M \Device\Harddisk3\Partition4 1314861M C:\Users\Administrator>sqlplus / as sysasm SQL*Plus: Release 11.2.0.1.0 Production on 星期六 7月 13 11:20:02 2019 Copyright (c) 1982, 2010, Oracle. All rights reserved. 连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> alter diskgroup data mount; Diskgroup altered. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64 bit Production With the Real Application Clusters and Automatic Storage Management options
mount数据库拷贝文件
C:\Users\Administrator>sqlplus / as sysdba SQL*Plus: Release 11.2.0.1.0 Production on 星期六 7月 13 16:02:08 2019 Copyright (c) 1982, 2010, Oracle. All rights reserved. 已连接到空闲例程。 SQL> startup mount; ORACLE 例程已经启动。 Total System Global Area 3.4206E+10 bytes Fixed Size 2192864 bytes Variable Size 7516195360 bytes Database Buffers 2.6642E+10 bytes Redo Buffers 45727744 bytes 数据库已装载 RMAN> backup as copy database format 'I:\rmanback\df_%U.dbf'; 启动 backup 于 13-7月 -19 使用通道 ORA_DISK_1 使用通道 ORA_DISK_2 使用通道 ORA_DISK_3 使用通道 ORA_DISK_4 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00006 名称=+DATA/orcl/datafile/XFF5.dmp 通道 ORA_DISK_2: 启动数据文件副本 输入数据文件: 文件号=00016 名称=+DATA/orcl/datafile/XFF53 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00017 名称=+DATA/orcl/datafile/XFF54 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00018 名称=+DATA/orcl/datafile/XFF55 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-16_0VU6I1FO.DBF 标记=TAG20190713T120927 RECID=2 STAMP=1013517954 通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:36:29 通道 ORA_DISK_2: 启动数据文件副本 输入数据文件: 文件号=00019 名称=+DATA/orcl/datafile/XFF5.281.988042407 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-17_10U6I1FO.DBF 标记=TAG20190713T120927 RECID=1 STAMP=1013517951 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:36:29 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00020 名称=+DATA/orcl/datafile/XFF5.282.988042943 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-18_11U6I1FO.DBF 标记=TAG20190713T120927 RECID=3 STAMP=1013517962 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:36:37 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00021 名称=+DATA/orcl/datafile/XFF5.283.988043279 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-6_0UU6I1FO.DBF 标记=TAG20190713T120927 RECID=4 STAMP=1013518052 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:38:12 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00022 名称=+DATA/orcl/datafile/XFF5.284.988043721 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-20_13U6I3K5.DBF 标记=TAG20190713T120927 RECID=5 STAMP=1013519584 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:27:10 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00009 名称=+DATA/orcl/datafile/XFF51.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-21_14U6I3KD.DBF 标记=TAG20190713T120927 RECID=6 STAMP=1013519586 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:27:03 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00010 名称=+DATA/orcl/datafile/XFF52.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-19_12U6I3K5.DBF 标记=TAG20190713T120927 RECID=7 STAMP=1013519599 通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:27:26 通道 ORA_DISK_2: 启动数据文件副本 输入数据文件: 文件号=00003 名称=+DATA/orcl/datafile/undotbs1.258.909498475 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-22_15U6I3NC.DBF 标记=TAG20190713T120927 RECID=8 STAMP=1013519695 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:27:19 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00005 名称=+DATA/orcl/datafile/undotbs2.264.909498739 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-UNDOTBS1_FNO-3_18U6I57K.DB F 标记=TAG20190713T120927 RECID=9 STAMP=1013520920 通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:22:02 通道 ORA_DISK_2: 启动数据文件副本 输入数据文件: 文件号=00012 名称=+DATA/orcl/datafile/XFF5.275.954036129 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-UNDOTBS2_FNO-5_19U6I5AJ.DB F 标记=TAG20190713T120927 RECID=10 STAMP=1013521015 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:22:03 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00013 名称=+DATA/orcl/datafile/XFF5.276.954036233 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-10_17U6I574.DBF 标记=TAG20190713T120927 RECID=11 STAMP=1013521231 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:27:30 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00002 名称=+DATA/orcl/datafile/sysaux.257.909498475 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-9_16U6I574.DBF 标记=TAG20190713T120927 RECID=12 STAMP=1013521244 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:27:37 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00023 名称=+DATA/orcl/datafile/XFF60.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-12_1AU6I6GU.DBF 标记=TAG20190713T120927 RECID=13 STAMP=1013522267 通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:22:25 通道 ORA_DISK_2: 启动数据文件副本 输入数据文件: 文件号=00024 名称=+DATA/orcl/datafile/XFF61.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-13_1BU6I6JU.DBF 标记=TAG20190713T120927 RECID=15 STAMP=1013522367 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:22:26 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00011 名称=+DATA/orcl/datafile/system.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSAUX_FNO-2_1CU6I6QM.DBF 标记=TAG20190713T120927 RECID=14 STAMP=1013522361 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:18:50 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00001 名称=+DATA/orcl/datafile/system.256.909498475 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-23_1DU6I6QT.DBF 标记=TAG20190713T120927 RECID=16 STAMP=1013522400 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:19:19 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00025 名称=+DATA/orcl/datafile/XFF62.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-25_1HU6I7V4.DBF 标记=TAG20190713T120927 RECID=17 STAMP=1013522490 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:01:35 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00026 名称=+DATA/orcl/datafile/XFF63.dmp 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSTEM_FNO-1_1GU6I7U0.DBF 标记=TAG20190713T120927 RECID=18 STAMP=1013522526 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:02:47 通道 ORA_DISK_4: 启动数据文件副本 输入数据文件: 文件号=00007 名称=+DATA/orcl/datafile/precise.dbf 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSTEM_FNO-11_1FU6I7U0.DBF 标记=TAG20190713T120927 RECID=20 STAMP=1013522576 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:03:33 通道 ORA_DISK_1: 启动数据文件副本 输入数据文件: 文件号=00004 名称=+DATA/orcl/datafile/users.259.909498477 RMAN-03009: backup 命令 (ORA_DISK_3 通道上, 在 07/13/2019 14:03:01 上) 失败 ORA-19566: 超出损坏块限制 0 (文件 +DATA/orcl/datafile/XFF63.dmp) 继续执行其他作业步骤, 将不重新运行失败的作业 通道 ORA_DISK_3: 启动数据文件副本 输入数据文件: 文件号=00014 名称=+DATA/orcl/datafile/test.280.972807149 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-PRECISE_FNO-7_1JU6I837.DBF 标记=TAG20190713T120927 RECID=19 STAMP=1013522572 通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:00:46 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-USERS_FNO-4_1KU6I84L.DBF 标记=TAG20190713T120927 RECID=21 STAMP=1013522591 通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:00:15 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-TEST_FNO-14_1LU6I84L.DBF 标记=TAG20190713T120927 RECID=22 STAMP=1013522591 通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:00:15 输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-24_1EU6I7R0.DBF 标记=TAG20190713T120927 RECID=23 STAMP=1013522974 通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:11:44 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: backup 命令 (ORA_DISK_3 通道上, 在 07/13/2019 14:03:01 上) 失败 ORA-19566: 超出损坏块限制 0 (文件 +DATA/orcl/datafile/XFF63.dmp)
运气不错,除+DATA/orcl/datafile/XFF63.dmp文件上面有坏块之外其他文件没有发现坏块,对该文件进行特殊方式拷贝处理
dbv检查该文件
C:\Users\Administrator>dbv file=i:/rmanback/26.dbf DBVERIFY: Release 11.2.0.1.0 - Production on 星期六 7月 13 14:09:32 2019 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. DBVERIFY - 开始验证: FILE = I:\RMANBACK\26.DBF 页 132735 流入 - 很可能是介质损坏 Corrupt block relative dba: 0x0682067f (file 26, block 132735) Fractured block found during dbv: Data in bad block: type: 40 format: 2 rdba: 0x0682067f last change scn: 0x0000.333eaa21 seq: 0x2 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xaa550000 check value in block header: 0x4242 computed block checksum: 0xb24f DBVERIFY - 验证完成 检查的页总数: 153600 处理的页总数 (数据): 9283 失败的页总数 (数据): 0 处理的页总数 (索引): 4789 失败的页总数 (索引): 0 处理的页总数 (其他): 139463 处理的总页数 (段) : 0 失败的总页数 (段) : 0 空的页总数: 64 标记为损坏的总页数: 1 流入的页总数: 1 加密的总页数 : 0 最高块 SCN : 1029666541 (0.1029666541)
这次运气非常好,该文件也只有一个坏块,整体来说,把正在运行的asm disk磁盘格式化为ntfs格式化,整个数据库文件只发现一个坏块.拷贝数据文件,redo,ctl等之后
通过重建控制文件尝试open数据库
SQL> @ctl.sql 控制文件已创建。 SQL> RECOVER DATABASE; 完成介质恢复。 SQL> ALTER DATABASE OPEN; 数据库已更改。
通过dba_extents 定位坏块对象,然后根据实际情况处理(index直接rebuild,表跳过,lob置空等方法),确定数据没有问题,重建磁盘组,数据回迁,恢复完美完成
联系:手机/微信(+86 17813235971) QQ(107644445)
标题:ORA-00600: internal error code, arguments: [16703], [1403], [32]
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有网友咨询数据库启动报ORA-00600: internal error code, arguments: [16703], [1403], [32]错误,这个错误和以往遇到的不太一样,以前恢复的一些案例
tab$恢复错误汇总
10g数据库遭遇ORA-600 16703
12C数据库遭遇ORA-600 16703
tab$被恶意删除sys用户之外记录
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
SQL> startup ORACLE 例程已经启动。 Total System Global Area 1.3892E+10 bytes Fixed Size 5420776 bytes Variable Size 2281703704 bytes Database Buffers 1.1576E+10 bytes Redo Buffers 28131328 bytes 数据库装载完毕。 ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [16703], [1403], [32], [], [], [],[], [], [], [], [], [] 进程 ID: 9512 会话 ID: 272 序列号: 22801
查看alert日志
Sun Jun 30 14:47:55 2019 SMON: enabling cache recovery Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc (incident=177881) (PDBNAME=CDB$ROOT): ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], [] Incident details in: D:\APP\SHORCL\diag\rdbms\orcl\orcl\incident\incdir_177881\orcl_ora_7824_i177881.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sun Jun 30 14:47:57 2019 Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc: ORA-00704: 引导程序进程失败 ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], [] Sun Jun 30 14:47:57 2019 Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc: ORA-00704: 引导程序进程失败 ORA-00704: 引导程序进程失败 ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], [] Sun Jun 30 14:47:57 2019 Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc: ORA-00704: 引导程序进程失败 ORA-00704: 引导程序进程失败 ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], [] Sun Jun 30 14:47:57 2019 Error 704 happened during db open, shutting down database USER (ospid: 7824): terminating the instance due to error 704 Sun Jun 30 14:48:02 2019 Instance terminated by USER, pid = 7824 ORA-1092 signalled during: ALTER DATABASE OPEN...
根据以往经验,这个很可能也是tab$数据被删除导致。经过分析,该库的区别是由于该库是12C的pdb
通过分析确认,确实是tab$数据被删除,通过bbed反向删除处理,实现时间完美恢复,open之后删除恶意脚本,数据库直接使用,实现完美恢复
SQL> startup mount ORACLE 例程已经启动。 Total System Global Area 1.3892E+10 bytes Fixed Size 5420776 bytes Variable Size 2281703704 bytes Database Buffers 1.1576E+10 bytes Redo Buffers 28131328 bytes 数据库装载完毕。 SQL> alter database open; 数据库已更改。 SQL> select 'drop '||object_type||' '||owner||'.'||object_name||';' from dba_obj ects where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP'); 'DROP'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||';' -------------------------------------------------------------------------------- drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR; drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP; SQL> SQL> drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR; 触发器已删除。 SQL> drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP; 过程已删除。 SQL> shutdown immediate; 数据库已经关闭。 已经卸载数据库。 ORACLE 例程已经关闭。 SQL> startup ORACLE 例程已经启动。 Total System Global Area 1.3892E+10 bytes Fixed Size 5420776 bytes Variable Size 2281703704 bytes Database Buffers 1.1576E+10 bytes Redo Buffers 28131328 bytes 数据库装载完毕。 数据库已经打开。 SQL> show pdbs; CON_ID CON_NAME OPEN MODE RESTRICTED ---------- ------------------------------ ---------- ---------- 2 PDB$SEED READ ONLY NO 3 PDBORCL MOUNTED SQL> alter session set container=pdborcl; 会话已更改。 SQL> alter database open; 数据库已更改。
联系:手机/微信(+86 17813235971) QQ(107644445)
标题:ERROR: diskgroup XXXX was not mounted
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
aix平台10.2.0.5 2节点RAC,由于节点2系统盘故障,通过节点1镜像系统,复制到节点2,结果由于节点2磁盘顺序和节点1不匹配,aix工程师进行了相关操作之后,节点1重启之后datadg磁盘组无法mount
SQL> alter diskgroup datadg mount Mon Jun 10 23:23:46 CST 2019 NOTE: cache registered group DATADG number=1 incarn=0x8cf61164 Mon Jun 10 23:23:46 CST 2019 NOTE: Hbeat: instance first (grp 1) Mon Jun 10 23:23:50 CST 2019 NOTE: start heartbeating (grp 1) Mon Jun 10 23:23:50 CST 2019 NOTE: cache dismounting group 1/0x8CF61164 (DATADG) NOTE: dbwr not being msg'd to dismount ERROR: diskgroup DATADG was not mounted
检查datadg磁盘组相关信息
Tue Jan 29 19:21:45 CST 2019 NOTE: start heartbeating (grp 2) NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/rhdisk6 Tue Jan 29 19:21:45 CST 2019 NOTE: F1X0 found on disk 0 fcn 0.0 NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/rhdisk7 NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/rhdisk8 NOTE: cache opening disk 3 of grp 2: DATADG_0003 path:/dev/rhdisk9 NOTE: cache mounting (first) group 2/0x60E59155 (DATADG) * allocate domain 2, invalid = TRUE Tue Jan 29 19:21:45 CST 2019 NOTE: attached to recovery domain 2 Tue Jan 29 19:21:45 CST 2019 NOTE: cache recovered group 2 to fcn 0.849668 Tue Jan 29 19:21:45 CST 2019 NOTE: LGWR attempting to mount thread 1 for disk group 2 NOTE: LGWR mounted thread 1 for disk group 2 NOTE: opening chunk 1 at fcn 0.849668 ABA NOTE: seq=21 blk=5394 Tue Jan 29 19:21:46 CST 2019 NOTE: cache mounting group 2/0x60E59155 (DATADG) succeeded SUCCESS: diskgroup DATADG was mounted
通过这里可以看出来datadg磁盘组是由rhdisk6-9 四块磁盘组成,查询相关磁盘信息发现
这里确定rhdisk7磁盘异常,通过kfed分析磁盘情况
D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 34 ; 0x001: 0x22 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 49407 ; 0x004: blk=49407 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 58396 ; 0x010: 0x0000e41c kfbh.fcn.wrap: 131072 ; 0x014: 0x00020000 kfbh.spare1: 4294967064 ; 0x018: 0xffffff18 kfbh.spare2: 2105310074 ; 0x01c: 0x7d7c7b7a 005918A00 00002200 0000C0FF 00000000 00000000 [."..............] 005918A10 0000E41C 00020000 FFFFFF18 7D7C7B7A [............z{|}] 005918A20 00000000 00000000 00000000 00000000 [................] Repeat 253 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0] D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=1 kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 0 ; 0x001: 0x00 kfbh.type: 0 ; 0x002: KFBTYP_INVALID kfbh.datfmt: 0 ; 0x003: 0x00 kfbh.block.blk: 0 ; 0x004: blk=0 kfbh.block.obj: 0 ; 0x008: file=0 kfbh.check: 0 ; 0x00c: 0x00000000 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 006EF8A00 00000000 00000000 00000000 00000000 [................] Repeat 255 times KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0] D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=2|more kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 33554432 ; 0x004: blk=33554432 kfbh.block.obj: 16777344 ; 0x008: file=128 kfbh.check: 3844041089 ; 0x00c: 0xe51f6981 kfbh.fcn.base: 1297484544 ; 0x010: 0x4d560b00 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdatb10.aunum: 0 ; 0x000: 0x00000000 kfdatb10.shrink: 49153 ; 0x004: 0xc001 kfdatb10.ub2pad: 20555 ; 0x006: 0x504b kfdatb10.auinfo[0].link.next: 2048 ; 0x008: 0x0800 kfdatb10.auinfo[0].link.prev: 2048 ; 0x00a: 0x0800 kfdatb10.auinfo[0].free: 0 ; 0x00c: 0x0000 kfdatb10.auinfo[0].total: 49153 ; 0x00e: 0xc001 kfdatb10.auinfo[1].link.next: 4096 ; 0x010: 0x1000 kfdatb10.auinfo[1].link.prev: 4096 ; 0x012: 0x1000 kfdatb10.auinfo[1].free: 0 ; 0x014: 0x0000 kfdatb10.auinfo[1].total: 0 ; 0x016: 0x0000 kfdatb10.auinfo[2].link.next: 6144 ; 0x018: 0x1800 kfdatb10.auinfo[2].link.prev: 6144 ; 0x01a: 0x1800 kfdatb10.auinfo[2].free: 0 ; 0x01c: 0x0000 kfdatb10.auinfo[2].total: 0 ; 0x01e: 0x0000 kfdatb10.auinfo[3].link.next: 8192 ; 0x020: 0x2000 kfdatb10.auinfo[3].link.prev: 8192 ; 0x022: 0x2000 kfdatb10.auinfo[3].free: 0 ; 0x024: 0x0000
对比磁盘可能的损坏情况,由于在aix 平台asm disk的block有一个特征一般0082开头,通过工具打开磁盘,检索该标记对比
正常磁盘
通过上述分析,大概评估rhdisk7 元数据部分损坏的不光是block 0和1,人工修复继续使用的可能性不太大,而且基于客户的数据库不大,采取方案是直接拷贝数据文件、redo、控制文件到文件系统,然后在本地文件系统open库
运气不错,实现完美恢复数据0丢失
联系:手机/微信(+86 17813235971) QQ(107644445)
标题:WARNING: Read Failed.导致asm磁盘组异常
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有客户对asm dg进行扩容,一段时间之后,asm data 磁盘组直接dismount
Wed May 29 18:37:25 2019 SUCCESS: ALTER DISKGROUP DATA ADD DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M , '/dev/oracleasm/disks/DATA_0027' SIZE 511993M , '/dev/oracleasm/disks/DATA_0026' SIZE 511993M , '/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */ NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1 Wed May 29 18:37:26 2019 Starting background process ARB0 Wed May 29 18:37:26 2019 ARB0 started with pid=34, OS id=96638 NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O NOTE: Attempting voting file refresh on diskgroup DATA NOTE: Refresh completed on diskgroup DATA. No voting file found. cellip.ora not found. Wed May 29 19:21:43 2019 WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096 WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 (DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596 ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027) ORA-15080: synchronous I/O operation to a disk failed ORA-27072: File I/O error Linux-x86_64 Error: 5: Input/output error Additional information: 4 Additional information: 704 Additional information: -1 NOTE: cache initiating offline of disk 27 group DATA NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1 NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear Wed May 29 19:21:43 2019 GMON updating disk modes for group 1 at 10 for pid 35, osid 31879 ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy. ERROR: too many offline disks in PST (grp 1) Wed May 29 19:21:43 2019 NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000) Wed May 29 19:21:43 2019 NOTE: halting all I/Os to diskgroup 1 (DATA) WARNING: Offline for disk DATA_0027 in mode 0x7f failed. Wed May 29 19:21:43 2019 NOTE: LGWR doing non-clean dismount of group 1 (DATA) NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207 Wed May 29 19:21:43 2019 ERROR: ORA-15130 thrown in ARB0 for group number 1 Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc: ORA-15130: diskgroup "" is being dismounted ORA-15130: diskgroup "DATA" is being dismounted Wed May 29 19:21:43 2019 NOTE: stopping process ARB0
后续继续mount data 磁盘组成功,但是立马又dismount
Wed May 29 18:37:25 2019 SUCCESS: ALTER DISKGROUP DATA ADD DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M , '/dev/oracleasm/disks/DATA_0027' SIZE 511993M , '/dev/oracleasm/disks/DATA_0026' SIZE 511993M , '/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */ NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1 Wed May 29 18:37:26 2019 Starting background process ARB0 Wed May 29 18:37:26 2019 ARB0 started with pid=34, OS id=96638 NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O NOTE: Attempting voting file refresh on diskgroup DATA NOTE: Refresh completed on diskgroup DATA. No voting file found. cellip.ora not found. Wed May 29 19:21:43 2019 WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096 WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 (DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596 ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027) ORA-15080: synchronous I/O operation to a disk failed ORA-27072: File I/O error Linux-x86_64 Error: 5: Input/output error Additional information: 4 Additional information: 704 Additional information: -1 NOTE: cache initiating offline of disk 27 group DATA NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1 NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear Wed May 29 19:21:43 2019 GMON updating disk modes for group 1 at 10 for pid 35, osid 31879 ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy. ERROR: too many offline disks in PST (grp 1) Wed May 29 19:21:43 2019 NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000) Wed May 29 19:21:43 2019 NOTE: halting all I/Os to diskgroup 1 (DATA) WARNING: Offline for disk DATA_0027 in mode 0x7f failed. Wed May 29 19:21:43 2019 NOTE: LGWR doing non-clean dismount of group 1 (DATA) NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207 Wed May 29 19:21:43 2019 ERROR: ORA-15130 thrown in ARB0 for group number 1 Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc: ORA-15130: diskgroup "" is being dismounted ORA-15130: diskgroup "DATA" is being dismounted Wed May 29 19:21:43 2019 NOTE: stopping process ARB0
对于上述的故障现象,本质原因是由于asm 磁盘组增加新磁盘之后,开始做rebalance,但是由于遭遇到 27号盘上有IO读错误,使得asm磁盘组无法正常完成rebalance,因而data磁盘组无法稳定的mount。解决该问题思路,通过patch asm磁盘组,禁止rebalance,从而使得data磁盘组不再dismount,再进行后续恢复
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
数据库启动报ORA-600 kokasgi1错误
SMON: enabling tx recovery Database Characterset is WE8ISO8859P1 Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_10056.trc (incident=269259): ORA-00600: internal error code, arguments: [kokasgi1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_269259/xifenfei1_ora_10056_i269259.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_10056.trc: ORA-00600: internal error code, arguments: [kokasgi1], [], [], [], [], [], [], [], [], [], [], [] Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_10056.trc: ORA-00600: internal error code, arguments: [kokasgi1], [], [], [], [], [], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 10056): terminating the instance due to error 600 Instance terminated by USER, pid = 10056 ORA-1092 signalled during: alter database open resetlogs... opiodr aborting process unknown ospid (10056) as a result of ORA-1092 Sat May 25 09:40:21 2019 ORA-1092 : opitsk aborting process
该错误在mos上没有查询出来明确的解决方案,但是在google中有人删除user$模拟出该故障
数据库启动10046跟踪
PARSING IN CURSOR #140185422046848 len=189 dep=1 uid=0 oct=3 lid=0 tim=1558756188092143 hv=186852205 ad='390983730' sqlid='2tkw12w5k68vd' select user#,password,datats#,tempts#,type#,defrole,resource$, ptime,decode(defschclass,NULL, 'DEFAULT_CONSUMER_GROUP',defschclass),spare1,spare4,ext_username,spare2 from user$ where name=:1 END OF STMT PARSE #140185422046848:c=0,e=784,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1558756188092141 BINDS #140185422046848: Bind#0 oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00 oacflg=18 fl2=0001 frm=01 csi=31 siz=32 off=0 kxsbbbfp=7f7f7648a230 bln=32 avl=03 flg=05 value="SYS" EXEC #140185422046848:c=1000,e=1432,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1457651150,tim=1558756188093835 WAIT #140185422046848: nam='db file sequential read' ela= 5226 file#=1 block#=417 blocks=1 obj#=46 tim=1558756188099198 FETCH #140185422046848:c=1000,e=5465,p=1,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=1457651150,tim=1558756188099349 STAT #140185422046848 id=1 cnt=0 pid=0 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID USER$ (cr=1 pr=1 pw=0 time=5463 us)' STAT #140185422046848 id=2 cnt=0 pid=1 pos=1 obj=46 op='INDEX UNIQUE SCAN I_USER1 (cr=1 pr=1 pw=0 time=5461 us)' CLOSE #140185422046848:c=0,e=10,dep=1,type=0,tim=1558756188099578 ORA-00600: internal error code, arguments: [kokasgi1], [], [], [], [], [], [], [], [], [], [], []
这里比较明显数据库在查询user$中的SYS用户的时候,无法查询数据从而出现ORA-00600: internal error code, arguments: [kokasgi1]错误.通过进一步对USER$表进行分析发现,sys和system被人重命名
SQL> select name from user$ WHERE NAME LIKE 'SYS%'; NAME ------------------------------ SYSDW SYSMAN SYSTEMDW
定位到具体问题,解决比较简单,在oracle的open过程中,通过对user$表进行修复,实现数据库完美恢复.
联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有客户由于存储控制器损坏,修复控制器之后,asm无法正常mount,报ORA-600 kfrValAcd30错误,让我们提供技术支持
asm alert日志报错
Wed Apr 03 16:50:57 2019 SQL> alter diskgroup data mount NOTE: cache registered group DATA number=1 incarn=0x14248741 NOTE: cache began mount (first) of group DATA number=1 incarn=0x14248741 NOTE: Assigning number (1,0) to disk (ORCL:DATAVOL1) Wed Apr 03 16:51:03 2019 NOTE: start heartbeating (grp 1) kfdp_query(DATA): 15 kfdp_queryBg(): 15 NOTE: cache opening disk 0 of grp 1: DATAVOL1 label:DATAVOL1 NOTE: F1X0 found on disk 0 au 2 fcn 0.0 NOTE: cache mounting (first) external redundancy group 1/0x14248741 (DATA) Wed Apr 03 16:51:04 2019 * allocate domain 1, invalid = TRUE Wed Apr 03 16:51:04 2019 NOTE: attached to recovery domain 1 NOTE: starting recovery of thread=1 ckpt=27.2697 group=1 (DATA) Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_15951.trc (incident=23394): ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], [] Incident details in: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_23394/+ASM2_ora_15951_i23394.trc Abort recovery for domain 1 NOTE: crash recovery signalled OER-600 ERROR: ORA-600 signalled during mount of diskgroup DATA ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], [] ERROR: alter diskgroup data mount NOTE: cache dismounting (clean) group 1/0x14248741 (DATA) NOTE: lgwr not being msg'd to dismount freeing rdom 1 Wed Apr 03 16:51:05 2019 Sweep [inc][23394]: completed Sweep [inc2][23394]: completed Wed Apr 03 16:51:05 2019 Trace dumping is performing id=[cdmp_20190403165105] NOTE: detached from domain 1 NOTE: cache dismounted group 1/0x14248741 (DATA) NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x14248741 kfdp_dismount(): 16 kfdp_dismountBg(): 16 NOTE: De-assigning number (1,0) from disk (ORCL:DATAVOL1) ERROR: diskgroup DATA was not mounted NOTE: cache deleting context for group DATA 1/337938241
mos相关记录
参考:ORA-600 [KFRVALACD30] in ASM (Doc ID 2123013.1)
ORA-00600: internal error code, arguments: [kfrValAcd30]可能是bug或者硬件故障导致.基于客户的情况,最大可能就是由于硬件故障导致asm 磁盘组的acd无法正常进行,从而无法mount成功.如果运气好,通过kfed相关修复可以正常mount成功,运气不好可以通过底层进行恢复数据文件,从而最大限度恢复数据.