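Reflections on a recovery case: a Lianjia employee, unhappy with a job reassignment, deleted 9 TB of company data and was sentenced to 7 years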

Contact: phone/WeChat (+86 17813235971), QQ (107644445)

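Author: 惜分飞 (xifenfei) © All rights reserved. [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal liability.]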

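Over my career I have recovered databases from all kinds of deletion scenarios, both malicious and accidental: rm of a whole database, rm of data files, drop database, dd over ASM disk headers, drop table, truncate table, delete from a table, and so on.
Two years ago I recovered an EBS database for Lianjia. Someone had maliciously used rm on the database servers to delete /u01, /u02, the rman backup directory, and other files (programs on the EBS servers were deleted as well; another company handled those). The Oracle cluster failed and no usable backup remained. By recovering directly from the ASM disk group, we achieved zero data loss. The court's final judgment was issued recently:
The Haidian District People's Court of Beijing held that the defendant, Han Bing, in violation of state regulations, deleted data and applications stored in a computer information system, causing the system to be unable to operate normally, with especially serious consequences; this conduct constituted the crime of sabotaging a computer information system and should be punished according to law. The Haidian court's verdict: the defendant Han Bing is guilty of the crime of sabotaging a computer information system and is sentenced to seven years of fixed-term imprisonment.
It is genuinely sad: a veteran DBA of 40, acting on a moment of spite, did something wrong and now has to pay for it with seven years of life. It is simply not worth it.

For details of the judgment, see: A Lianjia employee, unhappy with a job reassignment, deleted 9 TB of company data: sentenced to 7 years
As DBAs we hold the lifeline of a company's data, which for many companies is the most valuable asset. Never sabotage a database out of spite over something at work or in life. Doing so makes systems unusable or even loses data, costs the company dearly, and costs you your freedom.
For the databases we look after: don't act on impulse!!! Don't act on impulse!!! Don't act on impulse!!!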

Using Oracle Recovery Tools to resolve ORA-01190, ORA-01248, and similar failures

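Today a customer requested a database recovery. Analysis with the Oracle Database Recovery Check script showed that the resetlogs information was inconsistent.

As a result, opening the database failed with ORA-01190 and ORA-01110: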

alter database open
Errors in file c:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_4404.trc:
ORA-01190: 控制文件或数据文件 1 来自最后一个 RESETLOGS 之前
ORA-01110: 数据文件 1: 'C:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1190 signalled during: alter database open...

The resetlogs information was repaired with Oracle Recovery Tools.

The next attempt to open the database failed with ORA-01248:

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01248: ?? 44 ????????????
ORA-01110: ???? 44: 'E:\ORADATA\ORCL\XIFENFEI.DBF'

Wed Jan 06 14:44:44 2021
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database open resetlogs...

The SCN was then repaired with Oracle Recovery Tools as well, and the database opened successfully:

T:\xff>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期三 1月 6 14:47:36 2021

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount 
ORACLE 例程已经启动。

Total System Global Area 6.9214E+10 bytes
Fixed Size                  2182712 bytes
Variable Size            3.5165E+10 bytes
Database Buffers         3.3823E+10 bytes
Redo Buffers              224296960 bytes
数据库装载完毕。
SQL>
SQL>
SQL> alter database open;

数据库已更改。

Oracle Recovery Tools: December update

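Based on earlier recovery cases, Oracle Recovery Tools has received some bug fixes and updates:
1. Fixed an issue on Windows 2012 where the file list could not be selected.
2. Added support for a data block size of 2048 (plus automatic detection of the block size).
3. Added recovery driven directly by a configuration file, which allows customized recovery and makes the process more flexible.

Download: OraRecovery.zip
For usage instructions, see: Oracle Recovery Tools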

-bash: /bin/rm: Argument list too long

When bulk-deleting a very large number of files on Linux, rm -rf * can fail with -bash: /bin/rm: Argument list too long. find + xargs gets the job done:

[grid@xifenfei audit]$ rm -rf +ASM2_ora_1*_2017*.aud
-bash: /bin/rm: Argument list too long
[grid@xifenfei audit]$ ls|wc -l
111650450
[grid@xifenfei audit]$ find ./ -name "*.aud" |xargs rm -r
[grid@xifenfei audit]$ ls
[grid@xifenfei audit]$
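
The error appears because the shell expands the glob into more arguments than a single exec call accepts, whereas find generates the names itself and xargs batches them. The following is a minimal sketch of a slightly more defensive variant, assuming GNU find and xargs. The function name purge_matching is hypothetical. Unlike the session above, it only touches files that match the given pattern, directly inside the given directory.

```shell
#!/usr/bin/env bash
# purge_matching DIR PATTERN   (hypothetical helper name)
# Delete regular files directly under DIR whose names match PATTERN.
# find generates the names itself, so the shell never builds a huge argv.
purge_matching() {
    local dir=$1 pattern=$2
    # -maxdepth 1 : stay inside DIR, do not descend into subdirectories
    # -type f     : regular files only
    # -print0 / xargs -0 : NUL-separated names survive spaces and newlines
    # xargs splits the list into batches that fit the exec limit
    find "$dir" -maxdepth 1 -type f -name "$pattern" -print0 | xargs -0 rm -f --
}

# Example (the quotes keep the shell from expanding the pattern):
# purge_matching /path/to/audit '+ASM2_ora_1*_2017*.aud'
```

Quoting the pattern matters: left unquoted, the shell would try to expand it first and could hit the same limit again.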

ORA-27303: failure occurred at: skgpwinit6

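A customer contacted us: while running, the database suddenly reported ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-27303: failure occurred at: skgpwinit6.

According to Startup Instance Failed with ORA-27140 ORA-27300 ORA-27301 ORA-27302 and ORA-27303 on skgpwinit6 (Doc ID 1274030.1), the likely cause was incorrect permissions on $ORACLE_HOME/bin/oracle. Further analysis confirmed that the permissions had been changed by a chmod run on the system, which broke the database. After the permissions of the oracle binary were restored with chmod 6751 oracle, the database started normally.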
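
A quick way to spot this kind of drift is to compare a file's octal mode against the expected value before attempting a restart. The sketch below assumes GNU stat (the -c option). The function name check_mode is hypothetical, and the path in the example depends on the installation.

```shell
#!/usr/bin/env bash
# check_mode FILE EXPECTED   (hypothetical helper name)
# Print "ok" if FILE's octal permission bits equal EXPECTED,
# otherwise print the actual mode and return 1.
check_mode() {
    local file=$1 expected=$2 actual
    # GNU stat: %a prints the mode in octal, e.g. 6751 for -rwsr-s--x
    actual=$(stat -c '%a' "$file") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "mismatch: $actual (expected $expected)"
        return 1
    fi
}

# Example:
# check_mode "$ORACLE_HOME/bin/oracle" 6751
```

The function only reports; the chmod itself stays a deliberate manual step.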

ORA-600 kffmLoad_1 kffmVerify_4

A friend's ASM instance would report errors after running for a while, which in turn brought down the database instances:

Wed Dec 23 08:31:55 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_6729.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [4365], [1], [], [], [], [], []
Wed Dec 23 08:31:55 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_6729.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [4365], [1], [], [], [], [], []

Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_29743.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [670], [1], [], [], [], [], []
Wed Dec 23 09:10:22 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_29743.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [670], [1], [], [], [], [], []
Wed Dec 23 09:10:22 2020

Wed Dec 23 10:18:33 2020
Errors in file /u01/app/oracle/admin/+ASM/udump/+asm1_ora_25890.trc:
ORA-00600: internal error code, arguments: [kffmVerify_4], [0], [0], [887], [1005986561], [1352], [1], [0]

The corresponding trace file:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db
System name:	Linux
Node name:	shb01
Release:	2.6.18-348.el5
Version:	#1 SMP Wed Nov 28 21:22:00 EST 2012
Machine:	x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 29
Unix process pid: 26337, image: oracle@xff01 (TNS V1-V3)

*** ACTION NAME:() 2020-12-22 19:03:41.272
*** MODULE NAME:(sp_ocap@xff01 (TNS V1-V3)) 2020-12-22 19:03:41.272
*** SERVICE NAME:() 2020-12-22 19:03:41.272
*** SESSION ID:(143.1) 2020-12-22 19:03:41.272
*** 2020-12-22 19:03:41.272
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kffmVerify_4], [0], [0], [1657], [1005987045], [152], [1], [0]
Current SQL statement for this session:
DECLARE 
fileType varchar2(16); 
fileName varchar2(1024); 
blkSz number; 
fileSz number; 
hdl number; 
plksz number;
BEGIN
fileName := '+DATA4/xifenfei/onlinelog/group_6.1657.1005987045'; 
BEGIN
dbms_diskgroup.getfileattr(fileName,fileType,fileSz, blkSz); 
dbms_diskgroup.open(fileName,'r',fileType,blkSz,hdl,plkSz,fileSz); 
EXCEPTION
WHEN OTHERS then
  :rc := SQLCODE;
  :err_msg := SQLERRM;
  return;
END;
:handle := hdl; 
:bsz := blkSz; 
:bcnt := fileSz; 
:rc := 0;
END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x15ce59360        96  package body SYS.X$DBMS_DISKGROUP
0x15cd88568        12  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0068996E0 ? 009AA2670 ?
                                                   7FFFBDFC34B0 ? 7FFFBDFC33F0 ?
                                                   000000000 ? 000000000 ?
kffmVerify()+379     call     kgeasnmierr()        0068996E0 ? 009AA2670 ?
                                                   7FFFBDFC34B0 ? 7FFFBDFC33F0 ?
                                                   000000000 ? 000000000 ?
kfioIdentify()+1276  call     kffmVerify()         000000000 ? 00000000D ?
                                                   000000001 ?
                                                   927B814400000004 ?
                                                   3BF624E500000679 ?
                                                   000000000 ?
ksfd_osmopn()+1138   call     kfioIdentify()       7FFFBDFC4820 ? 15DB873F4 ?
                                                   15DB87556 ? 000000200 ?
                                                   7FFF00000003 ? 15DB873C8 ?
ksfdopn()+1014       call     ksfd_osmopn()        7FFFBDFC4820 ? 00000002D ?
                                                   000000200 ? 000000003 ?
                                                   2B3800020000 ? 15F3031F0 ?
kfpkgDGOpenFile()+2  call     ksfdopn()            7FFFBDFC4820 ? 00000002D ?
301                                                000000200 ? 000000003 ?
                                                   000020000 ? 15F3031F0 ?
pevm_icd_call_commo  call     kfpkgDGOpenFile()    2B383F459FA8 ? 00000002D ?
n()+1003                                           2B383F439070 ? 000000003 ?
                                                   000020000 ? 15F3031F0 ?
pfrinstr_ICAL()+228  call     pevm_icd_call_commo  7FFFBDFC5700 ? 000000000 ?
                              n()                  000000001 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
pfrrun_no_tool()+65  call     pfrinstr_ICAL()      2B383F459FA8 ? 005DBD8AA ?
                                                   2B383F45A010 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
pfrrun()+906         call     pfrrun_no_tool()     2B383F459FA8 ? 005DBD8AA ?
                                                   2B383F45A010 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
plsql_run()+841      call     pfrrun()             2B383F459FA8 ? 000000000 ?
                                                   2B383F45A010 ? 7FFFBDFC5700 ?
                                                   000000007 ? 15CD77BD6 ?
peicnt()+298         call     plsql_run()          2B383F459FA8 ? 000000001 ?
                                                   000000000 ? 7FFFBDFC5700 ?
                                                   000000007 ? 900000000 ?
kkxexe()+503         call     peicnt()             7FFFBDFC5700 ? 2B383F459FA8 ?
                                                   2B383F438830 ? 7FFFBDFC5700 ?
                                                   2B383F4367D8 ? 900000000 ?
opiexe()+4691        call     kkxexe()             2B383F4561D8 ? 2B383F459FA8 ?
                                                   2B383F438830 ? 15C160BD8 ?
                                                   0040D677F ? 900000000 ?
kpoal8()+2273        call     opiexe()             000000049 ? 000000003 ?
                                                   7FFFBDFC6950 ? 000000001 ?
                                                   0040D677F ? 900000000 ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FFFBDFC9830 ? 000000001 ?
                                                   000000001 ? 900000000 ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FFFBDFC9830 ? 000000000 ?
                                                   0059C35D0 ? 900000000 ?
opitsk()+1322        call     ttcpip()             0068A13B0 ? 7FFFBDFC75A0 ?
                                                   7FFFBDFC9830 ? 000000000 ?
                                                   7FFFBDFC9328 ? 7FFFBDFC9998 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFFBDFC9830 ? 000000001 ?
                                                   000000000 ? 4E6111C00000001 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000001 ?
                                                   000000000 ? 4E6111C00000001 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000000 ?
                                                   0059C3080 ? 4E6111C00000001 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000000 ?
                                                   0059C3080 ? 4E6111C00000001 ?
opimai_real()+163    call     sou2o()              7FFFBDFCA9D0 ? 00000003C ?
                                                   000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
main()+116           call     opimai_real()        000000002 ? 7FFFBDFCAA60 ?
                                                   000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
__libc_start_main()  call     main()               000000002 ? 7FFFBDFCAA60 ?
+244                                               000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
_start()+41          call     __libc_start_main()  0007230B8 ? 000000002 ?
                                                   7FFFBDFCABB8 ? 000000000 ?
                                                   0059C3080 ? 000000002 ?
 
--------------------- Binary Stack Dump ---------------------

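According to the MOS note ORA-600[KFFMVERIFY_4] OR ORA-600 [kffmLoad_1], [131635] REPORTED ON THE ASMINSTANCE (Doc ID 794103.1), the error can be triggered when multiple processes/threads use dbms_diskgroup to access different disk groups:
BUG:6377738 – ASMB ORA-00600 [KFFMVERIFY_4]
BUG:8328467 – ASM CRASHED WITH ORA-600[KFFMVERIFY_4] OR [KFFMVERIFY_4] AND [KFFMLOAD_1]
This crashes the ASM instance and takes the database down with it. In this customer's case, they were running several SharePlex processes to replicate data, with redo spread across multiple disk groups, which is how the problem arose. The temporary workaround is to put all redo and archive logs in a single disk group, so that multiple SharePlex processes calling dbms_diskgroup to read redo/archive logs no longer trigger the bug.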
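
To check whether redo and archive logs really span more than one disk group, the ASM file names can be reduced to their disk group prefix. This is a minimal sketch. It assumes the file names have already been exported to text, one per line (for example, spooled from v$logfile). The function name distinct_diskgroups is hypothetical.

```shell
#!/usr/bin/env bash
# distinct_diskgroups   (hypothetical helper name)
# Read ASM file names (one per line) on stdin and print each distinct
# disk group name once, sorted.
distinct_diskgroups() {
    # An ASM path looks like +DISKGROUP/db/type/file; keep the text
    # between the leading "+" and the first "/". Other lines are ignored.
    sed -n 's|^+\([^/]*\)/.*|\1|p' | sort -u
}

# Example:
# echo '+DATA4/xifenfei/onlinelog/group_6.1657.1005987045' | distinct_diskgroups
```

More than one line of output means the files sit in several disk groups, which is the layout the workaround above avoids.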

ORA-00600 kfrHtAdd01

After a storage power loss, ASM reported ORA-15096: lost disk write detected, and the disk group could not be mounted.

Sun Dec 20 16:56:51 2020
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x0c1a7a4e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x0c1a7a4e
NOTE: Assigning number (1,2) to disk (/dev/mapper/multipath12)
NOTE: Assigning number (1,5) to disk (/dev/mapper/multipath15)
NOTE: Assigning number (1,3) to disk (/dev/mapper/multipath13)
NOTE: Assigning number (1,7) to disk (/dev/mapper/multipath17)
NOTE: Assigning number (1,1) to disk (/dev/mapper/multipath11)
NOTE: Assigning number (1,6) to disk (/dev/mapper/multipath16)
NOTE: Assigning number (1,0) to disk (/dev/mapper/multipath10)
NOTE: Assigning number (1,4) to disk (/dev/mapper/multipath14)
Sun Dec 20 16:56:57 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 19 for pid 32, osid 130347
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x0C1A7A4E (DATA)
Sun Dec 20 16:56:57 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 16:56:58 2020
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply)
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_130347.trc:
ORA-15096: lost disk write detected
Abort recovery for domain 1
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x0C1A7A4E (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 130347, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount

After a series of repairs, the error changed to the following:

Sun Dec 20 20:12:35 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 23 for pid 26, osid 67538
Sun Dec 20 20:12:35 2020
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x64848829 (DATA)
Sun Dec 20 20:12:36 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 20:12:36 2020
NOTE: attached to recovery domain 1
NOTE: Fallback recovery: thread 2 read 10751 blocks oldest redo found in ABA 540.6429
NOTE: Fallback recovery: thread 1 read 10751 blocks oldest redo found in ABA 232.4218
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc  (incident=1692689):
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
Incident details in: /grid/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1692689/+ASM1_ora_67538_i1692689.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Dec 20 20:12:39 2020
Sweep [inc][1692689]: completed
Sweep [inc2][1692689]: completed
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc:
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x64848829 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 67538, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x64848829 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x64848829
NOTE: cache deleting context for group DATA 1/0x64848829
GMON dismounting group 1 at 24 for pid 26, osid 67538
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0],
 [38660545], [0], [38687990], [1], [2], [6429], [], []
ERROR: alter diskgroup data mount

Analysis of the trace file:

*** 2020-12-20 20:11:54.956
kfdp_query(DATA): 19 
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfdPstSyncPriv()+585<-kfgFinalizeMount()+1630<-kfgscFinalize()+1433<
-kfgForEachKfgsc()+285<-kfgsoFinalize()+135<-kfgFinalize()+398<-kfxdrvMount()+5558<-kfxdrvEntry()
+2207<-opiexe()+20624<-opiosq0()+3932<-kpooprx()+274<-kpoal8()+842<-opiodr()+917<-ttcpip()
+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()
+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----
2020-12-20 20:11:55.393106 : Start recovery for domain=1, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply):
last written kfcn: 0.38747593 aba=233.4208 thd=1
kfcn_kfrbcd=0.38747593 flags_kfrbcd=0x001c aba=542.6410 thd=2
CE: (0x0x66edc798) group=1 (DATA) fn=4 blk=1
    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
    mirror=0
    flags_kfcpba=0x38 copies=3 blockIndex=1 AUindex=0 AUcount=0 loctr fcn=0.0
    copy #0:  disk=6  au=35 flags=01
    copy #1:  disk=0  au=34 flags=01
    copy #2:  disk=4  au=52 flags=01
BH: (0x0x66e10d00) bnum=33 type=COD_RBO state=rcv chgSt=not modifying pageIn=rcvRead
    flags=0x00000000 pinmode=excl lockmode=null bf=0x66020000
    kfbh_kfcbh.fcn_kfbh = 0.38747538 lowAba=0.0 highAba=0.0
    modTime=0
    last kfcbInitSlot return code=null chgCount=0 cpkt lnk is null ralFlags=0x00000000
    PINS:
    (kfcbps) pin=91 get by kfr.c line 7879 mode=excl
             fn=4 blk=1 status=pinned
             flags=0x88000000 flags2=0x00000000
             class=0 type=INVALID stateWanted=rcvRead
             bastCount=1 waitStatus=0x00000000 relocCount=0
             scanBastCount=0 scanBxid=0 scanSkipCode=0
             last released by kfc.c 21183
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
last new 0.0
kfrPass2: dump of current log buffer for error 15096 follows
=======================
OSM metadata block dump:
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            8 ; 0x002: KFBTYP_CHNGDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                   17162 ; 0x004: blk=17162
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  4226524538 ; 0x00c: 0xfbeba57a
kfbh.fcn.base:                 38747431 ; 0x010: 0x024f3d27
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdb.aba.seq:                    542 ; 0x000: 0x0000021e
kfracdb.aba.blk:                   6409 ; 0x004: 0x00001909
kfracdb.ents:                         1 ; 0x008: 0x0001
kfracdb.ub2spare:                     0 ; 0x00a: 0x0000
kfracdb.lge[0].valid:                 1 ; 0x00c: V=1 B=0 M=0
kfracdb.lge[0].chgCount:              1 ; 0x00d: 0x01
kfracdb.lge[0].len:                  68 ; 0x00e: 0x0044
kfracdb.lge[0].kfcn.base:      38747432 ; 0x010: 0x024f3d28
kfracdb.lge[0].kfcn.wrap:             0 ; 0x014: 0x00000000
kfracdb.lge[0].bcd[0].kfbl.blk:    1292 ; 0x018: blk=1292
kfracdb.lge[0].bcd[0].kfbl.obj:       1 ; 0x01c: file=1
kfracdb.lge[0].bcd[0].kfcn.base:38743102 ; 0x020: 0x024f2c3e
kfracdb.lge[0].bcd[0].kfcn.wrap:      0 ; 0x024: 0x00000000
kfracdb.lge[0].bcd[0].oplen:          8 ; 0x028: 0x0008
kfracdb.lge[0].bcd[0].blkIndex:      12 ; 0x02a: 0x000c
kfracdb.lge[0].bcd[0].flags:         28 ; 0x02c: F=0 N=0 F=1 L=1 V=1 A=0 C=0
kfracdb.lge[0].bcd[0].opcode:       135 ; 0x02e: 0x0087
kfracdb.lge[0].bcd[0].kfbtyp:         4 ; 0x030: KFBTYP_FILEDIR
kfracdb.lge[0].bcd[0].redund:        19 ; 0x031: SCHE=0x1 NUMB=0x3
kfracdb.lge[0].bcd[0].pad:        63903 ; 0x032: 0xf99f
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.hi:33108586 ; 0x034: HOUR=0xa DAYS=0x13 MNTH=0xc YEAR=0x7e4
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.lo:0 ; 0x038: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfracdb.lge[0].bcd[0].au[0]:     292415 ; 0x03c: 0x0004763f
kfracdb.lge[0].bcd[0].au[1]:     292452 ; 0x040: 0x00047664
kfracdb.lge[0].bcd[0].au[2]:     292474 ; 0x044: 0x0004767a
kfracdb.lge[0].bcd[0].disks[0]:       2 ; 0x048: 0x0002
kfracdb.lge[0].bcd[0].disks[1]:       1 ; 0x04a: 0x0001
kfracdb.lge[0].bcd[0].disks[2]:       0 ; 0x04c: 0x0000

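The approach was to bypass ASM instance recovery entirely, mount the disk group, and then try to start the database and recover at the database level. If you run into this kind of unmountable ASM disk group and cannot resolve it yourself, please contact us.
Phone/WeChat: 17813235971    QQ: 107644445    E-Mail: dba@xifenfei.com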

Two small Oracle installation issues (INS-30060 and broken pop-up child windows)

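I had not done installation work for a long time. Helping a friend with an install today, I hit two small problems and am noting them here.
1. Installing 11.2.0.4 on Linux 7.2: pop-up child windows in the OUI interface did not display properly.

The cause is a Java compatibility problem. Instead of using the Java bundled with the database software, manually point the installer at the system's own Java.

2. INS-30060 error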

SEVERE: [FATAL] [INS-30060] Check for group existence failed.
CAUSE: Unexpected error occurred while trying to check for group existence.
ACTION: Refer to the logs or contact Oracle Support Services. 
Note for advanced users: Launch the installer by passing the following flag ''-ignoreInternalDriverError''..

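The cause was that I had initially created an unsuitable oracle user, then deleted and recreated it, so the uid no longer matched. The permissions on CVU_11.2.0.3.0_oracle (and its contents) were therefore wrong, which triggered the error. Deleting the related directory under /tmp fixed it.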
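
Leftover CVU directories owned by a previous uid can be listed before anything is deleted. The sketch below assumes a find that supports -maxdepth and -uid (GNU and BSD both do). The function name stale_cvu_dirs is hypothetical. It only prints candidates and removes nothing.

```shell
#!/usr/bin/env bash
# stale_cvu_dirs DIR UID   (hypothetical helper name)
# List CVU_* entries directly under DIR that are NOT owned by the
# numeric user id UID. Nothing is removed.
stale_cvu_dirs() {
    local dir=$1 uid=$2
    find "$dir" -maxdepth 1 -name 'CVU_*' ! -uid "$uid" -print
}

# Example: entries under /tmp not owned by the current oracle user
# stale_cvu_dirs /tmp "$(id -u oracle)"
```

Reviewing the list first avoids removing a directory that a running installer session still needs.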

library cache lock caused by a dblink session

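A customer reported that the system had been stalling at night for the past few days.
The alert log showed errors.

The corresponding trace file was checked.

It confirmed that ORA-04021 was raised while gathering statistics.
The ASH report for that period was reviewed.

It showed heavy library cache lock waits caused by similar statements, concentrated mainly on a single table, which matched the table named in the trace for the failed statistics job.
That same day, a maintenance operation adding a partition to this table had also hung.

Querying the wait event and the blocker: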

SQL> select distinct sid,a.BLOCKING_SESSION_STATUS,a.BLOCKING_INSTANCE,a.BLOCKING_SESSION
  2  ,event from gv$session a where sid=7399;

       SID BLOCKING_SE BLOCKING_INSTANCE BLOCKING_SESSION
---------- ----------- ----------------- ----------------
EVENT
----------------------------------------------------------------
      7399 VALID                       3             3593
library cache lock

SQL>  select a.INST_ID,a.sid,a.paddr,a.sql_id,a.event,a.MACHINE,a.PROGRAM 
   2 ,status from gv$session a where a.sid=3593;

   INST_ID        SID PADDR            SQL_ID
---------- ---------- ---------------- -------------
EVENT
----------------------------------------------------------------
MACHINE
----------------------------------------------------------------
PROGRAM                                          STATUS
------------------------------------------------ --------
         3       3593 0000001F91E80670 grxhz2vpmsrc6
SQL*Net message from dblink
WORKGROUP\XG
PlatformSyn.exe                                  ACTIVE

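The SQL text for that sql_id could not be found, but the wait event makes it fairly clear that a call over a dblink was responsible. Checking the session showed that it stayed ACTIVE the whole time with no change at all, so it was most likely hung. After it was killed, the add-partition operation and statistics gathering both completed normally.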
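
On RAC, killing a session on another instance needs the instance number as well as the sid and serial#. The queries above do not show serial#, so it has to be read from gv$session first. The sketch below only prints the statement for review and does not run it. The function name kill_session_sql is hypothetical, and the serial# in the example is a placeholder.

```shell
#!/usr/bin/env bash
# kill_session_sql SID SERIAL INST_ID   (hypothetical helper name)
# Print (do not run) the statement that would terminate one RAC session.
kill_session_sql() {
    local sid=$1 serial=$2 inst=$3 v
    # Accept plain numbers only, so the printed statement is safe to paste.
    for v in "$sid" "$serial" "$inst"; do
        case $v in
            ''|*[!0-9]*) echo "numeric arguments required" >&2; return 1 ;;
        esac
    done
    printf "alter system kill session '%s,%s,@%s' immediate;\n" "$sid" "$serial" "$inst"
}

# Example (12345 is a placeholder serial#, not a value from this case):
# kill_session_sql 3593 12345 3
```

Printing instead of executing leaves a chance to confirm that the blocker is still the same session before it is killed.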

Handling ASM disks with names like _DROPPED_0001_DATA

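In a customer's database, disks in the ASM disk groups were found to be offline (log analysis confirmed they had gone offline back in 2016 and that no rebalance was running any more).

Further checks: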

SQL> /

NAME			       PATH		  GROUP_NUMBER DISK_NUMBER MOUNT_STATUS   HEADER_STATUS
------------------------------ --------------------- ------------ ----------- -------------- ------------------------
MODE_STATUS    STATE		FAILGROUP
-------------- ---------------- --------------------
			       ORCL:DATA2	  0		 0 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:FLASH1	  0		 1 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:GRID3	  0		 2 CLOSED	  MEMBER
ONLINE	       NORMAL

_DROPPED_0000_FLASH				  2		 0 MISSING	  UNKNOWN
OFFLINE        FORCING		FLASH1

_DROPPED_0001_DATA				  1		 1 MISSING	  UNKNOWN
OFFLINE        FORCING		DATA2

DATA1			       ORCL:DATA1	  1		 0 CACHED	  MEMBER
ONLINE	       NORMAL		DATA1

FLASH2			       ORCL:FLASH2	  2		 1 CACHED	  MEMBER
ONLINE	       NORMAL		FLASH2

GRID1			       ORCL:GRID1	  3		 0 CACHED	  MEMBER
ONLINE	       NORMAL		GRID1

GRID2			       ORCL:GRID2	  3		 1 CACHED	  MEMBER
ONLINE	       NORMAL		GRID2

GRID4			       ORCL:GRID4	  3		 3 CACHED	  MEMBER
ONLINE	       NORMAL		GRID4

GRID5			       ORCL:GRID5	  3		 4 CACHED	  MEMBER
ONLINE	       NORMAL		GRID5

GRID6			       ORCL:GRID6	  3		 5 CACHED	  MEMBER
ONLINE	       NORMAL		GRID6


12 rows selected.


SQL> select NAME,STATE,TYPE,OFFLINE_DISKS from v$asm_diskgroup;

NAME
------------------------------------------------------------
STATE		       TYPE	    OFFLINE_DISKS
---------------------- ------------ -------------
DATA
MOUNTED 	       NORMAL			1

FLASH
MOUNTED 	       NORMAL			1

GRID
MOUNTED 	       NORMAL			0

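The root problem is that ORCL:FLASH1 and ORCL:DATA2 went offline and ended up in the _DROPPED_0000_FLASH and _DROPPED_0001_DATA state. Low-level checks confirmed that these disks are now healthy, so the offline disks were forcibly added back to their disk groups with the force option: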

SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force;

Diskgroup altered.

SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force;

Diskgroup altered.

Watch the ASM log and wait for the rebalance to finish:

Sat Dec 05 16:48:10 2020
SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,2) to disk (ORCL:FLASH1)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk FLASH1
NOTE: requesting all-instance disk validation for group=2
Sat Dec 05 16:48:13 2020
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
Sat Dec 05 16:48:19 2020
GMON updating for reconfiguration, group 2 at 14 for pid 34, osid 12203
NOTE: group 2 PST updated.
NOTE: initiating PST update: grp = 2
GMON updating group 2 at 15 for pid 34, osid 12203
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 16 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: cache opening disk 2 of grp 2: FLASH1 label:FLASH1
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
GMON querying group 2 at 17 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
Sat Dec 05 16:48:25 2020
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:48:25 2020
SUCCESS: alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force
NOTE: starting rebalance of group 2/0x58e713e7 (FLASH) at power 1
Starting background process ARB0
Sat Dec 05 16:48:26 2020
ARB0 started with pid=36, OS id=12451 
NOTE: assigning ARB0 to group 2/0x58e713e7 (FLASH) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 0:2 to 2:2 for diskgroup 2 (FLASH)
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
Sat Dec 05 16:48:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group FLASH
Sat Dec 05 16:49:06 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x58e713e7 (FLASH)
Sat Dec 05 16:49:08 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
Sat Dec 05 16:49:11 2020
GMON updating for reconfiguration, group 2 at 18 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk _DROPPED_0000_FLASH going offline 
GMON updating for reconfiguration, group 2 at 19 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 20 for pid 18, osid 41180
GMON querying group 2 at 21 for pid 18, osid 41180
NOTE: Disk _DROPPED_0000_FLASH in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:51:56 2020
SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (ORCL:DATA2)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DATA2
NOTE: requesting all-instance disk validation for group=1
Sat Dec 05 16:51:57 2020
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
Sat Dec 05 16:52:02 2020
GMON updating for reconfiguration, group 1 at 22 for pid 34, osid 12203
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 23 for pid 34, osid 12203
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 24 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: cache opening disk 2 of grp 1: DATA2 label:DATA2
Sat Dec 05 16:52:08 2020
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
GMON querying group 1 at 25 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
Sat Dec 05 16:52:08 2020
SUCCESS: alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 1
Starting background process ARB0
Sat Dec 05 16:52:08 2020
ARB0 started with pid=37, OS id=13463 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 16:52:44 2020
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 1:2 to 2:2 for diskgroup 1 (DATA)
Sat Dec 05 16:53:22 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 27 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
SUCCESS: alter diskgroup data rebalance power 11
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 11
Starting background process ARB0
Sat Dec 05 17:27:52 2020
ARB0 started with pid=35, OS id=23318 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 11 parallel I/Os
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 17:28:29 2020
cellip.ora not found.
Sat Dec 05 17:28:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
Sat Dec 05 18:48:10 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Sat Dec 05 18:48:32 2020
GMON updating for reconfiguration, group 1 at 28 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
Sat Dec 05 18:48:32 2020
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk _DROPPED_0001_DATA going offline 
GMON updating for reconfiguration, group 1 at 29 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: group 1 PST updated.
Sat Dec 05 18:48:32 2020
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 30 for pid 18, osid 41180
GMON querying group 1 at 31 for pid 18, osid 41180
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 18:52:24 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x58d713e6 (DATA)

Querying the disk status shows that the offline disks have been added back and the ASM disk groups have returned to normal.
[Screenshot: 20201205201841]

[Screenshot: 20201205201851]
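Summary: when a disk drops out of a normal-redundancy disk group for some reason, v$asm_disk.name shows a placeholder such as _DROPPED_0001_DATA and v$asm_disk.state is FORCING. The dropped disk can be forced back into the disk group with a statement like alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2' force; once the rebalance completes, the problem is fixed.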
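
The checks mentioned in the summary can be sketched with a few queries against the standard ASM views. This is only an illustrative sketch, not the exact statements used in this case: the column selection is my own choice, and it assumes you are connected to the ASM instance with sufficient privileges (for example as sysasm).

```sql
-- 1. Before the fix: find disks that were force-dropped from a disk group.
--    Such disks appear with a placeholder name like _DROPPED_0001_DATA.
select group_number, disk_number, name, failgroup,
       state, mode_status, header_status, path
  from v$asm_disk
 where name like '\_DROPPED\_%' escape '\'
    or state = 'FORCING';

-- 2. After "alter diskgroup ... add failgroup ... disk ... force":
--    watch the rebalance. No rows returned means no rebalance is running.
select group_number, operation, state, power,
       sofar, est_work, est_minutes
  from v$asm_operation;

-- 3. Once the rebalance has finished: confirm every disk group is mounted,
--    has no offline disks, and that the re-added disks are ONLINE/NORMAL.
select g.name as diskgroup, g.type, g.state as group_state, g.offline_disks,
       d.name as disk_name, d.failgroup, d.state as disk_state, d.mode_status
  from v$asm_diskgroup g
  join v$asm_disk d on d.group_number = g.group_number
 order by g.name, d.disk_number;
```

If the rebalance at the default power is too slow, it can be restarted at a higher power, as was done in the log above with alter diskgroup data rebalance power 11.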