kfed修复ORA-15196

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:kfed修复ORA-15196

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友的asm磁盘组因为以前遗留问题(在另外一套机器上的asm disk被加入到了一个新的asm磁盘组中,导致老的dg直接dismount,新加入asm disk的磁盘组一直在使用,未听建议进行重建),昨天突然意外dismount了

Mon Dec 18 08:38:13 2023
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
Mon Dec 18 08:38:35 2023
NOTE: client his2:his registered, osid 3998514, mbr 0x1
Thu Jan 04 21:44:55 2024
WARNING: cache read  a corrupt block: group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496145 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc
WARNING: cache read (retry) a corrupt block: 
 group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496145 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ERROR: cache failed to read group=2(DATA) fn=1 blk=6743 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user4915366_+asm4 (4915366) initiating offline of disk 8.1428496145 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0x55251f11, mask = 0x6a, op = clear
Thu Jan 04 21:44:55 2024
GMON updating disk modes for group 2 at 9 for pid 24, osid 4915366
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Jan 04 21:44:55 2024
NOTE: cache dismounting (not clean) group 2/0x7F35EE0E (DATA)
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
Thu Jan 04 21:44:55 2024
NOTE: halting all I/Os to diskgroup 2 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 3473846, image: oracle@zzzx1 (B000)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc  (incident=4023553):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM4/incident/incdir_4023553/+ASM4_ora_4915366_i4023553.trc
Thu Jan 04 21:44:57 2024
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7f35ee0e (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_rbal_2228716.trc:
ORA-15130: diskgroup "DATA" is being dismounted

尝试重新mount 磁盘组,片刻之后自动dismount

Thu Jan 04 23:10:35 2024
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: alter diskgroup data mount
Thu Jan 04 23:10:42 2024
NOTE: diskgroup resource ora.DATA.dg is online
Thu Jan 04 23:10:47 2024
NOTE: client his2:his registered, osid 3998052, mbr 0x1
Thu Jan 04 23:11:00 2024
WARNING: cache read  a corrupt block: group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496181 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc
WARNING: cache read (retry) a corrupt block: 
  group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496181 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ERROR: cache failed to read group=2(DATA) fn=1 blk=6743 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user4129826_+asm4 (4129826) initiating offline of disk 8.1428496181 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0x55251f35, mask = 0x6a, op = clear
Thu Jan 04 23:11:01 2024
GMON updating disk modes for group 2 at 21 for pid 35, osid 4129826
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Jan 04 23:11:01 2024
NOTE: cache dismounting (not clean) group 2/0x1CB5EE3B (DATA)
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 5112822, image: oracle@zzzx1 (B000)

从报错信息看是DATA_0008磁盘的au 3 blkn 87的block异常,应该是block 6743被写成了6999导致了该问题

kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                    6999 ; 0x004: blk=6999
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  3317183844 ; 0x00c: 0xc5b83564
kfbh.fcn.base:                165670551 ; 0x010: 0x09dfee97
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:          1145623147 ; 0x000: A=1 NUMM=0x22246935
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                83482624 ; 0x010: 0x04f9d800

这个处理比较简单吧

kfbh.block.blk:                    6999 ; 0x004: blk=6999
修改为
kfbh.block.blk:                    6743; 0x004: blk=6743

然后kefd merge并且尝试mount磁盘组
20240105202425


通过检查确认磁盘组不再dismount,但是由于后续元数据还有问题,导致asm无法创建新的文件,后续建议:在数据库在mount状态下,rman备份,重建该磁盘组