kfed修复ORA-15196

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:kfed修复ORA-15196

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友的asm磁盘组因为以前遗留问题(在另外一套机器上的asm disk被加入到了一个新的asm磁盘组中,导致老的dg直接dismount,新加入asm disk的磁盘组一直在使用,未听建议进行重建),昨天突然意外dismount了

Mon Dec 18 08:38:13 2023
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
Mon Dec 18 08:38:35 2023
NOTE: client his2:his registered, osid 3998514, mbr 0x1
Thu Jan 04 21:44:55 2024
WARNING: cache read  a corrupt block: group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496145 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc
WARNING: cache read (retry) a corrupt block: 
 group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496145 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ERROR: cache failed to read group=2(DATA) fn=1 blk=6743 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user4915366_+asm4 (4915366) initiating offline of disk 8.1428496145 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0x55251f11, mask = 0x6a, op = clear
Thu Jan 04 21:44:55 2024
GMON updating disk modes for group 2 at 9 for pid 24, osid 4915366
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Jan 04 21:44:55 2024
NOTE: cache dismounting (not clean) group 2/0x7F35EE0E (DATA)
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
Thu Jan 04 21:44:55 2024
NOTE: halting all I/Os to diskgroup 2 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 3473846, image: oracle@zzzx1 (B000)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4915366.trc  (incident=4023553):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM4/incident/incdir_4023553/+ASM4_ora_4915366_i4023553.trc
Thu Jan 04 21:44:57 2024
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7f35ee0e (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_rbal_2228716.trc:
ORA-15130: diskgroup "DATA" is being dismounted

尝试重新mount 磁盘组,片刻之后自动dismount

Thu Jan 04 23:10:35 2024
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: alter diskgroup data mount
Thu Jan 04 23:10:42 2024
NOTE: diskgroup resource ora.DATA.dg is online
Thu Jan 04 23:10:47 2024
NOTE: client his2:his registered, osid 3998052, mbr 0x1
Thu Jan 04 23:11:00 2024
WARNING: cache read  a corrupt block: group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496181 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc
WARNING: cache read (retry) a corrupt block: 
  group=2(DATA) fn=1 blk=6743 disk=8 (DATA_0008) incarn=1428496181 au=3 blk=87 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM4/trace/+ASM4_ora_4129826.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ERROR: cache failed to read group=2(DATA) fn=1 blk=6743 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [1] [6743] [6999 != 6743]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user4129826_+asm4 (4129826) initiating offline of disk 8.1428496181 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0x55251f35, mask = 0x6a, op = clear
Thu Jan 04 23:11:01 2024
GMON updating disk modes for group 2 at 21 for pid 35, osid 4129826
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Jan 04 23:11:01 2024
NOTE: cache dismounting (not clean) group 2/0x1CB5EE3B (DATA)
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 5112822, image: oracle@zzzx1 (B000)

从报错信息看是DATA_0008磁盘的au 3 blkn 87的block异常,应该是block 6743被写成了6999导致了该问题

kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                    6999 ; 0x004: blk=6999
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  3317183844 ; 0x00c: 0xc5b83564
kfbh.fcn.base:                165670551 ; 0x010: 0x09dfee97
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:          1145623147 ; 0x000: A=1 NUMM=0x22246935
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                83482624 ; 0x010: 0x04f9d800

这个处理比较简单吧

kfbh.block.blk:                    6999 ; 0x004: blk=6999
修改为
kfbh.block.blk:                    6743; 0x004: blk=6743

然后kefd merge并且尝试mount磁盘组
20240105202425


通过检查确认磁盘组不再dismount,但是由于后续元数据还有问题,导致asm无法创建新的文件,后续建议:在数据库在mount状态下,rman备份,重建该磁盘组

ORA-15335 ORA-15130 ORA-15066 ORA-15196

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15335 ORA-15130 ORA-15066 ORA-15196

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈,数据库无法正常启动,通过分析asm的alert日志发现,data磁盘组mount成功之后,没有一会儿自动dismount掉

Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT  
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT 
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read  a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) 
incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 
disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of 
disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA) 
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc  (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted

这里看主要是由于asm 磁盘组需要做COD recovery导致无法正常稳定的mount,主要原因是遭遇到asm disk的逻辑坏块(存储物理上看是ok的,但是实际数据在asm中看是异常的)

数据库alert日志报错

Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc  (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732

对于这类故障处理相对比较容易,通过patch asm,让data磁盘组稳定mount,然后open库,迁移数据,实现数据0丢失,完美恢复

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户对asm进行扩容,由于配置不恰当,在使用asmca增加asm disk的时候直接选中了已经被用作文件系统的vg中的磁盘

Tue Nov 19 09:48:48 2019
Non critical error ORA-48180 cFri Nov 22 12:47:48 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (4,15) to disk (/dev/rhdisk29)
NOTE: Assigning number (4,16) to disk (/dev/rhdisk30)
NOTE: Assigning number (4,17) to disk (/dev/rhdisk31)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0015
NOTE: initializing header on grp 4 disk XIFENFEI_0016
NOTE: initializing header on grp 4 disk XIFENFEI_0017
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 12:47:51 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:47:59 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:47:59 2019
GMON updating group 4 at 12 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 13 for pid 18, osid 39912680
Fri Nov 22 12:48:01 2019
NOTE: cache opening disk 15 of grp 4: XIFENFEI_0015 path:/dev/rhdisk29
NOTE: cache opening disk 16 of grp 4: XIFENFEI_0016 path:/dev/rhdisk30
NOTE: cache opening disk 17 of grp 4: XIFENFEI_0017 path:/dev/rhdisk31
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 14 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */

发现增加错磁盘之后,从vg里面强制踢掉被asm使用的磁盘,并且尝试在asm中删除这些磁盘,并加入新磁盘

Fri Nov 22 12:52:03 2019
SQL> ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:52:03 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: requesting all-instance membership refresh for group=4
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
Fri Nov 22 12:52:12 2019
GMON querying group 4 at 15 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0
…………
Fri Nov 22 12:58:26 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:58:26 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,18) to disk (/dev/rhdisk7)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0018
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:58:41 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:58:41 2019
GMON updating group 4 at 16 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully
Fri Nov 22 12:58:41 2019
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 17 for pid 18, osid 39912680
NOTE: cache opening disk 18 of grp 4: XIFENFEI_0018 path:/dev/rhdisk7
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 18 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */
Starting background process ARB0
Fri Nov 22 12:58:46 2019
ARB0 started with pid=44, OS id=54460432
…………
Fri Nov 22 12:59:57 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:59:57 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,19) to disk (/dev/rhdisk10)
NOTE: Assigning number (4,20) to disk (/dev/rhdisk11)
NOTE: Assigning number (4,21) to disk (/dev/rhdisk8)
NOTE: Assigning number (4,22) to disk (/dev/rhdisk9)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0019
NOTE: initializing header on grp 4 disk XIFENFEI_0020
NOTE: initializing header on grp 4 disk XIFENFEI_0021
NOTE: initializing header on grp 4 disk XIFENFEI_0022
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 13:00:08 2019
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 13:00:08 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: initiating PST update: grp = 4
Fri Nov 22 13:00:13 2019
GMON updating group 4 at 19 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 20 for pid 18, osid 39912680
NOTE: cache opening disk 19 of grp 4: XIFENFEI_0019 path:/dev/rhdisk10
NOTE: cache opening disk 20 of grp 4: XIFENFEI_0020 path:/dev/rhdisk11
NOTE: cache opening disk 21 of grp 4: XIFENFEI_0021 path:/dev/rhdisk8
NOTE: cache opening disk 22 of grp 4: XIFENFEI_0022 path:/dev/rhdisk9
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 21 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0

asm在做着reblance的过程中遭遇到坏块,直接导致磁盘组dismount

Sun Nov 24 04:42:27 2019
NOTE: group 4 PST updated.
WARNING: cache read  a corrupt block: group=4(XIFENFEI) dsk=15 blk=258 disk=15
 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=254
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: a corrupted block from group XIFENFEI was dumped to
   /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc
WARNING: cache read (retry) a corrupt block: group=4(XIFENFEI)
 dsk=15 blk=258 disk=15 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ERROR: cache failed to read group=4(XIFENFEI) dsk=15 blk=258 from disk(s): 15(XIFENFEI_0015)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: cache initiating offline of disk 15 group XIFENFEI
NOTE: process _x000_+asm2 (28639240) initiating offline of disk
    15.1717056824 (XIFENFEI_0015) with mask 0x7e in group 4
NOTE: initiating PST update: grp = 4, dsk = 15/0x66583538, mask = 0x6a, op = clear
GMON updating disk modes for group 4 at 23 for pid 28, osid 28639240
ERROR: Disk 15 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 4)
Sun Nov 24 04:42:27 2019
NOTE: cache dismounting (not clean) group 4/0x0B08C40B (XIFENFEI)
WARNING: Offline for disk XIFENFEI_0015 in mode 0x7f failed.
Sun Nov 24 04:42:27 2019
NOTE: halting all I/Os to diskgroup 4 (XIFENFEI)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59441780, image: oracle@xifenfei2 (B000)
Sun Nov 24 04:42:27 2019
ERROR: ORA-15130 thrown in ARB0 for group number 4
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_50856926.trc:
ORA-15130: diskgroup "XIFENFEI" is being dismounted

至此两个节点的该磁盘组就陷入了不停的mount,然后dismount的轮流循环中.这里我们可以大概的分析出来,由于vg的磁盘组被写入了数据或者强制剔除的时候导致asm写入该文件的数据被破坏,导致后续的asm reblance遭遇坏块,然后直接dismount.对于该问题的解决方案,通过对对该磁盘组的acd和cod进行patch,让其不进行reblance,保持该磁盘组现在,稳定的mount状态,然后对其数据进行备份和重建该磁盘组.这个客户运气不错,vg中的asm disk磁盘写入较少,数据库运行正常.
对于这种情况,如果发生极端损坏,比如asm磁盘组无法mount,可以参考:找回ASM中数据文件
如果是asm的元数据大量损坏,无法通过asm字典级别恢复,可以通过参考:asm disk header 彻底损坏恢复

asm 加磁盘导致磁盘组损坏恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm 加磁盘导致磁盘组损坏恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户恢复case请求,希望我们接入恢复数据。大概过程是这样的,16年9月份由于硬件问题,导致normal磁盘组(只有2个磁盘)中的一个磁盘丢失,然后在17年3月6日,运维方尝试增加该磁盘进入磁盘组,结果通过force命令加入成功之后,磁盘组dismount,然后再也无法mount成功。
磁盘组创建信息

Fri Jun 24 19:31:38 2016
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATADG was mounted
SUCCESS: CREATE DISKGROUP DATADG NORMAL REDUNDANCY  DISK '/dev/asm-diskdata01' SIZE 1048576M ,
'/dev/asm-diskdata02' SIZE 1048576M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='4M' /* ASMCA */

这里可以看出来datadg是一个normal的au为4M的一个磁盘组

自动drop异常asm disk

Mon Sep 12 11:41:54 2016
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
Mon Sep 12 11:41:55 2016
NOTE: process _b000_+asm1 (19491) initiating offline of disk 1.3915923833 (DATADG_0001) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 9 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message 2870990318 to 1 disk(s) in group 1
WARNING: Disk DATADG_0001 in mode 0x7f is now being offlined
NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 10 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 11 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: cache closing disk 1 of grp 1: DATADG_0001
NOTE: PST update grp = 1 completed successfully
Mon Sep 12 11:42:55 2016
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
Mon Sep 12 11:44:58 2016
WARNING: PST-initiated drop of 1 disk(s) in group 1(.1137226115))
SQL> alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Mon Sep 12 11:44:59 2016
GMON updating for reconfiguration, group 1 at 12 for pid 29, osid 19491
NOTE: cache closing disk 1 of grp 1: (not open) DATADG_0001
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group 1 PST updated.
Mon Sep 12 11:44:59 2016
NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
Mon Sep 12 11:45:02 2016
NOTE: successfully read ACD block gn=1 blk=0 via retry read
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3526.trc:
ORA-15062: ASM disk is globally closed
GMON querying group 1 at 13 for pid 18, osid 3532
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
SUCCESS: alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */
NOTE: starting rebalance of group 1/0x43c8b183 (DATADG) at power 1
SUCCESS: PST-initiated drop disk in group 1(1137226115))
Starting background process ARB0
Mon Sep 12 11:45:03 2016
ARB0 started with pid=35, OS id=19945
NOTE: assigning ARB0 to group 1/0x43c8b183 (DATADG) with 1 parallel I/O
cellip.ora not found.
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATADG
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Sep 12 11:46:21 2016
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Mon Sep 12 11:46:24 2016
GMON updating for reconfiguration, group 1 at 14 for pid 36, osid 20110
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: group 1 PST updated.
WARNING: offline disk number 1 has references (54679 AUs)
Mon Sep 12 11:46:24 2016
NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
GMON querying group 1 at 15 for pid 18, osid 3532
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x43c8b183 (DATADG)

这里我们可以看出来磁盘组在2016年9月12日由于disk 1 无法响应,直接被asm 踢出了磁盘组

把被强制删除的磁盘重新加回去

Mon Mar 06 15:36:54 2017
SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000
NOTE: GroupBlock outside rolling migration privileged region
ORA-15032: not all alterations performed
ORA-15029: disk '/dev/asm-diskdata01' is already mounted by this instance
ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000
Mon Mar 06 15:38:27 2017
SQL>  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 15:38:28 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 15:38:31 2017
GMON querying group 1 at 7 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 8 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR:  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:04:14 2017
SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 16:04:15 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 16:04:18 2017
GMON querying group 1 at 9 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 10 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:23:28 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001
NOTE: GroupBlock outside rolling migration privileged region
ORA-15032: not all alterations performed
ORA-15031: disk specification '/dev/adm-diskdata02' matches no disks
ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001
Mon Mar 06 16:24:48 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 16:24:49 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 16:24:52 2017
GMON querying group 1 at 11 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 12 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:26:07 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DATA_0001
NOTE: requesting all-instance disk validation for group=1
Mon Mar 06 16:26:10 2017
NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
Mon Mar 06 16:26:15 2017
GMON updating for reconfiguration, group 1 at 13 for pid 28, osid 12861
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 14 for pid 28, osid 12861
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
GMON querying group 1 at 15 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
GMON querying group 1 at 16 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
Mon Mar 06 16:26:19 2017
SUCCESS: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force
NOTE: starting rebalance of group 1/0x31584f6b (DATADG) at power 1
Mon Mar 06 16:26:20 2017
Starting background process ARB0
Mon Mar 06 16:26:20 2017
ARB0 started with pid=32, OS id=25833
NOTE: assigning ARB0 to group 1/0x31584f6b (DATADG) with 1 parallel I/O
cellip.ora not found.
WARNING:cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0
        (DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
NOTE:a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc
WARNING:cache read (retry)a corrupt block:group=1(DATADG)
         dsk=0 blk=0 disk=0(DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
NOTE: cache initiating offline of disk 0 group DATADG
NOTE:process _arb0_+asm1 (25833) initiating offline of disk 0.3915956130(DATADG_0000)with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 17 for pid 32, osid 25833
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: too many offline disks in PST (grp 1)
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968bfa2, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 18 for pid 32, osid 25833
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
Mon Mar 06 16:26:23 2017
NOTE: cache dismounting (not clean) group 1/0x31584F6B (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 25889, image: oracle@DBN01 (B000)
Mon Mar 06 16:26:23 2017
NOTE: halting all I/Os to diskgroup 1 (DATADG)
Mon Mar 06 16:26:23 2017
NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
NOTE: LGWR sync ABA=19.2851 last written ABA 19.2851
WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
Mon Mar 06 16:26:23 2017
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Mon Mar 06 16:26:23 2017
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 8)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc  (incident=65537):
ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
Incident details in:/u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_65537/+ASM1_arb0_25833_i65537.trc
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 3189 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x31584f6b (DATADG)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_3468.trc:
ORA-15130: diskgroup "DATADG" is being dismounted
Mon Mar 06 16:26:23 2017
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x31584F6B (DATADG)
---后续mount报错
SQL> ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */
NOTE: cache registered group DATADG number=1 incarn=0xb368408f
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xb368408f
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
WARNING:GMON has insufficient disks to maintain consensus.
Minimum required is 2:updating 1 PST copies from a total of 2.
ERROR: GMON failed to obtain a quorum ofsupporting disks in group 1
NOTE: cache dismounting (clean) group 1/0xB368408F (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 27651, image: oracle@DBN01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xB368408F (DATADG)
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xb368408f
NOTE: cache deleting context for group DATADG 1/0xb368408f
GMON dismounting group 1 at 12 for pid 30, osid 27651
NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATADG" cannot be mounted
ORA-15315: Write errors in disk group DATADG could lead to inconsistent ASM metadata.
ERROR: ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */

从这里我们可以看出来,前几次加asm disk 由于各种原因都失败了,最后一次通过加force关键字,使得被自动drop的disk重新强制加到datadg里面.可悲的是在加入成功之后,开始做rebalance的时候,发现disk 0出现坏块,从而引起ORA-15196的错误,使得rebalance无法进行下去,进而整个asm 磁盘组datadg自动dismount.后面再次尝试mount datadg的时候,直接提示元数据库不一致,因为disk 0 的磁盘头已经异常.

通过kfed分析disk 0信息
这里是通过dd命令备份的磁盘头到win进行分析的,以前正常的disk 0的磁盘头损坏(全0)
asm-kfed


对于这个故障已经比较清楚,恢复思路也基本上确定:依次递进
方案1:通过kfed修改文件头,然后尝试mount磁盘头手工修复ASM DISK HEADER 异常
方案2:直接通过amdu,dul之类的工具拷贝出来数据文件找回ASM中数据文件
方案3:通过底层au重组出来数据文件asm disk header 彻底损坏恢复
在我们的实际恢复中运气比较好,通过方案1就完成了恢复工作,通过kfed修复磁盘头之后,然后报错如下

SQL> alter diskgroup DATADG mount
NOTE: cache registered group DATADG number=1 incarn=0x5134d0d4
NOTE: cache began mount (first) of group DATADG number=1 incarn=0x5134d0d4
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
Tue Mar 07 19:03:40 2017
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 27 for pid 28, osid 13837
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 28 for pid 28, osid 13837
NOTE: cache closing disk 1 of grp 1: (not open)
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: cache mounting (first) normal redundancy group 1/0x5134D0D4 (DATADG)
Tue Mar 07 19:03:40 2017
* allocate domain 1, invalid = TRUE
kjbdomatt send to inst 2
Tue Mar 07 19:03:40 2017
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=19.2851 group=1 (DATADG)
NOTE: starting recovery of thread=2 ckpt=13.5327 group=1 (DATADG)
NOTE: advancing ckpt for group 1 (DATADG) thread=2 ckpt=13.5327
NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=19.2852
NOTE: cache recovered group 1 to fcn 0.365868
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Tue Mar 07 19:03:40 2017
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 19.2851
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.365868 ABA 20.2852
NOTE: cache mounting group 1/0x5134D0D4 (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x5134d0d4
Tue Mar 07 19:03:40 2017
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATADG was mounted
SUCCESS: alter diskgroup DATADG mount
Tue Mar 07 19:03:40 2017
NOTE: diskgroup resource ora.DATADG.dg is online
Tue Mar 07 19:03:41 2017
ASM Health Checker found 1 new failures
NOTE: ASM did background COD recovery for group 1/0x5134d0d4 (DATADG)
NOTE: starting rebalance of group 1/0x5134d0d4 (DATADG) at power 1
Starting background process ARB0
Tue Mar 07 19:03:42 2017
ARB0 started with pid=30, OS id=13905
NOTE: assigning ARB0 to group 1/0x5134d0d4 (DATADG) with 1 parallel I/O
cellip.ora not found.
WARNING: cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0
         (DATADG_0000) incarn=2202280062 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
NOTE: a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc
WARNING:cache read (retry)a corrupt block:group=1(DATADG) dsk=0 blk=0 disk=0
        (DATADG_0000)incarn=2202280062 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
Tue Mar 07 19:03:52 2017
NOTE: client oradb1:oradb registered, osid 13989, mbr 0x1
NOTE: cache initiating offline of disk 0 group DATADG
NOTE:process _arb0_+asm1 (13905) initiating offline of disk 0.2202280062(DATADG_0000)with mask 0x7e in group 1
NOTE: checking PST: grp = 1
Tue Mar 07 19:03:52 2017
GMON checking disk modes for group 1 at 30 for pid 30, osid 13905
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: too many offline disks in PST (grp 1)
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0x8344207e, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 31 for pid 30, osid 13905
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
Tue Mar 07 19:03:52 2017
NOTE: cache dismounting (not clean) group 1/0x5134D0D4 (DATADG)
WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
Tue Mar 07 19:03:52 2017
NOTE: halting all I/Os to diskgroup 1 (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 14002, image: oracle@DBN01 (B000)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc  (incident=76402):
ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_76402/+ASM1_arb0_13905_i76402.trc
Tue Mar 07 19:03:52 2017
NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
NOTE: LGWR sync ABA=20.2857 last written ABA 20.2857

这里比较比较幸运,datadg已经mount成功了,但是由于rab依旧读取到disk header异常信息(没有完全修复成功,而且在日志中不光这个block异常,还有其他block异常,因此不考虑进一步修复),因此直接通过屏蔽asm的acd和cod实现该磁盘组mount,而且不会dismount。

SQL> alter diskgroup DATADG mount
NOTE: cache registered group DATADG number=1 incarn=0x9c94d0eb
NOTE: cache began mount (first) of group DATADG number=1 incarn=0x9c94d0eb
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
NOTE: skip COD recovery as part of test at kfrc.c:1639
NOTE: skip COD recovery as part of test at kfrc.c:1639
Tue Mar 07 19:12:45 2017
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 75 for pid 28, osid 15615
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 76 for pid 28, osid 15615
NOTE: cache closing disk 1 of grp 1: (not open)
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: cache mounting (first) normal redundancy group 1/0x9C94D0EB (DATADG)
Tue Mar 07 19:12:45 2017
* allocate domain 1, invalid = TRUE
kjbdomatt send to inst 2
Tue Mar 07 19:12:45 2017
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=25.2870 group=1 (DATADG)
NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=25.2873
NOTE: cache recovered group 1 to fcn 0.365897
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Tue Mar 07 19:12:45 2017
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 25.2872
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.365897 ABA 26.2873
NOTE: cache mounting group 1/0x9C94D0EB (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x9c94d0eb
Tue Mar 07 19:12:45 2017
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATADG was mounted
SUCCESS: alter diskgroup DATADG mount
Tue Mar 07 19:12:45 2017
NOTE: diskgroup resource ora.DATADG.dg is online
NOTE: skip COD recovery as part of test at kfrc.c:1639
NOTE: skip COD recovery as part of test at kfrc.c:1639
NOTE: skip COD recovery as part of test at kfrc.c:1639
NOTE: skip COD recovery as part of test at kfrc.c:1639

asm的问题解决后,然后登录数据库,发现运气比较好,两个数据库正常open成功,而且alert日志无任何报错,直接通过rman备份出来数据,重建asm磁盘组,还原数据,恢复完成,而且实现数据0丢失。
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com