Oracle ASM – Database SOS

客户在不清楚磁盘被asm disk使用的情况下,直接分区做pv,加入到vg中并且分配给了lv,导致数据库异常

通过操作系统层面分析,确认客户把data磁盘组的一个磁盘给处理掉了,导致数据库报错

WARNING: ASMB force dismounting group 2 (DATA) due to failover
SUCCESS: diskgroup DATA was dismounted
2025-05-04T07:03:19.910082+08:00
KCF: read, write or open error, block=0x201544 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.918972+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwc_18507.trc:
2025-05-04T07:03:19.952045+08:00
KCF: read, write or open error, block=0x2013e7 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.964538+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw7_18486.trc:
2025-05-04T07:03:19.967133+08:00
KCF: read, write or open error, block=0x230e71 online=1
        file=105 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_64.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.973289+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw2_18466.trc:
2025-05-04T07:03:19.978514+08:00
KCF: read, write or open error, block=0x1f6e91 online=1
        file=86 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_52.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.991060+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwd_18511.trc:
2025-05-04T07:03:19.995762+08:00
KCF: read, write or open error, block=0x7f8 online=1
        file=15 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/undotbs01.dbf'
        error=15078 txt: ''
2025-05-04T07:03:20.006862+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwa_18498.trc:
2025-05-04T07:03:20.020739+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_imr0_18937.trc:

这个客户比较幸运,处理该磁盘之后,没有往对应的lv中写入太多数据,导致覆盖部分很少

[root@rac01 rules.d]# df -h
文件系统               容量  已用  可用 已用% 挂载点
/dev/mapper/nlas-root  800G  272G  528G   34% /
devtmpfs               284G     0  284G    0% /dev
tmpfs                  284G  637M  283G    1% /dev/shm
tmpfs                  284G  4.0G  280G    2% /run
tmpfs                  284G     0  284G    0% /sys/fs/cgroup
/dev/mapper/nlas-home  200G   64M  200G    1% /home
/dev/sda1              197M  158M   40M   80% /boot
tmpfs                   57G   40K   57G    1% /run/user/0
tmpfs                   57G   48K   57G    1% /run/user/1000
[root@rac01 rules.d]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  nlas lvm2 a--  564.00g    0 
  /dev/sdb1  nlas lvm2 a--   <2.00t 1.51t
[root@rac01 rules.d]# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  nlas   2   3   0 wz--n- 2.55t 1.51t
[root@rac01 rules.d]# lvs
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home nlas -wi-ao---- 200.00g                                                    
  root nlas -wi-ao---- 800.00g                                                    
  swap nlas -wi-ao----  64.00g

通过底层对磁盘进行分析,发现备份的磁盘头均以损坏,通过深入分析确认f1b1在sdb磁盘的第10个au上,通过相关信息,使用dul工具加载磁盘组,并分析元数据信息,发现恢复数据需要的元数据都可以正常加载

直接使用dul抽取数据到文件系统,然后open数据库成功

然后通过rman 检测坏块(3T多的库只有不到5000个坏块,相对来说效果非常好),对于坏块对象进行处理,完美完成本次恢复工作.对于这次能够有这样好的恢复效果有几个因素:
1)asm disk 加入到vg,并分配给lv之后,立刻停止写入操作,避免了因为写入数据而覆盖asm 磁盘的带来的风险
2)由于是19c库,默认au为4M,使得数据库文件数据相对比较靠后,覆盖几率小了一点
3)由于文件系统是xfs,相对覆盖比ext4会少很多
4)是云环境的ssd磁盘,没有触发trim功能
以前类似asm disk异常恢复的相关case汇总:
asm磁盘加入vg恢复
asm磁盘dd破坏恢复
asm磁盘分区丢失恢复
pvid=yes导致asm无法mount
win asm disk header 异常恢复
又一例asm disk 加入vg故障
pvcreate asm disk导致asm磁盘组异常恢复
asm disk被加入到另外一个磁盘组故障恢复
再一例asm disk被误加入vg并且扩容lv恢复
再一起asm disk被格式化成ext3文件系统故障恢复
一次完美的asm disk被格式化ntfs恢复
asm disk误设置pvid导致asm diskgroup无法mount恢复
asm disk被分区,格式化为ext4恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

ORA-15411: Failure groups in disk group DATA have different number of disks.

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-15411: Failure groups in disk group DATA have different number of disks.

客户磁盘组以前规划是normal模式,但是由于某种原因,其中一个存储掉线了,出现一下状态

SQL> select group_number,name,path,failgroup,state from v$asm_disk;

GROUP_NUMBER NAME                           PATH                           FAILGROUP                      STATE
------------ ------------------------------ ------------------------------ ------------------------------ --------
           0                                /dev/asmocr1                                                  NORMAL
           0                                /dev/asmocr3                                                  NORMAL
           0                                /dev/asmhdisk15                                               NORMAL
           0                                /dev/asmocr2                                                  NORMAL
           1 DATA_0011                                                     FAL1                           NORMAL
           1 DATA_0010                                                     FAL1                           NORMAL
           1 DATA_0013                                                     FAL1                           NORMAL
           1 DATA_0012                                                     FAL1                           NORMAL
           1 DATA_0009                                                     FAL1                           NORMAL
           1 DATA_0008                                                     FAL1                           NORMAL
           1 DATA_0007                                                     FAL1                           NORMAL
           1 DATA_0006                                                     FAL1                           NORMAL
           1 DATA_0005                                                     FAL1                           NORMAL
           1 DATA_0004                                                     FAL1                           NORMAL
           1 DATA_0003                                                     FAL1                           NORMAL
           1 DATA_0002                                                     FAL1                           NORMAL
           1 DATA_0001                                                     FAL1                           NORMAL
           1 DATA_0000                                                     FAL1                           NORMAL
           1 DATA_0023                      /dev/asmhdisk5                 FAL2                           NORMAL
           1 DATA_0024                      /dev/asmhdisk6                 FAL2                           NORMAL
           1 DATA_0022                      /dev/asmhdisk4                 FAL2                           NORMAL
           1 DATA_0020                      /dev/asmhdisk2                 FAL2                           NORMAL
           1 DATA_0014                      /dev/asmhdisk1                 FAL2                           NORMAL
           1 DATA_0021                      /dev/asmhdisk3                 FAL2                           NORMAL
           1 DATA_0018                      /dev/asmhdisk13                FAL2                           NORMAL
           1 DATA_0019                      /dev/asmhdisk14                FAL2                           NORMAL
           1 DATA_0017                      /dev/asmhdisk12                FAL2                           NORMAL
           1 DATA_0016                      /dev/asmhdisk11                FAL2                           NORMAL
           1 DATA_0027                      /dev/asmhdisk9                 FAL2                           NORMAL
           1 DATA_0015                      /dev/asmhdisk10                FAL2                           NORMAL
           1 DATA_0025                      /dev/asmhdisk7                 FAL2                           NORMAL
           1 DATA_0026                      /dev/asmhdisk8                 FAL2                           NORMAL
           2 OCRVOTE2                       AFD:OCRVOTE2                   OCRVOTE2                       NORMAL
           2 OCRVOTE1                       AFD:OCRVOTE1                   OCRVOTE1                       NORMAL
           2 OCRVOTE3                       AFD:OCRVOTE3                   OCRVOTE3                       NORMAL

35 rows selected.

因为磁盘空闲空间较大

ASMCMD> lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512             512   4096  4194304  29360128  23110032          2097152        10506440             14             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  4194304     92160     91724                0           91724              0             Y  OCRVOTE/

想从data磁盘组中，删除部分盘,释放出来一些空间,结果报ORA-15411: Failure groups in disk group DATA have different number of disks.

SQL> alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10;
alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15411: Failure groups in disk group DATA have different number of disks.

设置,删除磁盘成功_asm_disable_failgroup_size_checking和_asm_disable_dangerous_failgroup_checking

SQL> alter system set "_asm_disable_failgroup_size_checking"=true scope=memory sid='*';

System altered.

SQL>alter system set "_asm_disable_dangerous_failgroup_checking"=true scope=memory sid='*';

System altered.

SQL> alter diskgroup data drop disk DATA_0027,DATA_0026,DATA_0025,DATA_0024 rebalance power 10;

Diskgroup altered.

dd破坏asm磁盘头恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：dd破坏asm磁盘头恢复

有朋友对asm disk的磁盘头dd了2048byte的数据

通过分析,gi软件版本,确认是11.2.0.4

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	rac1
Release:	4.1.12-37.4.1.el6uek.x86_64
Version:	#2 SMP Tue May 17 07:23:38 PDT 2016
Machine:	x86_64

从10.2.0.5之后版本,在第二个au的倒数第二个block上面,有asm disk header备份（每个block大小为4k）,分析au大小(通过分析正常的asm disk快速找到au 大小【使用dd备份的正常的磁盘头查看】)

H:\TEMP\tmp\asmbak>kfed read sdcp.dd |grep ausize
kfdhdb.ausize:                 16777216 ; 0x0bc: 0x01000000

找到被破坏的asm disk的备份磁盘头信息

H:\TEMP\tmp\asmbak>kfed read sdc.dd blkn=4094 aun=1 aus=16777216|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                    4094 ; 0x004: blk=4094
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                   229348702 ; 0x00c: 0x0dab955e
kfbh.fcn.base:                 11727032 ; 0x010: 0x00b2f0b8
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33123276 ; 0x0a8: HOUR=0xc DAYS=0x1e MNTH=0xa YEAR=0x7e5
kfdhdb.crestmp.lo:           2259134464 ; 0x0ac: USEC=0x0 MSEC=0x1ea SECS=0x2a MINS=0x21
kfdhdb.mntstmp.hi:             33162836 ; 0x0b0: HOUR=0x14 DAYS=0x12 MNTH=0x1 YEAR=0x7e8
kfdhdb.mntstmp.lo:           3600987136 ; 0x0b4: USEC=0x0 MSEC=0xad SECS=0x2a MINS=0x35
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                 16777216 ; 0x0bc: 0x01000000
kfdhdb.mfact:                    454272 ; 0x0c0: 0x0006ee80
kfdhdb.dsksize:                   65536 ; 0x0c4: 0x00010000
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
…………

确认被损坏的磁盘只有磁盘头信息损坏(即确认第二个block是否是好的)

H:\TEMP\tmp\asmbak>kfed read sdc.dd blkn=0
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0065D8400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]


H:\TEMP\tmp\asmbak>kfed read sdc.dd blkn=1|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2781697777 ; 0x00c: 0xa5cd56f1
kfbh.fcn.base:                 39359331 ; 0x010: 0x02589363
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                        1014 ; 0x004: 0x03f6
kfdfsb.cnt:                         147 ; 0x006: 0x0093
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000
kfdfse[0].fse:                        0 ; 0x018: FREE=0x0 FRAG=0x0
…………

基于上述分析,直接使用备份的asm disk header信息进行merge或者repair修复之后,asm 磁盘头状态恢复正常

这个客户运气比较好,库非常大,只是破坏了2k的数据,如果超过4k可能就是比较麻烦的事故了,再次提醒对asm磁盘的dd操作一定要小心谨慎.如果不慎破坏asm磁盘过多,参考以前类似文档:
asm磁盘dd破坏恢复

删除asmlib磁盘导致磁盘组故障恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：删除asmlib磁盘导致磁盘组故障恢复

有客户执行drop disk磁盘组操作之后,然后立刻从oracle asmlib层面执行了oracleasm deletedisk,并且在操作系统层面delete partition(删除磁盘分区),导致磁盘组直接dismount

Tue Nov 26 16:44:04 2024
SQL> alter diskgroup data drop disk DATA_0008 
NOTE: GroupBlock outside rolling migration privileged region
Tue Nov 26 08:44:05 2024
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 2/0x28dec0d5 (DATA)
NOTE: requesting all-instance membership refresh for group=2
NOTE: membership refresh pending for group 2/0x28dec0d5 (DATA)
Tue Nov 26 08:44:14 2024
GMON querying group 2 at 48 for pid 18, osid 27385
SUCCESS: refreshed membership for 2/0x28dec0d5 (DATA)
SUCCESS: alter diskgroup data drop disk DATA_0008
NOTE: starting rebalance of group 2/0x28dec0d5 (DATA) at power 2
Starting background process ARB0
Tue Nov 26 08:44:14 2024
ARB0 started with pid=38, OS id=56987 
NOTE: assigning ARB0 to group 2/0x28dec0d5 (DATA) with 2 parallel I/Os
Tue Nov 26 08:44:17 2024
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Tue Nov 26 08:44:57 2024
cellip.ora not found.
Tue Nov 26 17:08:46 2024
SQL> alter diskgroup data drop disk DATA_0008 
ORA-15032: not all alterations performed
ORA-15071: ASM disk "DATA_0008" is already being dropped
ERROR: alter diskgroup data drop disk DATA_0008
Tue Nov 26 17:10:30 2024
SQL> alter diskgroup data drop disk DATA_0008 
ORA-15032: not all alterations performed
ORA-15071: ASM disk "DATA_0008" is already being dropped
ERROR: alter diskgroup data drop disk DATA_0008
Tue Nov 26 09:34:38 2024
WARNING: cache read  a corrupt block:group=2(DATA) dsk=8 blk=98 disk=8 (DATA_0008) incarn=3911069755 au=0 blk=98 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_56987.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_56987.trc
WARNING:cache read (retry) a corrupt block:group=2(DATA) dsk=8 blk=98 disk=8(DATA_0008)incarn=3911069755 au=0 blk=98 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_56987.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=8 blk=98 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _arb0_+asm1(56987)initiating offline of disk 8.3911069755 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0xe91e303b, mask = 0x6a, op = clear
Tue Nov 26 09:34:38 2024
GMON updating disk modes for group 2 at 49 for pid 38, osid 56987
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Tue Nov 26 09:34:38 2024
NOTE: cache dismounting (not clean) group 2/0x28DEC0D5 (DATA) 
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 89645, image: oracle@ahptdb5 (B000)
Tue Nov 26 09:34:38 2024
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_56987.trc  (incident=413105):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
Tue Nov 26 09:34:39 2024
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x28dec0d5 (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_27385.trc:
ORA-15130: diskgroup "DATA" is being dismounted
ERROR: ORA-15335 thrown in ARB0 for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_56987.trc:
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [98] [0 != 1]
NOTE: stopping process ARB0
Tue Nov 26 09:34:40 2024
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=716.2684 last written ABA 716.2684

通过重新分区,并且kfed repair修复磁盘头操作之后,重新mount磁盘组报错

SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=2 incarn=0x73bec220
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73bec220
NOTE: Assigning number (2,16) to disk (/dev/oracleasm/disks/DATA208)
NOTE: Assigning number (2,15) to disk (/dev/oracleasm/disks/DATA207)
NOTE: Assigning number (2,14) to disk (/dev/oracleasm/disks/DATA206)
NOTE: Assigning number (2,13) to disk (/dev/oracleasm/disks/DATA205)
NOTE: Assigning number (2,12) to disk (/dev/oracleasm/disks/DATA204)
NOTE: Assigning number (2,11) to disk (/dev/oracleasm/disks/DATA203)
NOTE: Assigning number (2,10) to disk (/dev/oracleasm/disks/DATA202)
NOTE: Assigning number (2,9) to disk (/dev/oracleasm/disks/DATA201)
NOTE: Assigning number (2,6) to disk (/dev/oracleasm/disks/DATA07)
NOTE: Assigning number (2,5) to disk (/dev/oracleasm/disks/DATA06)
NOTE: Assigning number (2,4) to disk (/dev/oracleasm/disks/DATA05)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
NOTE: Assigning number (2,3) to disk (/dev/oracleasm/disks/DATA04)
NOTE: Assigning number (2,2) to disk (/dev/oracleasm/disks/DATA03)
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,8) to disk (/dev/oracleasm/disks/DATA101)
Tue Nov 26 11:48:22 2024
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 83 for pid 27, osid 15781
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.127835487
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/oracleasm/disks/DATA03
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/oracleasm/disks/DATA04
NOTE: cache opening disk 4 of grp 2: DATA_0004 path:/dev/oracleasm/disks/DATA05
NOTE: cache opening disk 5 of grp 2: DATA_0005 path:/dev/oracleasm/disks/DATA06
NOTE: cache opening disk 6 of grp 2: DATA_0006 path:/dev/oracleasm/disks/DATA07
NOTE: cache opening disk 8 of grp 2: DATA_0008 path:/dev/oracleasm/disks/DATA101
NOTE: cache opening disk 9 of grp 2: DATA_0009 path:/dev/oracleasm/disks/DATA201
NOTE: cache opening disk 10 of grp 2: DATA_0010 path:/dev/oracleasm/disks/DATA202
NOTE: cache opening disk 11 of grp 2: DATA_0011 path:/dev/oracleasm/disks/DATA203
NOTE: cache opening disk 12 of grp 2: DATA_0012 path:/dev/oracleasm/disks/DATA204
NOTE: cache opening disk 13 of grp 2: DATA_0013 path:/dev/oracleasm/disks/DATA205
NOTE: cache opening disk 14 of grp 2: DATA_0014 path:/dev/oracleasm/disks/DATA206
NOTE: cache opening disk 15 of grp 2: DATA_0015 path:/dev/oracleasm/disks/DATA207
NOTE: cache opening disk 16 of grp 2: DATA_0016 path:/dev/oracleasm/disks/DATA208
NOTE: cache mounting (first) external redundancy group 2/0x73BEC220 (DATA)
Tue Nov 26 11:48:22 2024
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Tue Nov 26 11:48:22 2024
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=716.1536 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=763.6248 group=2 (DATA)
NOTE: recovery initiating offline of disk 8 group 2 (*)
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user15781_+asm1 (15781) initiating offline of disk 8.3911069996 (DATA_0008) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 8/0xe91e312c, mask = 0x6a, op = clear
GMON updating disk modes for group 2 at 84 for pid 27, osid 15781
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
Tue Nov 26 11:48:23 2024
NOTE: halting all I/Os to diskgroup 2 (DATA)
NOTE: recovery (pass 2) of diskgroup 2 (DATA) caught error ORA-15130
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_15781.trc:
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15131: block 97 of file 8 in diskgroup 2 could not be read
ORA-15196: invalid ASM block header [kfc.c:7600] [endian_kfbh] [2147483656] [97] [0 != 1]

由于客户执行了oracleasm deletedisk,根据经验确认该操作是对asm磁盘头的前1M数据进行了清空,而客户这个asm刚好是drop disk触发了rebalance操作的时候干掉磁盘的,基于这样的情况,直接通过修复磁盘1M数据并且mount磁盘组继续使用该磁盘组的概率不大.因此处理建议：
1. 直接恢复出来该磁盘组数据然后打开该库
2. 直接提取客户需要的核心表数据
有过客户有类似操作是asmlib重新创建了磁盘信息恢复:分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
删除分区信息之后数据库恢复案例:删除分区 oracle asm disk 恢复

drop tablespace xxx including contents恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：drop tablespace xxx including contents恢复

最近接到一个客户恢复请求,对系统的核心业务表空间发起了drop tablespace xxx including contents 操作,导致该表空间被删除,客户在删除表空间操作之前使用expdp导出了一份元数据.
drop_tablespace

客户在咨询我的同时,也咨询了其他人,有人给客户答复是可以通过修改字典(以为有导出的元数据就可以逆向想改文件回去),然后把数据文件拷贝过去,实现恢复,成功概率65%[只能说是真牛]
QQ20240904-220806

对于这个客户的故障,这个思路不可能成功,原因有:
1)客户的系统中有部分字典信息已经彻底丢失,无法通过闪回找回来,因此无法对于字典逆向dml操作完成修改
2)drop tbs这个操作涉及的字典操作非常多,而且也非常复杂,在我的认知中,国内不一定有人完全在短时间内梳理清楚相互关系,完成逆向dml操作
3)他们咨询的人不是圈子中恢复大牛(因为圈子不大,大牛也不可能给他们出这种恢复方案)
4)数据文件复制到新库,完全不是同一个库的,要大量修改文件头信息,我估计他们在这一步都不能成功
果然不出所料,他们做了一个测试,结果库起不来
test

这个客户只是drop tablespace including contents 没有加上and datafiles,因此所有数据文件都还在
dbf

所以这个恢复相对比较简单,直接使用dul之类工具扫描数据文件获取到实际数据.结合客户导出的元数据和通过一些途径恢复出来的dataobj#信息,进行整合,实现数据接近完美恢复,可以业务直接启动成功,其中几个大表的lbo字段数据恢复情况
QQ20240904-222230

类似这样的drop tablespace恢复案例我们经历过很多,但是这个是恢复效果最好的(1.所有数据文件没有丢失;2.在删除表空间之前元数据导出了一份;3.通过找删除记录,awr中表,历史的dmp等方法找出来所有表的dataobj#）,以前的一些表空间删除恢复案例:
ASM删除表空间恢复
drop tablesapce 数据恢复
oracle drop tablespace 恢复最后一招
分享运气超级好的一次drop tablespace 数据恢复

amdu参数详解

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：amdu参数详解

最近发现amdu命令比以前认知中的强大,记录下相关参数

[oracle@xifenfei ~]$ amdu help=y
a/usize         AU size for corrupt disks
-ausize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

ba/ddisks               Include disks with bad headers
-baddisks <diskgroup>:  Normally disks with bad disk headers, or that
    look like they were never part of a disk group, will not be
    scanned. This option forces them to be scanned anyway and to be
    considered part of the given diskgroup. This is most useful when
    a disk header has been damaged. The disk will still need to have
    a valid allocation table to drive the scan unless -fullscan is
    used. In any case at least one block in the first two AUs must be
    valid so that the disk number can be determined. The options
    -ausize and -blksize are required since these values are normally
    fetched from the disk header. If the diskgroup uses external
    redundancy then -external should be specified. These values will
    be compared against any valid disks found in the diskgroup and
    they must be the same.

bl/ksize                ASM block size for corrupt disks
-blksize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

c/ompare                Compare file mirrors
-compare: This option only applies to file extraction from a normal or
    high redundancy disk group. Every extent that is mirrored on more
    than one discovered disk will have all sides of its mirror
    compared. If they are not identical a message will be reported
    on standard error and the report file. The message will indicate
    which copy was extracted. A count of the blocks that are not
    identical will be in the report file.

dir/ectory              Directory from previous dump
-directory <string>: This option completely eliminates the discovery
    phase of operation. It specifies the name of a dump directory from
    a previous run of AMDU. The report file and map files are read
    instead of doing a discovery and scan. The parsing of these ASCII
    files is very dependent on them being exactly as written by AMDU.
    AMDU is unlikely to work properly if they have been modified by
    a text editor, or if some of the files are missing or truncated.
    Note that the directory may be a copy FTPed from another
    machine. The other machine may even be a different platform
    with a different endianess.

dis/kstring             Diskstring for discovery
 -diskstring <string>: By default the null string is used for
    discovery. The null string should discover all disks the user has
    access to. Many installations specify an asm_diskstring parameter
    for their ASM instance. If so that parameter value should be given
    here. Multiple discovery strings can be specified by multiple
    occurrences of -diskstring <string>. Beware of shell syntax
    conflicts with discovery strings. Diskstrings are usually the same
    syntax the shell uses for expanding path names on command lines so
    they will most likely need to be enclosed in single quotes.

du/mp           Diskgroups to dump
-dump <diskgroup>: This option specifies the name of a diskgroup to
    have its metadata dumped. This option may be specified multiple
    times to dump multiple diskgroups. If the diskgroup name is ALL
    then all diskgroups encountered will be dumped. The diskgroup name
    is not case sensitive, but will be converted to uppercase for all
    reports. If this option is not specified then no map or image
    files will be created, but -extract and -print may still work.

exc/lude                Disks to exclude
-exclude <string>: Multiple exclude options may be specified. These
    strings are used for discovery just like the values for diskstring.
    Only shallow discovery is done on these diskstrings. Any disks
    found in the exclude discovery will not be accessed. If they are
    also discovered using the -diskstring strings, then the report will
    include the information from shallow discovery along with a message
    indicating the disk was excluded.

exte/rnal               Assume external redundancy
-external: Normally AMDU determines the diskgroup redundancy from the
    disk headers. However this is not possible with the -baddisks
    option. It is assumed that the redundancy of the -baddisks
    diskgroup is normal or high unless this option is given to specify
    external redundancy.

extr/act                Files to extract
-extract <diskgroup>.<file_number>: This extracts the numbered file
    from the named diskgroup, case insensitive. This option may be
    specified multiple times to extract multiple files. The extracted
    file is placed in the dump directory under the name
    <diskgroup>_<number>.f  where <diskgroup> is the diskgroup name
    in uppercase, and <number> is the file number. The -output option
    may be used to write the file to any location. The extracted file
    will appear to have the same contents it would have if accessed
    through the database. If some portion of the file is unavailable
    then that portion of the output file will be filled with
    0xBADFDA7A, and a message will appear on stderr.

fi/ledump               Dump files rather than extract
-filedump: This option causes the file objects in the command line to
    have their blocks dumped to the image files rather than extracted.
    This can be combined with the -novirtual option to selectively
    dump only some of the metadata files. It may also be used to dump
    user files (number >= 256) so that all mirrored copies can be
    examined.

fo/rmer         Include dropped disks
-former: Normally disks marked as former are not scanned, but this
    option will scan them and include their contents in the output.
    This is useful when it is necessary to look at the contents of a
    disk that was dropped. Note that dropped normal disks will not have
    any entries in their allocation tables and thus only the physically
    addressed extents will be dumped. Force dropped disks will not have
    status former in their disk headers and are not affected by this
    option. However if DROP DISKGROUP is used, the disks will have the
    contents as of the time of the drop, and will be in status former.
    Thus this option is useful for extracting files from a dropped
    diskgroup.

fu/llscan               Scan entire disk
-fullscan: This option reads every AU on the disk and looks at the
    contents of the AU rather than limiting the AU's read based on the
    allocation table. This is useful when the allocation table is
    corrupt or needs recovery. An AU will be written to the image file
    if it starts with a block that contains a valid ASM block header.
    The file and extent information for the map will be extracted from
    the block header. Physically addressed metadata will be dumped
    regardless of its contents. This option is incompatible with
    extracting a file. It is an error to specify -extract with this
    option. Note that this option is likely to find old garbage
    metadata in unallocated AU's since there is no means of
    determining what is allocated. Thus there may be many different
    copies of the same block, possibly of different versions.

h/ex            Always print block contents in hex
-hex: This prints the block contents in hex without attempting to print
    them as ASM metadata. This is useful when the block is known to not
    be ASM metadata. It avoids the ASM block header dump and ensures
    the block is not accidentally interpreted as ASM metadata. This
    option requires at least one -print option.

noa/cd          Do not dump ACD
-noacd: This option limits the dumping of the Active Change Directory
    to just the control blocks that contain the checkpoint. There is
    126 MB of ACD per ASM instance (42 MB for external redundancy). It
    is normally of no interest if there has been a clean shutdown or
    no updates for a while. This option avoids dumping a lot of
    unimportant data. The blocks will still be read and checked for
    corruption. The map file will still contain entries for the ACD
    extents, but the block counts will be zero.

nod/ir          Do not create a dump directory
-nodir: No dump directory is created, and no files are created in it.
    The directory name is not written to standard out. The report file
    is written to standard out before any block printouts from any
    -print options.  This option conflicts with -filedump. It is an
    error to specify this and extract a file to the dump directory.

noe/xtract              Do not create extracted file
-noextract: This prevents files from being extracted to an output
    file, but the file will be read and any errors in selecting the
    correct output will be reported. This is most useful in
    combination with the -compare option.

noh/eart                Do not check for heartbeat
-noheart: Normally the heartbeat block will be saved at discovery time
    and checked when the disk is scanned. A sleep is added between
    discovery and scanning to ensure there is time for the heartbeat
    to be written. If the heartbeat block changes then it is most
    likely that the diskgroup containing this disk is mounted by an
    active ASM instance. An error and warning is generated but
    operation proceeds normally. This option suppresses this check
    and avoids the sleep.

noi/mage                Do not create image files
-noimage: No image files will be created n the dump directory. All
    the reads specified by the read options will still be done. The
    map files may be used to find blocks on the disks themselves. In
    the map file, the count of blocks dumped, the image file sequence
    number, and the byte offset in the image file will all always be
    zero (C00000 S0000 B0000000000)

nom/ap          Do not create map or image files
-nomap: No map file is created and no image file is created. The only
    output is the report file. The -noimage option is assumed if this
    is set since an image file without a map is useless. The options
    -noscan and -noread also result in no map or image files, but
    -nomap still reads the metadata to check for I/O errors and corrupt
    blocks.

nop/rint                Do no print block contents
-noprint: This suppresses the printout of the block contents for
    blocks printed with the -print option. It is useful for getting
    just the block reports without a lot of data. This option requires
    at least one -print option.

norea/d         Shallow discovery only
-noread: This eliminates any reading of any disks at all. Only shallow
    discovery will be done. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract or blocks to print. It is an error to specify this
    and -fullscan.

norep/ort               Do not generate a report
-noreport: This suppresses the generation of the report file. It is
    most useful in combination with -nodir and -print to get block
    printouts without a lot of clutter. It is unnecessary to include
    this with -directory since no report is generated then anyway.

nosc/an         Deep discovery only
-noscan: This eliminates any reading of any disks after deep
    discovery. This results in just doing a deep discovery using the
    disksting parameter. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract. It is an error to specify this and -fullscan.

nosu/bdir               Do not create a dump directory
-nosubdir: No dump directory is created, but files are still created.
    The directory name is not written to standard out. The report file
    and any other dump or extract  files are written to the current
    directory or to the directory indicated by -parentdir. This means
    that if multiple AMDU dumps are requested using this option, the
    report file will always correspond to the last dump requested.

nov/irtual              Do not dump virtual metadata
-novirtual: This option eliminates reading of any virtual metadata.
    Only the physically addressed metadata will be read. This
    implicitly eliminates the ACD and extent maps so -noacd and
    -noxmap will be assumed.

nox/map         Do not dump extent maps
-noxmap: This option eliminates reading of the indirect extents
    containing the file extent maps. This is the bulk of the metadata
    in most diskgroups. Even the entries in the map file will be
    eliminated.

o/utput         Files to create for extract
-output <file_name>: This option specifies a different file for
    writing an extracted file. The file will be overwritten if it
    already exists. This option requires that exactly one file is
    extracted via the -extract option.

pa/rent         Parent for dump directory
-parent <path_name>: By default the dump directory is created in the
    current directory, but another directory can be specified using
    this option. The parent directory for the dump directory must
    already exist.

pr/int          Block to print
-print <block_spec>: This option prints one or more blocks to standard
    out. This option may be specified multiple times to print multiple
    <block_spec>s. The printout contains information about how each
    block was read as well as a formatted printout. Multiple blocks
    matching the same <block_spec> may be found when scanning the
    disks. For example there may be multiple disks that have headers
    for the same diskgroup and disk number. If the block is from a
    mirrored file then multiple copies should exist on different disks.
    If multiple copies of the same block have identical contents then
    only one formatted printout of the contents will be generated, but
    a header will be printed for each copy. A <block_spec> may include
    a count of sequential blocks to print. A <block_spec> may specify
    a block either by disk or file.
   <block_spec> ::= <single_block> | <single_block>.C<count>
   <single_block> ::= <report_disk_block> | <group_disk_block> |
        <extent_file_block> | <virtual_file_block> | <xmap_file_block>
   <report_disk_block> ::=
        <group_name>.N<report_number>.A<au_number>.B<block_number>
   <group_disk_block> ::=
         <group_name>.D<disk_number>.A<au_number>.B<block_number>
   <extent_file_block> ::=
         <group_name>.F<file_number>.X<physical_extent>.B<block_number>
   <virtual_file_block> ::= 
         <group_name>.F<file_number>.V<virtual_block_number>
   <xmap_file_block> ::=
         <group_name>.F<file_number>.M<extent_map_block_number>

r/egistry               Dump registry files
-registry: The ASM registries will be read and dumped to the image
    file. There will be no block consistency checks since these files
    do not have ASM cache headers. To dump one specific registry
    specify -filedump and include the file object for the registry
    (e.g. DATA.255).

s/pfile         Extract usable spfile
-spfile: This causes extract to render the resulting file in a form   
    that is directly usable by startup. Without this option, AMDU   
    will extract the file as a regular ASM file including all ASM   
    specific headers and such

asm磁盘加入vg恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：asm磁盘加入vg恢复

又一个客户把asm disk做成pv,加到vg中,并且对lv进行了扩展(ext4的文件系统)

这个客户做了上述操作之后,没有对lv进行写入其他数据,所以破坏较少(主要的破坏就是ext4的每个一段就会置空一部分block预留给文件系统写入元数据使用),通过winhex查看被破坏磁盘发现lvm信息

对于这种情况,通过对文件头进行修复,结合工具直接拷贝出来数据文件（个别文件元数据损坏通过基于block的方式恢复dbf）

然后直接恢复dbf中数据文件（对于异常的主要是segment header被置空的tab使用dul单独扫描处理）,实现客户数据的最大限度恢复
以前类似文章:
asm disk被加入vg恢复
asm disk被分区,格式化为ext4恢复
pvcreate asm disk导致asm磁盘组异常恢复
再一起asm disk被格式化成ext3文件系统故障恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统

误删除asm disk导致磁盘组无法mount数据库恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：误删除asm disk导致磁盘组无法mount数据库恢复

客户误删除asm disk两个lun（由于这个是这个存储的特殊性,删除lun之后,存储层面无法恢复出来对应的lun数据,导致客户彻底放弃了硬件层面恢复的可能性.）,导致asm磁盘组无法正常mount

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:27928:40938} */ 
NOTE: cache registered group DATA number=3 incarn=0x60fa38b1
NOTE: cache began mount (first) of group DATA number=3 incarn=0x60fa38b1
NOTE: Assigning number (3,0) to disk (/dev/rdisk/VD02_DBF)
NOTE: Assigning number (3,1) to disk (/dev/rdisk/VD03_DBF)
NOTE: Assigning number (3,2) to disk (/dev/rdisk/VD04_DBF)
NOTE: Assigning number (3,3) to disk (/dev/rdisk/VD05_DBF)
NOTE: Assigning number (3,4) to disk (/dev/rdisk/VD06_DBF)
Thu Dec 29 10:21:20 2022
NOTE: GMON heartbeating for grp 3
GMON querying group 3 at 29 for pid 29, osid 3770
NOTE: Assigning number (3,5) to disk ()
NOTE: Assigning number (3,6) to disk ()
GMON querying group 3 at 30 for pid 29, osid 3770
NOTE: cache dismounting (clean) group 3/0x60FA38B1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3770, image: oracle@db1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 3/0x60FA38B1 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=3 incarn=0x60fa38b1
NOTE: cache deleting context for group DATA 3/0x60fa38b1
GMON dismounting group 3 at 31 for pid 29, osid 3770
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "6" is missing from group number "3" 
ORA-15042: ASM disk "5" is missing from group number "3" 
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:27928:40938} */

这个客户应该有三个磁盘组存放数据文件,其中data磁盘组的7个磁盘被删除了2个lun,导致data磁盘组无法mount,客户希望尽可能恢复其中数据,对于这种情况,由于2个lun完全丢失,直接通过dul之类的工具拷贝asm数据文件恢复不可行(因为很多asm的元数据也会在丢失的lun里面,导致拷贝出来的数据文件异常太多,恢复效果会很差),对于这种情况采用asm disk header 彻底损坏恢复的恢复方法,尽可能的从block层面恢复出来所有可以恢复的数据块中的数据

由于这个其中涉及了system表空间(oracle损坏严重),结合客户几年前的一个system历史备份文件,恢复出来字典,然后尽可能的恢复数据文件,最终最大限度给客户恢复数据,让客户的损失降到最低.

asm disk被分区,格式化为ext4恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：asm disk被分区,格式化为ext4恢复

有客户因为没有认识到linux中的磁盘被asm使用,对其进行分区并且做成了ext4的文件系统，从history中获取客户操作命令

  600  fdisk -l
  601  fdisk /dev/sdb
  602  mkfs ext4 /dev/sdb1 
  603  fdisk -l
  604  mkfs -t ext4 /dev/sdb1 
  605  cd /
  606  mkdir u01
  607  mount /dev/sdb1 /u01
  608  df -h

确认磁盘情况,确认sdb直接被asm磁盘使用(asmdisk1)

[grid@racdb3 trace]$ ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Sep 30 14:34 /dev/asmdisk1
[grid@racdb3 trace]$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jul 27  2021 /dev/sda
brw-rw---- 1 root disk 8,  1 Jul 27  2021 /dev/sda1
brw-rw---- 1 root disk 8,  2 Jul 27  2021 /dev/sda2
brw-rw---- 1 root disk 8, 16 Sep 30 11:23 /dev/sdb
brw-rw---- 1 root disk 8, 17 Sep 30 11:23 /dev/sdb1
brw-rw---- 1 root disk 8, 32 Jul 27  2021 /dev/sdc

asm日志报错

Fri Sep 30 11:31:41 2022
NOTE: SMON starting instance recovery for group DATA domain 1 (mounted)
NOTE: SMON skipping disk 0 - no header
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _smon_+asm3 (2989) initiating offline of disk 0.3915953109 (DATA_0000) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968b3d5, mask = 0x6a, op = clear
Fri Sep 30 11:31:41 2022
GMON updating disk modes for group 1 at 4 for pid 17, osid 2989
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Fri Sep 30 11:31:41 2022
NOTE: cache dismounting (not clean) group 1/0x34F84324 (DATA) 
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Fri Sep 30 11:31:41 2022
NOTE: halting all I/Os to diskgroup 1 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 1 due to diskgroup dismount
NOTE: SMON skipping lock domain (1) validation because diskgroup being dismounted

数据库日志报错

Fri Sep 30 11:31:44 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_lmon_26356.trc:
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
Fri Sep 30 11:31:45 2022
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei3/trace/xifenfei3_ckpt_26388.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.257.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-00206: error in writing (block 5, # blocks 1) of control file
ORA-00202: control file: '+DATA/xifenfei/controlfile/current.256.968794097'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
CKPT (ospid: 26388): terminating the instance due to error 221

通过kfed 查看asm disk被破坏情况

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1 
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F4FAAD45400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
7F4FAAD455B0 00000000 00000000 45C222C8 01000000  [.........".E....]
7F4FAAD455C0 FE830001 003FFFFF E9D60000 0000FFFF  [......?.........]
7F4FAAD455D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
7F4FAAD455F0 00000000 00000000 00000000 AA550000  [..............U.]
7F4FAAD45600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=2
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F64E77A0400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F64E77A1200 000081F9 000181F9 000281F9 000381F9  [................]
7F64E77A1210 000481F9 000C81F9 000D81F9 001881F9  [................]
7F64E77A1220 002881F9 003E81F9 007981F9 00AB81F9  [..(...>...y.....]
7F64E77A1230 013881F9 016C81F9 044581F9 04B081F9  [..8...l...E.....]
7F64E77A1240 061A81F9 0CD081F9 1E8481F9 00000000  [................]
7F64E77A1250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F8D101FF400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F8D10200200 000082F9 000182F9 000282F9 000382F9  [................]
7F8D10200210 000482F9 000C82F9 000D82F9 001882F9  [................]
7F8D10200220 002882F9 003E82F9 007982F9 00AB82F9  [..(...>...y.....]
7F8D10200230 013882F9 016C82F9 044582F9 04B082F9  [..8...l...E.....]
7F8D10200240 061A82F9 0CD082F9 1E8482F9 00000000  [................]
7F8D10200250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=4
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F142949C400 00000000 00000000 00000000 00000000  [................]
        Repeat 223 times
7F142949D200 000083F9 000183F9 000283F9 000383F9  [................]
7F142949D210 000483F9 000C83F9 000D83F9 001883F9  [................]
7F142949D220 002883F9 003E83F9 007983F9 00AB83F9  [..(...>...y.....]
7F142949D230 013883F9 016C83F9 044583F9 04B083F9  [..8...l...E.....]
7F142949D240 061A83F9 0CD083F9 1E8483F9 00000000  [................]
7F142949D250 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@racdb3 scsi_host]#  kfed read /dev/asmdisk1  aun=5
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F0615CF6400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

磁盘前几个au被破坏严重.而且相关的备份block都已经损坏,基于这种情况,直接参考:
asm磁盘dd破坏恢复
 asm disk header 彻底损坏恢复
 asm disk 磁盘部分被清空恢复
通过底层恢复出来相关数据文件,并检测正常

进一步通过au分配列表获恢复redo,ctl等文件

H:\TEMP\asm-ext4\other>dir 
 驱动器 H 中的卷是 SSD-SX
 卷的序列号是 84EB-F434

 H:\TEMP\asm-ext4\other 的目录

2022-09-30  21:52        25,165,824 256.dd
2022-09-30  21:52        25,165,824 257.dd
2022-09-30  23:52        52,429,312 258.dd.1
2022-09-30  23:54        52,429,312 259.dd.1
2022-09-30  23:55        52,429,312 260.dd.1
2022-09-30  23:55        52,429,312 261.dd.1
2022-09-30  23:56        52,429,312 270.dd.1
2022-09-30  23:57        52,429,312 271.dd.1
2022-09-30  23:57        52,429,312 272.dd.1
2022-09-30  23:57        52,429,312 273.dd.1
2022-09-30  23:58        52,429,312 274.dd.1
2022-10-01  00:01        52,429,312 275.dd.1
2022-10-01  00:00        52,429,312 276.dd.1
2022-10-01  00:00        52,429,312 277.dd.1
2022-10-01  00:00        52,429,312 278.dd.1
2022-09-30  23:59        52,429,312 279.dd.1
2022-09-30  23:59        52,429,312 280.dd.1
2022-09-30  23:59        52,429,312 281.dd.1

在另外的新机器上尝试恢复库

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sat Oct 1 10:18:58 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
ORA-00227: corrupt block detected in control file: (block 8, # blocks 1)
ORA-00202: control file: '/oradata/256.dd'

控制文件损坏,重建ctl

SQL> CREATE CONTROLFILE REUSE DATABASE "xifenfei" NORESETLOGS  NOARCHIVELOG  
  2      MAXLOGFILES 50  
  3      MAXLOGMEMBERS 5  
  4      MAXDATAFILES 100  
  5      MAXINSTANCES 8  
  6      MAXLOGHISTORY 226  
  7  LOGFILE  
  8  group 7  '/oradata/270.dd.1' size 50M,
  9  group 8  '/oradata/272.dd.1' size 50M,
 10  group 5  '/oradata/274.dd.1' size 50M,
 11  group 6  '/oradata/276.dd.1' size 50M,
 12  group 3  '/oradata/278.dd.1' size 50M,
 13  group 4  '/oradata/280.dd.1' size 50M,
 14  group 1  '/oradata/258.dd.1' size 50M,
 15  group 2  '/oradata/260.dd.1' size 50M
 16  DATAFILE  
 17  '/oradata/1',
 18  '/oradata/2',
 19  '/oradata/3',
 20  '/oradata/4',
 21  '/oradata/5',
 22  '/oradata/6',
 23  '/oradata/7',
 24  '/oradata/8',
 25  '/oradata/9',
 26  '/oradata/10',
 27  '/oradata/11'
 28  CHARACTER SET ZHS16GBK  
 29  ;  

Control file created.

尝试open库,报ORA-600 kqfidps_update_stats:2，ORA-600 4194等错误

SQL> recover database;
Media recovery complete.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kqfidps_update_stats:2],
[0x7FFCCBEB3EC0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4193], [19319], [l.ok

解决该异常，open数据库成功

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 1519898624 bytes
Fixed Size                  2253464 bytes
Variable Size             939527528 bytes
Database Buffers          570425344 bytes
Redo Buffers                7692288 bytes
Database mounted.
SQL> alter database open;

Database altered.

导出数据库,遭遇个别表如下ORA-08103和ORA-01555两种错误，这种是由于个别block在做成文件系统的时候被损坏,底层恢复的时候block被置空导致,对其异常表进行单独处理即可

. . 正在导出表                           ALBUM
EXP-00056: 遇到 ORACLE 错误 8103
ORA-08103: 对象不再存在

. . 正在导出表                  M_PUSH_CONTENT
EXP-00056: 遇到 ORACLE 错误 1555
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小
ORA-22924: 快照太旧

通过上述操作,实现客户数据的恢复,最大限度挽回客户损坏,再次提醒对于asm disk进行了误操作,建议第一时间保护现场(不要有任何的写入操作,可以最大限度恢复数据)

ORA-15335 ORA-15130 ORA-15066 ORA-15196

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-15335 ORA-15130 ORA-15066 ORA-15196

客户反馈,数据库无法正常启动，通过分析asm的alert日志发现,data磁盘组mount成功之后,没有一会儿自动dismount掉

Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT  
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT 
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read  a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) 
incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 
disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of 
disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA) 
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc  (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted

这里看主要是由于asm 磁盘组需要做COD recovery导致无法正常稳定的mount,主要原因是遭遇到asm disk的逻辑坏块(存储物理上看是ok的,但是实际数据在asm中看是异常的)

数据库alert日志报错

Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc  (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732

对于这类故障处理相对比较容易,通过patch asm，让data磁盘组稳定mount,然后open库,迁移数据,实现数据0丢失,完美恢复

分类目录归档：Oracle ASM