asm disk 磁盘部分被清空恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:asm disk 磁盘部分被清空恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于未知原因导致磁盘组的一个磁盘前面1G被置空(data磁盘组一共5个1T磁盘,第一个盘未知原因置空了1G数据)

[root@oradb1 ~]#kfed read /dev/mapper/asm-disk1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F539088C400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
…………
[root@oradb1 ~]#kfed read /dev/mapper/asm-disk1 aun=999
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F8FEB795400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

通过分析disk 1和disk 2的前面1001MB数据,以及asm的数据分布特性,可以确认丢失的1000MB数据为39号文件的数据(和客户当时这个库是通过rman还原过来的操作有关系)
20220330224604
20220330225211


对于这样的情况,通过底层扫描,可以很好的恢复数据,参考以前类似文章
asm disk被加入vg恢复
asm disk header 彻底损坏恢复
asm磁盘dd破坏恢复
通过底层恢复,恢复结果如下
20220331133634

dbv检查数据文件
20220331134249

然后把数据库恢复出来数据文件中的数据恢复到新库中,至此完成该case恢复

删除分区 oracle asm disk 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:删除分区 oracle asm disk 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友数据库故障请求case.大概操作是这样的:有一个39T的lun,通过parted分了15个分区,给oracle asm使用创建磁盘组data4,然后分了4个分区做成data5(由于ausize写错误了),删除掉磁盘组和这四个分区.然后重新分配了6个分区,并且使用最后5个分区创建了data5磁盘组.使用了一段时间之后,由于oracle空间不足,检查的时候误以为这个lun就前面15个分区使用,人工把后面的6个分区给删除了,并且创建了4个新分区,然后发现数据库crash了,发现误删除了在使用的分区.然后又把新创建的4个分区给删除了.接手该故障的时候,这个39T lun的分区信息如下

[root@node1 linux64]# parted /dev/mapper/36000d31003d39e000000000000000004
GNU Parted 2.1
Using /dev/mapper/36000d31003d39e000000000000000004
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/36000d31003d39e000000000000000004: 39.6TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name         Flags
 1      2097kB  2000GB  2000GB               asmdata4-1
 2      2000GB  4000GB  2000GB               asmdata4-2
 3      4000GB  6000GB  2000GB               asmdata4-3
 4      6000GB  8000GB  2000GB               asmdata4-4
 5      8000GB  10.0TB  2000GB               asmdata4-5
 6      10.0TB  12.0TB  2000GB               asmdata4-6
 7      12.0TB  14.0TB  2000GB               asmdata4-7
 8      14.0TB  16.0TB  2000GB               asmdata4-8
 9      16.0TB  18.0TB  2000GB               asmdata4-9
10      18.0TB  20.0TB  2000GB               asmdata4-10
11      20.0TB  22.0TB  2000GB               asmdata4-11
12      22.0TB  24.0TB  2000GB               asmdata4-12
13      24.0TB  26.0TB  2000GB               asmdata4-13
14      26.0TB  28.0TB  2000GB               asmdata4-14
15      28.0TB  30.0TB  2000GB               asmdata4-15

客户正常使用情况下,这个lun上面相关分区的asm disk信息

 SQL> CREATE DISKGROUP DATA4 EXTERNAL REDUNDANCY  DISK 
'/dev/mapper/36000d31003d39e000000000000000004p1' SIZE 1907346M ,
 '/dev/mapper/36000d31003d39e000000000000000004p2' SIZE 1907350M ,
 '/dev/mapper/36000d31003d39e000000000000000004p3' SIZE 1907348M ,
 '/dev/mapper/36000d31003d39e000000000000000004p4' SIZE 1907348M ,
 '/dev/mapper/36000d31003d39e000000000000000004p5' SIZE 1907350M  
ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='8M' /* ASMCA */ 

SQL> ALTER DISKGROUP DATA4 ADD  DISK 
'/dev/mapper/36000d31003d39e000000000000000004p10' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p6' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p7' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p8' SIZE 1907350M ,
'/dev/mapper/36000d31003d39e000000000000000004p9' SIZE 1907348M /* ASMCA */ 

SQL> ALTER DISKGROUP DATA4 ADD  DISK 
'/dev/mapper/36000d31003d39e000000000000000004p11' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p12' SIZE 1907350M ,
'/dev/mapper/36000d31003d39e000000000000000004p13' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p14' SIZE 1907348M ,
'/dev/mapper/36000d31003d39e000000000000000004p15' SIZE 1907350M /* ASMCA */ 

SQL> CREATE DISKGROUP DATA5 EXTERNAL REDUNDANCY  DISK 
'/dev/mapper/36000d31003d39e000000000000000004p17' SIZE 1716614M ,
'/dev/mapper/36000d31003d39e000000000000000004p18' SIZE 1716614M ,
'/dev/mapper/36000d31003d39e000000000000000004p19' SIZE 1716614M ,
'/dev/mapper/36000d31003d39e000000000000000004p20' SIZE 1716614M ,
'/dev/mapper/36000d31003d39e000000000000000004p21' SIZE 1621246M  
ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='4M' /* ASMCA */ 

基于客户现在的情况,data4中的所有分区都正常,主要是要找出来data5中的5个分区的数据.因为客户不确定p16分区大小,导致后续的5个分区起始位置不好定位.从而使得恢复无法进行.通过shell脚本结合kfed尝试定位asm disk header信息

#!/bin/bash
j=xxxxxxxxxxx
for ((i=xxxxxx; i<=j; i++))
do
 echo "-----$i--------" >> /home/get_au1.txt
 kfed read /dev/mapper/36000d31003d39e000000000000000004 aun=$i |
 > grep  "kfdhdb.dskname:              DATA" >> /home/get_au.txt
done

结果发现无法获取到结果,通过分析发现这里由于lun过大,导致aun值过大,从而使得kfed溢出无法读取到正常值.根据parted的特性,人工dd部分block进行分析

[root@node1 bak]# kfed read xifenfei.dd aun=134|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3357988283 ; 0x00c: 0xc826d5bb
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:              DATA5_0000 ; 0x028: length=10
kfdhdb.grpname:                   DATA5 ; 0x048: length=5
kfdhdb.fgname:               DATA5_0000 ; 0x068: length=10
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33116450 ; 0x0a8: HOUR=0x2 DAYS=0x9 MNTH=0x4 YEAR=0x7e5
kfdhdb.crestmp.lo:            477378560 ; 0x0ac: USEC=0x0 MSEC=0x10e SECS=0x7 MINS=0x7
kfdhdb.mntstmp.hi:             33116450 ; 0x0b0: HOUR=0x2 DAYS=0x9 MNTH=0x4 YEAR=0x7e5
kfdhdb.mntstmp.lo:            486256640 ; 0x0b4: USEC=0x0 MSEC=0x2ec SECS=0xf MINS=0x7
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  4194304 ; 0x0bc: 0x00400000
kfdhdb.mfact:                    454272 ; 0x0c0: 0x0006ee80
kfdhdb.dsksize:                  429153 ; 0x0c4: 0x00068c61
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

顺利找到了data5中的第一块磁盘,而且确定了起始位置,然后构造相关的dd语句把分区的数据dd到一个新磁盘中

dd if=/dev/mapper/36000d31003d39e000000000000000004 bs=4M skip=xxxxx count=429153 of=/dev/sdu

然后通过kfed查看数据
20210704160347


通过类似方法依次处理,最终把5块asm disk全部找到,并且顺利dd到新的磁盘中.尝试启动crs,并mount data5
20210704153634

20210704153548

data5 磁盘组mount成功之后,数据库顺利启动,实现lun中删除分区之后,asm磁盘组数据完美恢复
20210704153728

这次运气还不错,仅仅是对lun的分区使用了parted进行了删除和创建等操作,没有格式化文件系统和做成新的asm disk,不然数据会有一部分丢失.对于有部分破坏的分区,需要通过底层碎片的方法进行最大限度抢救数据.参考类似文档:
asm disk被加入vg恢复
又一例asm格式化文件系统恢复
文件系统损坏导致数据文件异常恢复
一次完美的asm disk被格式化ntfs恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
再一起asm disk被格式化成ext3文件系统故障恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

win asm disk header 异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:win asm disk header 异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友反馈win环境下rac异常,asm无法正常mount,检查日志发现

Fri Jul 03 03:55:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15081: failed to submit an I/O operation to a disk
Fri Jul 03 03:59:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15081: failed to submit an I/O operation to a disk

报错信息比较明显是由于无法找到\\.\ORCLDISKDATA1磁盘,因此异常,通过asmtool查看磁盘信息

C:\app\11.2.0\grid>asmtool -list
NTFS                             \Device\Harddisk0\Partition3            81920M
NTFS                             \Device\Harddisk0\Partition4           200000M
NTFS                             \Device\Harddisk0\Partition5          4293849M
                                 \Device\Harddisk1\Partition2             4062M
                                 \Device\Harddisk2\Partition2          2097022M
ORCLDISKFRA0                     \Device\Harddisk3\Partition2           511870M

明显的发现ORCLDISKDATA1磁盘丢失,通过对磁盘dd到本地然后进行分析发现,asm disk header损坏

C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006B38C00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]


C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd blkn=2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2349305287 ; 0x00c: 0x8c078dc7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                         0 ; 0x000: 0x00000000
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next:           8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:           8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:          12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:          12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:         456 ; 0x010: 0x01c8
kfdatb.auinfo[2].link.prev:         456 ; 0x012: 0x01c8
kfdatb.auinfo[3].link.next:         488 ; 0x014: 0x01e8
kfdatb.auinfo[3].link.prev:         488 ; 0x016: 0x01e8
kfdatb.auinfo[4].link.next:          24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:          24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:          28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:          28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:         552 ; 0x020: 0x0228
kfdatb.auinfo[6].link.prev:        3112 ; 0x022: 0x0c28
kfdatb.spare:                         0 ; 0x024: 0x00000000
kfdate[0].discriminator:              1 ; 0x028: 0x00000001
kfdate[0].allo.lo:                    0 ; 0x028: XNUM=0x0
kfdate[0].allo.hi:              8388608 ; 0x02c: V=1 I=0 H=0 FNUM=0x0
kfdate[1].discriminator:              1 ; 0x030: 0x00000001
kfdate[1].allo.lo:                    0 ; 0x030: XNUM=0x0
kfdate[1].allo.hi:              8388608 ; 0x034: V=1 I=0 H=0 FNUM=0x0
kfdate[2].discriminator:              1 ; 0x038: 0x00000001
kfdate[2].allo.lo:                    0 ; 0x038: XNUM=0x0
kfdate[2].allo.hi:              8388609 ; 0x03c: V=1 I=0 H=0 FNUM=0x1

fra磁盘虽然磁盘asm label信息存在,但是其他信息依旧损坏,但是也只是磁盘头信息损坏
20200705203336


通过现场分析,基本上可以确定是由于某种原因导致win asm 的磁盘的所有磁盘头都损坏(两个磁盘头被置空,另外一个磁盘头基本上损坏),基于原因未知
基于客户现场的情况,以及他们有前一天的rman备份,而且客户有保障现场(进一步故障原因分析)的需求,未在现场环境进行恢复,而是在不对现场环境做任何修改的情况下,直接恢复fra里面的redo和归档日志,进而结合备份异地实现数据库恢复,实现数据0丢失,又不破坏现场的效果
20200705203742

以前遇到过类似我其他操作系统平台中asm disk header异常的case:
asm磁盘分区丢失恢复
pvid=yes导致asm无法mount
asm磁盘头全部损坏数据0丢失恢复
分区无法识别导致asm diskgroup无法mount
asm disk误设置pvid导致asm diskgroup无法mount恢复

asm磁盘组操作不当导致数据文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘组操作不当导致数据文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到数据库恢复case,客户是要更换存储,在数据库mount状态把使用omf方式存储数据的asm 磁盘组通过rman copy到新的通过别名方式存储的新的asm 磁盘组的存储中,但是由于操作人员粗心,copy语句中部分目标磁盘组的数据文件别名重复了,最后执行rename file之后,导致部分数据文件彻底丢失.我们通过底层碎片扫描(参考:asm disk header 彻底损坏恢复)对于该用户的数据实现完全恢复.
因为整个过程重现比较麻烦,这里测试从一个data磁盘组中有一个omf方式存储的含有两个数据文件的表空间,通过rman copy 把这个表空间的两个文件拷贝到datanew磁盘组中,但是由于粗心把两个数据文件的别名写成一样,结果导致该表空间的一个数据文件彻底丢失的测试.

创建测试表空间
在datanew磁盘组中创建omf方式管理的xifenfei表空间,含有两个数据文件,file#分别为14和15

SQL> create tablespace xifenfei datafile '+DATA' SIZE 128m;
Tablespace created.
SQL> ALTER TABLESPACE XIFENFEI ADD DATAFILE '+DATA' SIZE 128m AUTOEXTEND ON;
Tablespace altered.
SQL> SELECT FILE_NAME,FILE_ID FROM  DBA_DATA_FILES WHERE TABLESPACE_NAME='XIFENFEI';
FILE_NAME
--------------------------------------------------------------------------------
   FILE_ID
----------
+DATA/XFF/DATAFILE/xifenfei.276.961143809
        14
+DATA/XFF/DATAFILE/xifenfei.277.961143825
        15

rman copy datafile 14
通过rman copy把datafile 14拷贝到data磁盘组中,目标端为别名方式存储

RMAN> copy datafile 14 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00014 name=+DATA/XFF/DATAFILE/xifenfei.276.961143809
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082643 RECID=4 STAMP=961144006
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 27-NOV-17
[grid@localhost ~]$ asmcmd
ASMCMD> cd datanew
ASMCMD> ls
XFF/
xifenfei.dbf
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD>

这里通过asmcmd的ls命令,可以看到虽然我们存储的为datanew磁盘组的别名文件,实际上是link到asm的omf方式的文件(本质上asm中的文件都是omf方式存储,只是在使用的时候体现asm的客户端程序方式不一样,是直接asm中的omf方式,还是asm中的别名).

rman copy datafile 15
通过rman copy把datafile 15 拷贝到和datafile 14别名一样的文件了

RMAN> copy datafile 15 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00015 name=+DATA/XFF/DATAFILE/xifenfei.277.961143825
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082731 RECID=5 STAMP=961144053
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 27-NOV-17
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD> cd xff
ASMCMD> ls
DATAFILE/
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI.256.961144003
ASMCMD>

这里可以看出来,在data磁盘组中,file 14被file 15覆盖掉了

rename file
把data磁盘组中的数据文件rename 到datanew磁盘组中

SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf';
Database altered.
SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf';
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01523: cannot rename data file to '+data/xifenfei.dbf' - file already part of database

这里我们可以看到,file 14 rename 成功,但是file 15 rename失败,因为在数据库中,已经有了别名的文件(数据文件的路径)

omf自动删除文件
查看原磁盘组datanew中,发现datafile 14被自动删除

ASMCMD> pwd
+DATA/XFF/DATAFILE
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSAUX.257.942061433
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSTEM.256.942061393
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    UNDOTBS1.258.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    USERS.259.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    XIFENFEI.277.961143825
ASMCMD>

alert日志证实数据文件被删除

2017-11-27T09:05:03.054741-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:03.114947-05:00
NOTE: Under CF enqueue, no dependency request for disk group DATANEW
Deleted Oracle managed file +DATA/XFF/DATAFILE/xifenfei.276.961143809
Completed: alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:21.471474-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+data/xifenfei.dbf'
ORA-1511 signalled during:alter database rename file
      '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to'+datanew/xifenfei.dbf'

这里可以证实,数据文件的omf方式管理,在数据文件执行rename file的时候,会自动删除掉老的数据文件.这里悲剧已经发生,由于rman copy 覆盖了datanew磁盘组中的datafile 14,rename file又导致data磁盘组中的datafile 14被自动删除,从而使得datafile 14这个数据文件在两个磁盘组中都丢失.从常规角度来说,如果没有合适的备份该文件无法恢复.如果遭遇到oracle asm中数据文件丢失或者部分覆盖,请保护现场,联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com最大限度抢救您的数据

asm磁盘头全部损坏数据0丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘头全部损坏数据0丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到朋友反馈说他们公司的10.2.0.4(无磁盘头备份)asm 磁盘头都损坏了,确定是被人恶意dd掉了磁盘头的1k,他通过查询发过来结果如下
asm_disk_header


分析alert日志,确定磁盘组和磁盘信息
asm mount data磁盘组报错

Sun Apr 16 21:39:31 2017
NOTE: cache dismounting group 2/0x3F94036B (DATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Sun Apr 16 21:39:31 2017
ERROR: no PST quorum in group 3: required 2, found 0

data磁盘组和磁盘信息

Mon Mar 20 16:21:59 2017
NOTE: Hbeat: instance not first (grp 3)
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/raw/raw21
Mon Mar 20 16:21:59 2017
NOTE: F1X0 found on disk 2 fcn 0.47624333
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/raw/raw22
NOTE: cache opening disk 4 of grp 2: DATA_0004 path:/dev/raw/raw23
NOTE: cache opening disk 5 of grp 2: DATA_0005 path:/dev/raw/raw24
NOTE: F1X0 found on disk 5 fcn 0.47624333
NOTE: cache opening disk 6 of grp 2: DATA_0006 path:/dev/raw/raw26
NOTE: cache opening disk 7 of grp 2: DATA_0007 path:/dev/raw/raw25
NOTE: F1X0 found on disk 7 fcn 0.47624333
NOTE: cache mounting (not first) group 2/0x01B869DC (DATA)
Mon Mar 20 16:21:59 2017
kjbdomatt send to node 1
Mon Mar 20 16:21:59 2017
NOTE: attached to recovery domain 2
Mon Mar 20 16:21:59 2017
NOTE: opening chunk 2 at fcn 0.201560874 ABA
NOTE: seq=614 blk=4144
Mon Mar 20 16:21:59 2017
NOTE: cache mounting group 2/0x01B869DC (DATA) succeeded
SUCCESS: diskgroup DATA was mounted

最后一次正常mount是使用了raw21-raw26的裸设备为data磁盘组,但是这里从DATA_002开始,表明很可能最初了两个asm disk被删除,继续分析alert日志

Mon Oct 15 01:53:16 2012
 CREATE DISKGROUP DATA Normal REDUNDANCY  DISK '/dev/raw/raw6' SIZE 1144409M ,
'/dev/raw/raw7' SIZE 1144409M
Sat Dec 27 22:41:39 2014
 alter diskgroup data add disk '/dev/raw/raw21'
Sat Dec 27 22:41:54 2014
alter diskgroup data add disk '/dev/raw/raw22'
Sat Dec 27 22:42:14 2014
 alter diskgroup data add disk '/dev/raw/raw23'
Sat Dec 27 22:42:31 2014
 alter diskgroup data add disk '/dev/raw/raw24'
Sat Dec 27 22:42:51 2014
 alter diskgroup data add disk '/dev/raw/raw26'
Sat Dec 27 22:43:10 2014
alter diskgroup data add disk '/dev/raw/raw25'
Mon Dec 29 17:55:07 2014
 alter diskgroup data drop disk 'DATA_0000'
Tue Dec 30 03:14:42 2014
 alter diskgroup data drop disk 'DATA_0001'

kfed确认磁盘头损坏情况
通过kfed分析dd出来的磁盘头发现每个磁盘头都一样,第一个block损坏

[oracle@rac1 xifenfei]$ kfed read raw22
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F21AF427400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
[oracle@rac1 xifenfei]$ kfed read raw22 blkn=2|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                  2184525105 ; 0x00c: 0x82353531
kfbh.fcn.base:                 47625389 ; 0x010: 0x02d6b4ad
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                    448 ; 0x004: 0x01c0
kfdatb10.ub2pad:                      0 ; 0x006: 0x0000
kfdatb10.auinfo[0].link.next:         8 ; 0x008: 0x0008
kfdatb10.auinfo[0].link.prev:         8 ; 0x00a: 0x0008
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:           448 ; 0x00e: 0x01c0
kfdatb10.auinfo[1].link.next:        16 ; 0x010: 0x0010
kfdatb10.auinfo[1].link.prev:        16 ; 0x012: 0x0010
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:        24 ; 0x018: 0x0018
kfdatb10.auinfo[2].link.prev:        24 ; 0x01a: 0x0018
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:        32 ; 0x020: 0x0020
kfdatb10.auinfo[3].link.prev:        32 ; 0x022: 0x0020

恢复思路
确定磁盘是只被干掉了第一个block,但是由于asm 是10.2.0.4的,没有asm disk header的备份,因此也只能自己去人工kfed修复.但是考虑到该case中所有的asm disk header 全部丢失,无任何参考,完全修复比较麻烦,另外这个库也比较小,考虑修复asm disk header 关键部位,然后通过工具直接拷贝出来数据文件,在文件系统中open库的思路.主要需要恢复磁盘头基本信息(diskname,disksize,disknum,ausize,blocksize,file directory等)

通过kfed找出来file directory

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  2363360058 ; 0x00c: 0x8cde033a
kfbh.fcn.base:                 48245591 ; 0x010: 0x02e02b57
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:                   1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                 1048576 ; 0x010: 0x00100000
kfffdb.xtntcnt:                       3 ; 0x014: 0x00000003
kfffdb.xtnteof:                       3 ; 0x018: 0x00000003
kfffdb.blkSize:                    4096 ; 0x01c: 0x00001000
kfffdb.flags:                        65 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=1 A=0
kfffdb.fileType:                     15 ; 0x021: 0x0f
kfffdb.dXrs:                         19 ; 0x022: SCHE=0x1 NUMB=0x3

通过kfed找出来disk directory

kfffde[0].xptr.au:                    4 ; 0x4a0: 0x00000004
kfffde[0].xptr.disk:                  7 ; 0x4a4: 0x0007
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  41 ; 0x4a7: 0x29
kfffde[1].xptr.au:                17405 ; 0x4a8: 0x000043fd
kfffde[1].xptr.disk:                  6 ; 0x4ac: 0x0006
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk:                 146 ; 0x4af: 0x92
kfffde[2].xptr.au:               330031 ; 0x4b0: 0x0005092f
kfffde[2].xptr.disk:                  4 ; 0x4b4: 0x0004
kfffde[2].xptr.flags:                 0 ; 0x4b6: L=0 E=0 D=0 S=0
kfffde[2].xptr.chk:                  13 ; 0x4b7: 0x0d
kfddde[2].entry.incarn:               1 ; 0x3a4: A=1 NUMM=0x0
kfddde[2].entry.hash:                 2 ; 0x3a8: 0x00000002
kfddde[2].entry.refer.number:4294967295 ; 0x3ac: 0xffffffff
kfddde[2].entry.refer.incarn:         0 ; 0x3b0: A=0 NUMM=0x0
kfddde[2].dsknum:                     2 ; 0x3b4: 0x0002
kfddde[2].state:                      2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ddchgfl:                    0 ; 0x3b7: 0x00
kfddde[2].dskname:            DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname:             DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi:          33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[2].crestmp.lo:        2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfddde[2].failstmp.hi:                0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo:                0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[2].timer:                      0 ; 0x408: 0x00000000
kfddde[2].size:                 1258291 ; 0x40c: 0x00133333
kfddde[2].srRloc.super.hiStart:       0 ; 0x410: 0x00000000
kfddde[2].srRloc.super.loStart:       0 ; 0x414: 0x00000000
kfddde[2].srRloc.super.length:        0 ; 0x418: 0x00000000
kfddde[2].srRloc.incarn:              0 ; 0x41c: 0x00000000
kfddde[2].dskrprtm:                   0 ; 0x420: 0x00000000
kfddde[2].start0:                     0 ; 0x424: 0x00000000
kfddde[2].size0:                1258291 ; 0x428: 0x00133333
kfddde[2].used0:                1258229 ; 0x42c: 0x001332f5
kfddde[2].slot:                       0 ; 0x430: 0x00000000
kfddde[3].entry.incarn:               1 ; 0x564: A=1 NUMM=0x0
kfddde[3].entry.hash:                 3 ; 0x568: 0x00000003
kfddde[3].entry.refer.number:4294967295 ; 0x56c: 0xffffffff
kfddde[3].entry.refer.incarn:         0 ; 0x570: A=0 NUMM=0x0
kfddde[3].dsknum:                     3 ; 0x574: 0x0003
kfddde[3].state:                      2 ; 0x576: KFDSTA_NORMAL
kfddde[3].ddchgfl:                    0 ; 0x577: 0x00
kfddde[3].dskname:            DATA_0003 ; 0x578: length=9
kfddde[3].fgname:             DATA_0003 ; 0x598: length=9
kfddde[3].crestmp.hi:          33010550 ; 0x5b8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[3].crestmp.lo:        2811397120 ; 0x5bc: USEC=0x0 MSEC=0xa1 SECS=0x39 MINS=0x29
kfddde[3].failstmp.hi:                0 ; 0x5c0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[3].failstmp.lo:                0 ; 0x5c4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[3].timer:                      0 ; 0x5c8: 0x00000000
kfddde[3].size:                 1258291 ; 0x5cc: 0x00133333
kfddde[3].srRloc.super.hiStart:       0 ; 0x5d0: 0x00000000
kfddde[3].srRloc.super.loStart:       0 ; 0x5d4: 0x00000000
kfddde[3].srRloc.super.length:        0 ; 0x5d8: 0x00000000
kfddde[3].srRloc.incarn:              0 ; 0x5dc: 0x00000000
kfddde[3].dskrprtm:                   0 ; 0x5e0: 0x00000000
kfddde[3].start0:                     0 ; 0x5e4: 0x00000000
kfddde[3].size0:                1258291 ; 0x5e8: 0x00133333
kfddde[3].used0:                1258128 ; 0x5ec: 0x00133290
kfddde[3].slot:                       0 ; 0x5f0: 0x00000000
kfddde[4].entry.incarn:               1 ; 0x724: A=1 NUMM=0x0
kfddde[4].entry.hash:                 4 ; 0x728: 0x00000004
kfddde[4].entry.refer.number:4294967295 ; 0x72c: 0xffffffff
kfddde[4].entry.refer.incarn:         0 ; 0x730: A=0 NUMM=0x0
kfddde[4].dsknum:                     4 ; 0x734: 0x0004
kfddde[4].state:                      2 ; 0x736: KFDSTA_NORMAL
kfddde[4].ddchgfl:                    0 ; 0x737: 0x00
kfddde[4].dskname:            DATA_0004 ; 0x738: length=9
kfddde[4].fgname:             DATA_0004 ; 0x758: length=9
kfddde[4].crestmp.hi:          33010550 ; 0x778: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[4].crestmp.lo:        2834565120 ; 0x77c: USEC=0x0 MSEC=0x102 SECS=0xf MINS=0x2a
kfddde[4].failstmp.hi:                0 ; 0x780: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[4].failstmp.lo:                0 ; 0x784: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[4].timer:                      0 ; 0x788: 0x00000000
kfddde[4].size:                 1258291 ; 0x78c: 0x00133333
kfddde[4].srRloc.super.hiStart:       0 ; 0x790: 0x00000000
kfddde[4].srRloc.super.loStart:       0 ; 0x794: 0x00000000
kfddde[4].srRloc.super.length:        0 ; 0x798: 0x00000000
kfddde[4].srRloc.incarn:              0 ; 0x79c: 0x00000000
kfddde[4].dskrprtm:                   0 ; 0x7a0: 0x00000000
kfddde[4].start0:                     0 ; 0x7a4: 0x00000000
kfddde[4].size0:                1258291 ; 0x7a8: 0x00133333
kfddde[4].used0:                1258291 ; 0x7ac: 0x00133333
kfddde[4].slot:                       0 ; 0x7b0: 0x00000000
kfddde[5].entry.incarn:               1 ; 0x8e4: A=1 NUMM=0x0
kfddde[5].entry.hash:                 5 ; 0x8e8: 0x00000005
kfddde[5].entry.refer.number:4294967295 ; 0x8ec: 0xffffffff
kfddde[5].entry.refer.incarn:         0 ; 0x8f0: A=0 NUMM=0x0
kfddde[5].dsknum:                     5 ; 0x8f4: 0x0005
kfddde[5].state:                      2 ; 0x8f6: KFDSTA_NORMAL
kfddde[5].ddchgfl:                    0 ; 0x8f7: 0x00
kfddde[5].dskname:            DATA_0005 ; 0x8f8: length=9
kfddde[5].fgname:             DATA_0005 ; 0x918: length=9
kfddde[5].crestmp.hi:          33010550 ; 0x938: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[5].crestmp.lo:        2853560320 ; 0x93c: USEC=0x0 MSEC=0x178 SECS=0x21 MINS=0x2a
kfddde[5].failstmp.hi:                0 ; 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[5].failstmp.lo:                0 ; 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[5].timer:                      0 ; 0x948: 0x00000000
kfddde[5].size:                 1258291 ; 0x94c: 0x00133333
kfddde[5].srRloc.super.hiStart:       0 ; 0x950: 0x00000000
kfddde[5].srRloc.super.loStart:       0 ; 0x954: 0x00000000
kfddde[5].srRloc.super.length:        0 ; 0x958: 0x00000000
kfddde[5].srRloc.incarn:              0 ; 0x95c: 0x00000000
kfddde[5].dskrprtm:                   0 ; 0x960: 0x00000000
kfddde[5].start0:                     0 ; 0x964: 0x00000000
kfddde[5].size0:                1258291 ; 0x968: 0x00133333
kfddde[5].used0:                1258255 ; 0x96c: 0x0013330f
kfddde[5].slot:                       0 ; 0x970: 0x00000000
kfddde[6].entry.incarn:               1 ; 0xaa4: A=1 NUMM=0x0
kfddde[6].entry.hash:                 6 ; 0xaa8: 0x00000006
kfddde[6].entry.refer.number:4294967295 ; 0xaac: 0xffffffff
kfddde[6].entry.refer.incarn:         0 ; 0xab0: A=0 NUMM=0x0
kfddde[6].dsknum:                     6 ; 0xab4: 0x0006
kfddde[6].state:                      2 ; 0xab6: KFDSTA_NORMAL
kfddde[6].ddchgfl:                    0 ; 0xab7: 0x00
kfddde[6].dskname:            DATA_0006 ; 0xab8: length=9
kfddde[6].fgname:             DATA_0006 ; 0xad8: length=9
kfddde[6].crestmp.hi:          33010550 ; 0xaf8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[6].crestmp.lo:        2875645952 ; 0xafc: USEC=0x0 MSEC=0x1b8 SECS=0x36 MINS=0x2a
kfddde[6].failstmp.hi:                0 ; 0xb00: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[6].failstmp.lo:                0 ; 0xb04: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[6].timer:                      0 ; 0xb08: 0x00000000
kfddde[6].size:                 1258291 ; 0xb0c: 0x00133333
kfddde[6].srRloc.super.hiStart:       0 ; 0xb10: 0x00000000
kfddde[6].srRloc.super.loStart:       0 ; 0xb14: 0x00000000
kfddde[6].srRloc.super.length:        0 ; 0xb18: 0x00000000
kfddde[6].srRloc.incarn:              0 ; 0xb1c: 0x00000000
kfddde[6].dskrprtm:                   0 ; 0xb20: 0x00000000
kfddde[6].start0:                     0 ; 0xb24: 0x00000000
kfddde[6].size0:                1258291 ; 0xb28: 0x00133333
kfddde[6].used0:                1258247 ; 0xb2c: 0x00133307
kfddde[6].slot:                       0 ; 0xb30: 0x00000000
kfddde[7].entry.incarn:               1 ; 0xc64: A=1 NUMM=0x0
kfddde[7].entry.hash:                 7 ; 0xc68: 0x00000007
kfddde[7].entry.refer.number:4294967295 ; 0xc6c: 0xffffffff
kfddde[7].entry.refer.incarn:         0 ; 0xc70: A=0 NUMM=0x0
kfddde[7].dsknum:                     7 ; 0xc74: 0x0007
kfddde[7].state:                      2 ; 0xc76: KFDSTA_NORMAL
kfddde[7].ddchgfl:                    0 ; 0xc77: 0x00
kfddde[7].dskname:            DATA_0007 ; 0xc78: length=9
kfddde[7].fgname:             DATA_0007 ; 0xc98: length=9
kfddde[7].crestmp.hi:          33010550 ; 0xcb8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[7].crestmp.lo:        2898849792 ; 0xcbc: USEC=0x0 MSEC=0x23c SECS=0xc MINS=0x2b
kfddde[7].failstmp.hi:                0 ; 0xcc0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[7].failstmp.lo:                0 ; 0xcc4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[7].timer:                      0 ; 0xcc8: 0x00000000
kfddde[7].size:                 1258291 ; 0xccc: 0x00133333
kfddde[7].srRloc.super.hiStart:       0 ; 0xcd0: 0x00000000
kfddde[7].srRloc.super.loStart:       0 ; 0xcd4: 0x00000000
kfddde[7].srRloc.super.length:        0 ; 0xcd8: 0x00000000
kfddde[7].srRloc.incarn:              0 ; 0xcdc: 0x00000000
kfddde[7].dskrprtm:                   0 ; 0xce0: 0x00000000
kfddde[7].start0:                     0 ; 0xce4: 0x00000000
kfddde[7].size0:                1258291 ; 0xce8: 0x00133333
kfddde[7].used0:                1258209 ; 0xcec: 0x001332e1
kfddde[7].slot:                       0 ; 0xcf0: 0x00000000

结合上述信息构造类似磁盘头文件

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3123334821 ; 0x00c: 0xba2a4ea5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:     ORCLDISKVOL2 ; 0x000: length=12
kfdhdb.driver.reserved[0]:    827084630 ; 0x008: 0x314c4f56
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        2 ; 0x024: 0x0002
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfdhdb.crestmp.lo:           2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfdhdb.mntstmp.hi:             33049840 ; 0x0b0: HOUR=0x10 DAYS=0x7 MNTH=0x3 YEAR=0x7e1
kfdhdb.mntstmp.lo:           1588567040 ; 0x0b4: USEC=0x0 MSEC=0x3e7 SECS=0x2a MINS=0x17
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                 1258291 ; 0x40c: 0x00133333
kfdhdb.pmcnt:                        19 ; 0x0c8: 0x00000013
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

然后通过kfed merge分别对所有磁盘头进行重构,然后通过dul去识别拷贝文件

+DATA/XFF/spfileXFF.ora.265.869858859
+DATA/XFF/CONTROLFILE/Current.256.796701475
+DATA/XFF/ONLINELOG/group_1.257.796701475
+DATA/XFF/ONLINELOG/group_2.258.796701485
+DATA/XFF/ONLINELOG/group_3.266.796704261
+DATA/XFF/ONLINELOG/group_4.267.796704277
+DATA/XFF/ONLINELOG/group_5.1235.824131117
+DATA/XFF/ONLINELOG/group_6.1115.824200515
+DATA/XFF/ONLINELOG/group_7.1113.824200587
+DATA/XFF/ONLINELOG/group_8.1112.824200627
+DATA/XFF/ONLINELOG/group_9.1066.824201189
+DATA/XFF/ONLINELOG/group_10.1063.824201207
+DATA/XFF/ONLINELOG/group_11.1062.824201287
+DATA/XFF/ONLINELOG/group_12.1061.824201301
File 259 datafile size 2147491840, block size 8192
File 260 datafile size 32186048512, block size 8192
File 261 datafile size 6897541120, block size 8192
File 263 datafile size 27409784832, block size 8192
File 264 datafile size 34359730176, block size 8192
File 280 datafile size 31457288192, block size 8192
File 281 datafile size 31457288192, block size 8192
File 330 datafile size 5242888192, block size 8192
File 334 datafile size 20971528192, block size 8192
File 382 datafile size 20971528192, block size 8192
File 383 datafile size 20971528192, block size 8192
File 384 datafile size 31457288192, block size 8192
File 385 datafile size 31457288192, block size 8192
File 386 datafile size 31457288192, block size 8192
File 387 datafile size 4294975488, block size 8192
File 388 datafile size 4294975488, block size 8192
File 389 datafile size 4294975488, block size 8192
File 390 datafile size 4294975488, block size 8192
File 391 datafile size 4294975488, block size 8192
File 392 datafile size 4294975488, block size 8192
File 394 datafile size 31457288192, block size 8192
File 491 datafile size 20971528192, block size 8192
File 494 datafile size 20971528192, block size 8192
File 578 datafile size 31457288192, block size 8192
File 597 datafile size 20971528192, block size 8192
File 613 datafile size 4294975488, block size 8192
File 638 datafile size 31457288192, block size 8192
File 668 datafile size 16988184576, block size 8192
File 688 datafile size 20971528192, block size 8192
File 740 datafile size 31457288192, block size 8192
File 787 datafile size 31457288192, block size 8192
File 798 datafile size 31457288192, block size 8192
File 806 datafile size 31457288192, block size 8192
File 810 datafile size 31457288192, block size 8192
File 845 datafile size 31457288192, block size 8192
File 886 datafile size 31457288192, block size 8192
File 887 datafile size 31457288192, block size 8192
File 889 datafile size 31457288192, block size 8192
File 892 datafile size 31457288192, block size 8192
File 903 datafile size 31457288192, block size 8192
File 932 datafile size 31457288192, block size 8192
File 933 datafile size 3145736192, block size 8192
File 951 datafile size 20971528192, block size 8192
File 953 datafile size 31457288192, block size 8192
File 955 datafile size 31457288192, block size 8192
File 963 datafile size 31457288192, block size 8192
File 1000 datafile size 31457288192, block size 8192
File 1001 datafile size 12035563520, block size 8192
File 1031 datafile size 31457288192, block size 8192
File 1033 datafile size 31457288192, block size 8192
File 1035 datafile size 31457288192, block size 8192
File 1037 datafile size 31457288192, block size 8192
File 1045 datafile size 31457288192, block size 8192
File 1073 datafile size 4294975488, block size 8192
File 1074 datafile size 4294975488, block size 8192
File 1075 datafile size 4294975488, block size 8192
File 1076 datafile size 8589942784, block size 8192
File 1077 datafile size 31457288192, block size 8192
File 1078 datafile size 8589942784, block size 8192
File 1079 datafile size 8589942784, block size 8192
File 1080 datafile size 4294975488, block size 8192
File 1081 datafile size 8589942784, block size 8192
File 1082 datafile size 8589942784, block size 8192
File 1083 datafile size 8589942784, block size 8192
File 1084 datafile size 8589942784, block size 8192
File 1085 datafile size 32365355008, block size 8192
File 1086 datafile size 9071239168, block size 8192
File 1116 datafile size 8589942784, block size 8192
File 1133 datafile size 8589942784, block size 8192
File 1219 datafile size 31457288192, block size 8192
File 1245 datafile size 31457288192, block size 8192
File 1249 datafile size 31457288192, block size 8192
File 1251 datafile size 31457288192, block size 8192
File 1322 datafile size 4294975488, block size 8192
File 1442 datafile size 31457288192, block size 8192
File 1468 datafile size 1048584192, block size 8192
File 1508 datafile size 31457288192, block size 8192
File 1554 datafile size 4294975488, block size 8192
File 1570 datafile size 31457288192, block size 8192
File 2004 datafile size 31457288192, block size 8192
File 2005 datafile size 31457288192, block size 8192
File 2344 datafile size 31457288192, block size 8192
File 2345 datafile size 31457288192, block size 8192
File 2348 datafile size 31457288192, block size 8192
File 2617 datafile size 10737426432, block size 8192
File 2618 datafile size 21474844672, block size 8192
File 2766 datafile size 33554440192, block size 8192
File 2782 datafile size 31457288192, block size 8192
File 2784 datafile size 31457288192, block size 8192
File 2893 datafile size 31457288192, block size 8192
File 2924 datafile size 31457288192, block size 8192
File 2925 datafile size 31457288192, block size 8192
File 2926 datafile size 31457288192, block size 8192
File 2983 datafile size 31457288192, block size 8192
File 2984 datafile size 31457288192, block size 8192
File 3634 datafile size 31457288192, block size 8192
File 3909 datafile size 31457288192, block size 8192
File 3917 datafile size 31457288192, block size 8192
File 3920 datafile size 31457288192, block size 8192
File 3922 datafile size 31457288192, block size 8192

剩下的事情就比较简单了,通过把spfile,controlfile,datafile文件拷贝出来,本地启动数据库,恢复成功
asm_open_db


open_alert


如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户反馈他们重启系统之后,发现asmlib创建的asmdisk丢失了,然后又使用oracleasm deletedisk和createdisk重新创建的asm disk,最后发现asm diskgroup无法mount。让客户通过dd 备份5m数据,然后使用kfed分析
kefd分析结果

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                  3760689243 ; 0x00c: 0xe027905b
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=10
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=255
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=256|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                     256 ; 0x004: T=0 NUMB=0x100
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3925268785 ; 0x00c: 0xe9f6d931
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:32994098 ; 0x000: HOUR=0x12 DAYS=0x19 MNTH=0x
c YEAR=0x7dd
kfdpHdrPairBv1.first.super.time.lo:1614030848 ; 0x004: USEC=0x0 MSEC=0x10a SECS=
0x3 MINS=0x18
kfdpHdrPairBv1.first.super.last:      2 ; 0x008: 0x00000002
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt:   1 ; 0x010: 0x01
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000
……

因为kfed默认每个block为4k,这里提示256是ok的,255是损坏的,从而推测出来,很可能oracleasm createdisk损坏了1M的数据。由于默认au是1m,而且数据库版本是11.2.0.3,而且第256个blkn开始没有损坏,因此初步判断可以考虑使用备份asm disk header来恢复磁盘头
检查还原磁盘头的asm disk

[grid@xifenfei1 disks]$ kfed read DATA1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2776451033 ; 0x00c: 0xa57d47d9
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKDATA1 ; 0x000: length=13
kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]:           49 ; 0x00c: 0x00000031
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32994099 ; 0x0a8: HOUR=0x13 DAYS=0x19 MNTH=0xc YEAR=0x7dd
kfdhdb.crestmp.lo:           2797442048 ; 0x0ac: USEC=0x0 MSEC=0x365 SECS=0x2b MINS=0x29
kfdhdb.mntstmp.hi:             33022061 ; 0x0b0: HOUR=0xd DAYS=0x3 MNTH=0x8 YEAR=0x7df
kfdhdb.mntstmp.lo:            816879616 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0xb MINS=0xc
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
…………

证明磁盘头确实被比较完美的修复了,现在的任务是尝试mount磁盘组
mount磁盘组

[grid@xifenfei1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 6 20:54:53 2015
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

asm diskgroup已经正常mount,使用asmcmd命令检查文件是否正常
分析磁盘组数据是否正常

[grid@xifenfei1 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   1622060   636493                0          636493              0             N  DATA/
ASMCMD> cd data
ASMCMD> ls
ORCL/
ASMCMD> cd orcl
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileorcl.ora
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI20130801.314.835191517
XIFENFEI20140101.321.835191571
XIFENFEI20140201.322.835191573
XIFENFEI20140301.323.835191573
…………
SYSAUX.270.835182535
SYSAUX.838.874669369
SYSTEM.271.835182533
SYSTEM.823.873555791
SYSTEM.945.883146947
…………

这里看到磁盘组里面的数据文件都正常,使用同样的方法,继续mount其他磁盘组。
尝试启动数据库

SQL> startup
ORACLE 例程已经启动。
Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'

查看数据库alert日志

ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Aug 06 21:04:06 2015
Thread 2 advanced to log sequence 21146 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Thread 2 advanced to log sequence 21147 (before internal thread enable)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-19816: 警告: 文件可能存在于数据库未知的 db_recovery_file_dest 中。
ORA-17502: ksfdcre: 4 未能创建文件 +ARCH
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [1] [47962] [1344818371 != 630731762]
ORA-15130: diskgroup "ARCH" is being dismounted
ORA-15066: offlining disk "ARCH_0000" in group "ARCH" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ARCH: Error 19504 Creating archive log file to '+ARCH'
NOTE: Deferred communication with ASM instance
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-15130: diskgroup "ARCH" is being dismounted
NOTE: deferred map free for map id 754
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'
ORA-16038 signalled during: ALTER DATABASE OPEN...
Thu Aug 06 21:04:10 2015
SUCCESS: diskgroup ARCH was dismounted
SUCCESS: diskgroup ARCH was dismounted
Thu Aug 06 21:04:10 2015
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Aug 06 21:04:10 2015
System state dump requested by (instance=1, osid=27353 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_27318.trc
CKPT (ospid: 27353): terminating the instance due to error 221
Instance terminated by CKPT, pid = 27353

查看asm alert日志

Thu Aug 06 21:04:07 2015
WARNING: cache read  a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group ARCH was dumped to /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
WARNING: cache read (retry) a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ERROR: cache failed to read group=2(ARCH) dsk=0 blk=1 from disk(s): 0(ARCH_0000)
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: cache initiating offline of disk 0 group ARCH
NOTE: process _user27462_+asm1 (27462) initiating offline of disk 0.3942486752 (ARCH_0000) with mask 0x7e in group 2
WARNING: Disk 0 (ARCH_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 2, dsk = 0/0xeafd92e0, mask = 0x6a, op = clear
Thu Aug 06 21:04:07 2015
GMON updating disk modes for group 2 at 17 for pid 35, osid 27462
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Aug 06 21:04:07 2015
NOTE: cache dismounting (not clean) group 2/0x723D6245 (ARCH)
NOTE: messaging CKPT to quiesce pins Unix process pid: 27089, image: oracle@xifenfei1 (B000)
WARNING: Offline of disk 0 (ARCH_0000) in group 2 and mode 0x7f failed on ASM inst 1
Thu Aug 06 21:04:07 2015
NOTE: halting all I/Os to diskgroup 2 (ARCH)
System State dumped to trace file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
NOTE: AMDU dump of disk group ARCH created at /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace
Thu Aug 06 21:04:09 2015
NOTE: LGWR doing non-clean dismount of group 2 (ARCH)
NOTE: LGWR sync ABA=126.806 last written ABA 126.806

这里可以看出来,报错的block为arch磁盘组的第一个磁盘的第一个au的第二个block,而我们在开始的时候,已经分析了asm disk的第一个au完全损坏,并且我们使用了备份磁盘头进行来还原,勉强可以让磁盘组mount起来,但是由于数据库在启动的时候,需要对redo进行归档,而归档的过程需要写到arch磁盘组里面,这个时候需要访问到au=0 blk=1,而这个块本身是坏的,因此这个时候该块盘的disk就被offline掉了,而这个磁盘组是外部冗余的,因此磁盘组dismount了,所以数据库无法启动.

分析第一个au里面到底有哪些东西

SQL> select DISK_NUMBER,path from v$asm_disk;
DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/raw/raw1
          2 /dev/raw/raw3
          1 /dev/raw/raw2
[oracle@xifenfei raw]$ kfed read raw1 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw1 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw2 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw3 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过一个测试机器的一个磁盘组进行分析,我们可以基本上确定asm 第一个au除了asm disk header的KFBTYP_DISKHEAD之外,其他主要是KFBTYP_FREESPC(Free Space Table)和KFBTYP_ALLOCTBL(allocator table),主要就是记录asm中au的分配情况,也就是进一步说明,如果我不对asm里面的数据使用更多的au分配或者回收au,在缺少第一个au的1-255个block的信息情况下,asm的磁盘组也不会dismount。根据这个思路,让数据库归档到本地,然后继续测试

继续open数据库

SQL> startup
ORACLE 例程已经启动。
Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
SQL> alter database open;
数据库已更改。
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Aug 07 02:43:13 2015
ARC1 started with pid=34, OS id=22778
Fri Aug 07 02:43:13 2015
ARC2 started with pid=35, OS id=22780
Fri Aug 07 02:43:13 2015
ARC3 started with pid=36, OS id=22782
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Fri Aug 07 02:43:24 2015
Thread 1 opened at log sequence 18604
  Current log# 10 seq# 18604 mem# 0: /tmp/xifenfei/otherfile/group_10.273.835182533
  Current log# 10 seq# 18604 mem# 1: /tmp/xifenfei/otherfile/group_10.263.835182533
Successful open of redo thread 1
Fri Aug 07 02:43:24 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Aug 07 02:43:25 2015
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Fri Aug 07 02:43:26 2015
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:21328 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Aug 07 02:43:26 2015
Redo thread 2 internally disabled at seq 21147 (CKPT)
[21341] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:96999124 end:97000624 diff:1500 (15 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Aug 07 02:43:31 2015
GTX0 started with pid=37, OS id=22803
Starting background process RCBG
Fri Aug 07 02:43:31 2015
RCBG started with pid=38, OS id=22805
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Aug 07 02:43:34 2015
Archived Log entry 73876 added for thread 2 sequence 21145 ID 0x513c613f dest 1: <---果然有归档操作发生
Starting background process QMNC
Fri Aug 07 02:43:34 2015
QMNC started with pid=39, OS id=22812
Fri Aug 07 02:43:35 2015
Archived Log entry 73877 added for thread 2 sequence 21146 ID 0x513c613f dest 1:
Fri Aug 07 02:43:35 2015
ARC0: Archiving disabled thread 2 sequence 21147
Archived Log entry 73878 added for thread 2 sequence 21147 ID 0x513c613f dest 1:
Fri Aug 07 02:43:37 2015
Completed: alter database open

现在到了这一步,基本上可以确定,数据库是零丢失恢复。由于asm 第一个au丢失数据严重,想要彻底修复比较难,考虑把数据库启动到mount/read only状态然后使用rman备份数据,然后进行重建asm 磁盘组,再迁移回来。至此完美恢复asmlib的磁盘被oracleasm重写的故障恢复,实现数据0丢失.当然在整个恢复过程没有于此的简单,涉及到在votedisk损坏的情况下,如何mount磁盘组,vote diskgroup的损坏修复问题,磁盘组在10g/11.1和11.2还原磁盘头备份的问题等问题.
虽然本次的恢复案例中,由于asmlib的asm disk不可见就轻易使用oracleasm createdisk命令对磁盘进行了重建,犯了一个很大错误,但是在重建之后,发现磁盘组依旧异常,未继续操作(比如重建磁盘组等),为最后的数据完全恢复创造了必要条件,使得客户的没有任何数据损失。如果再对除磁盘组继续复写操作,可能会导致数据永久性丢失。这个教训告诉我们:遇到自己不能把握的事情,及时终止,不要让错误越走越远