ERROR: diskgroup XXXX was not mounted

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ERROR: diskgroup XXXX was not mounted

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

aix平台10.2.0.5 2节点RAC,由于节点2系统盘故障,通过节点1镜像系统,复制到节点2,结果由于节点2磁盘顺序和节点1不匹配,aix工程师进行了相关操作之后,节点1重启之后datadg磁盘组无法mount

SQL> alter diskgroup datadg mount
Mon Jun 10 23:23:46 CST 2019
NOTE: cache registered group DATADG number=1 incarn=0x8cf61164
Mon Jun 10 23:23:46 CST 2019
NOTE: Hbeat: instance first (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: start heartbeating (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: cache dismounting group 1/0x8CF61164 (DATADG)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATADG was not mounted

检查datadg磁盘组相关信息

Tue Jan 29 19:21:45 CST 2019
NOTE: start heartbeating (grp 2)
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/rhdisk6
Tue Jan 29 19:21:45 CST 2019
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/rhdisk7
NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/rhdisk8
NOTE: cache opening disk 3 of grp 2: DATADG_0003 path:/dev/rhdisk9
NOTE: cache mounting (first) group 2/0x60E59155 (DATADG)
* allocate domain 2, invalid = TRUE
Tue Jan 29 19:21:45 CST 2019
NOTE: attached to recovery domain 2
Tue Jan 29 19:21:45 CST 2019
NOTE: cache recovered group 2 to fcn 0.849668
Tue Jan 29 19:21:45 CST 2019
NOTE: LGWR attempting to mount thread 1 for disk group 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.849668 ABA
NOTE: seq=21 blk=5394
Tue Jan 29 19:21:46 CST 2019
NOTE: cache mounting group 2/0x60E59155 (DATADG) succeeded
SUCCESS: diskgroup DATADG was mounted

通过这里可以看出来datadg磁盘组是由rhdisk6-9 四块磁盘组成,查询相关磁盘信息发现
asm-foreign


这里确定rhdisk7磁盘异常,通过kfed分析磁盘情况

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                           34 ; 0x001: 0x22
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                   49407 ; 0x004: blk=49407
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                    58396 ; 0x010: 0x0000e41c
kfbh.fcn.wrap:                   131072 ; 0x014: 0x00020000
kfbh.spare1:                 4294967064 ; 0x018: 0xffffff18
kfbh.spare2:                 2105310074 ; 0x01c: 0x7d7c7b7a
005918A00 00002200 0000C0FF 00000000 00000000  [."..............]
005918A10 0000E41C 00020000 FFFFFF18 7D7C7B7A  [............z{|}]
005918A20 00000000 00000000 00000000 00000000  [................]
  Repeat 253 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006EF8A00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=2|more
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                33554432 ; 0x004: blk=33554432
kfbh.block.obj:                16777344 ; 0x008: file=128
kfbh.check:                  3844041089 ; 0x00c: 0xe51f6981
kfbh.fcn.base:               1297484544 ; 0x010: 0x4d560b00
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                  49153 ; 0x004: 0xc001
kfdatb10.ub2pad:                  20555 ; 0x006: 0x504b
kfdatb10.auinfo[0].link.next:      2048 ; 0x008: 0x0800
kfdatb10.auinfo[0].link.prev:      2048 ; 0x00a: 0x0800
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:         49153 ; 0x00e: 0xc001
kfdatb10.auinfo[1].link.next:      4096 ; 0x010: 0x1000
kfdatb10.auinfo[1].link.prev:      4096 ; 0x012: 0x1000
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:      6144 ; 0x018: 0x1800
kfdatb10.auinfo[2].link.prev:      6144 ; 0x01a: 0x1800
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:      8192 ; 0x020: 0x2000
kfdatb10.auinfo[3].link.prev:      8192 ; 0x022: 0x2000
kfdatb10.auinfo[3].free:              0 ; 0x024: 0x0000

对比磁盘可能的损坏情况,由于在aix 平台asm disk的block有一个特征一般0082开头,通过工具打开磁盘,检索该标记对比
正常磁盘
0082-good


异常磁盘
0082-had


通过上述分析,大概评估rhdisk7 元数据部分损坏的不光是block 0和1,人工修复继续使用的可能性不太大,而且基于客户的数据库不大,采取方案是直接拷贝数据文件、redo、控制文件到文件系统,然后在本地文件系统open库
open_db


运气不错,实现完美恢复数据0丢失

WARNING: Read Failed.导致asm磁盘组异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:WARNING: Read Failed.导致asm磁盘组异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户对asm dg进行扩容,一段时间之后,asm data 磁盘组直接dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

后续继续mount data 磁盘组成功,但是立马又dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

对于上述的故障现象,本质原因是由于asm 磁盘组增加新磁盘之后,开始做rebalance,但是由于遭遇到 27号盘上有IO读错误,使得asm磁盘组无法正常完成rebalance,因而data磁盘组无法稳定的mount。解决该问题思路,通过patch asm磁盘组,禁止rebalance,从而使得data磁盘组不再dismount,再进行后续恢复

ORA-600 kfrValAcd30 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kfrValAcd30 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于存储控制器损坏,修复控制器之后,asm无法正常mount,报ORA-600 kfrValAcd30错误,让我们提供技术支持
kfrValAcd30


asm alert日志报错

Wed Apr 03 16:50:57 2019
SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=1 incarn=0x14248741
NOTE: cache began mount (first) of group DATA number=1 incarn=0x14248741
NOTE: Assigning number (1,0) to disk (ORCL:DATAVOL1)
Wed Apr 03 16:51:03 2019
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 15
kfdp_queryBg(): 15
NOTE: cache opening disk 0 of grp 1: DATAVOL1 label:DATAVOL1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x14248741 (DATA)
Wed Apr 03 16:51:04 2019
* allocate domain 1, invalid = TRUE
Wed Apr 03 16:51:04 2019
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=27.2697 group=1 (DATA)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_15951.trc  (incident=23394):
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_23394/+ASM2_ora_15951_i23394.trc
Abort recovery for domain 1
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
ERROR: alter diskgroup data mount
NOTE: cache dismounting (clean) group 1/0x14248741 (DATA)
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
Wed Apr 03 16:51:05 2019
Sweep [inc][23394]: completed
Sweep [inc2][23394]: completed
Wed Apr 03 16:51:05 2019
Trace dumping is performing id=[cdmp_20190403165105]
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x14248741 (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x14248741
kfdp_dismount(): 16
kfdp_dismountBg(): 16
NOTE: De-assigning number (1,0) from disk (ORCL:DATAVOL1)
ERROR: diskgroup DATA was not mounted
NOTE: cache deleting context for group DATA 1/337938241

mos相关记录
参考:ORA-600 [KFRVALACD30] in ASM (Doc ID 2123013.1)
kfrValAcd30-mos


ORA-00600: internal error code, arguments: [kfrValAcd30]可能是bug或者硬件故障导致.基于客户的情况,最大可能就是由于硬件故障导致asm 磁盘组的acd无法正常进行,从而无法mount成功.如果运气好,通过kfed相关修复可以正常mount成功,运气不好可以通过底层进行恢复数据文件,从而最大限度恢复数据.

又一例asm格式化文件系统恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:又一例asm格式化文件系统恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一个客户把win rac中的asm disk给格式化为ntfs了(data磁盘组由三个500G的磁盘组成,被格式化掉前面两个还剩下一个),而且格式化之后,还进行了一系列恢复(比如修复磁盘头,又进行分区等一些磁盘操作),导致恢复难度增加,也增加了一些数据覆盖
asm alert日志报错

Thu Aug 23 11:20:14 2018
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Process state recorded in trace file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_ora_2260.trc
Thu Aug 23 11:20:28 2018
Errors in file d:\app\administrator\diag\asm\+asm\+asm1\trace\+asm1_lgwr_3820.trc:
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:26 disk_offset(bytes):27566080 io_size:4096 operation:Write type:synchronous
	 result:I/O error process_id:3820
NOTE: unable to write any mirror side for diskgroup DATA
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process 3268:3820 initiating offline of disk 1.4042301899 (DATA_0001) with mask 0x7e in group 2
WARNING: Disk DATA_0001 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 2, dsk = 1/0xf0f0a1cb, mode = 0x15
kfdp_updateDsk(): 22
Thu Aug 23 11:20:28 2018
kfdp_updateDskBg(): 22
ERROR: too many offline disks in PST (grp 2)
WARNING: Disk DATA_0001 in mode 0x7f offline aborted

数据库alert日志报错

WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:11992
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in
group [2.1859146063] from disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260
in group 2 on disk 1 allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_11992.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Thu Aug 23 11:20:13 2018
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:841 disk_offset(bytes):882532352 io_size:131072 operation:Write type:asynchronous
	 result:I/O error process_id:3224
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 240 logical extent 0 of file 259 in group 2 on disk 1
allocation unit 841 KCF: read, write or open error, block=0x7853 online=1
        file=4 '+DATA/orcl/datafile/users.259.944422883'
        error=15081 txt: ''
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 87) 参数错误。
WARNING: IO Failed. group:2 disk(number.incarnation):1.0xf0f0a1cb disk_path:\\.\ORCLDISKDATA1
	 AU:422 disk_offset(bytes):442515456 io_size:16384 operation:Read type:synchronous
	 result:I/O error process_id:3224
WARNING: failed to read mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group [2.1859146063] from
disk DATA_0001  allocation unit 422 reason error; if possible,will try another mirror side
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 5 logical extent 0 of file 260 in group 2 on disk 1
allocation unit 422
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_dbw1_3224.trc:
ORA-00204: 读取控制文件时出错 (块 41, # 块 1)
ORA-00202: 控制文件: ''+DATA/orcl/controlfile/current.260.944422981''
ORA-15081: 无法将 I/O 操作提交到磁盘
DBW1 (ospid: 3224): terminating the instance due to error 204

由于客户进行了一系列恢复恢复操作导致查看磁盘都不全

D:\>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
---这里还有一个磁盘没有正常显示
ORCLDISKDATA10                   \Device\Harddisk4\Partition1           511997M--客户尝试修复的磁盘
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

通过主机层面激活卷,删除分区等一系列操作,然后通过kfed构造磁盘头,让这些磁盘在os层面可以正常显示

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              100M
NTFS                             \Device\Harddisk0\Partition2           102298M
NTFS                             \Device\Harddisk1\Partition1           102397M
NTFS                             \Device\Harddisk2\Partition1           204797M
------需要处理的磁盘------
ORCLDISKDATA0                    \Device\Harddisk3\Partition1           511997M
ORCLDISKDATA1                    \Device\Harddisk4\Partition1           511997M
ORCLDISKDATA2                    \Device\Harddisk5\Partition1           511997M
-----------------------
ORCLDISKRECOVERY0                \Device\Harddisk6\Partition1            51197M
ORCLDISKRECOVERY1                \Device\Harddisk7\Partition1            51197M
ORCLDISKRECOVERY2                \Device\Harddisk8\Partition1            51197M
ORCLDISKCRS0                     \Device\Harddisk9\Partition1            10237M
ORCLDISKCRS1                     \Device\Harddisk10\Partition1           10237M
ORCLDISKCRS2                     \Device\Harddisk11\Partition1           10237M
NTFS                             \Device\Harddisk12\Partition2         4194174M

由于asm磁盘组内部目录au被彻底损坏,导致无法通过asm直接拷贝出来数据,通过底层扫描,按照au恢复出来相关数据,由于格式化ntfs和后续的误操作导致部分数据au被覆盖.其余数据均恢复,抢救了绝大部分数据.
数据文件恢复参考:asm disk header 彻底损坏恢复
另外有一次win平台类似恢复经历:asm disk格式化为ntfs恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

asm disk 大小限制

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm disk 大小限制

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这个问题在12C之前争议很小,基本共识非XD环境不能超过2T,但是到了后面的版本中,发生了一些改变,主要是COMPATIBLE.ASM and COMPATIBLE.RDBMS disk group attributes are set to 12.1 or greater的时候asm disk 大小限制依赖au size,
1M ausize asm disk limit为4 PB
2M ausize asm disk limit为8 PB
4M ausize asm disk limit为16 PB
8M ausize asm disk limit为32 PB

asm-limit-1
asm-limit-2


参见:Oracle ASM Storage Limits
18C中COMPATIBLE.ASM和COMPATIBLE.RDBMS默认值(COMPATIBLE.RDBMS为10.1,也就是说默认情况下非XD情况还是只能支持不超过2T的asm disk)
18c-asm


asm磁盘组操作不当导致数据文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘组操作不当导致数据文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到数据库恢复case,客户是要更换存储,在数据库mount状态把使用omf方式存储数据的asm 磁盘组通过rman copy到新的通过别名方式存储的新的asm 磁盘组的存储中,但是由于操作人员粗心,copy语句中部分目标磁盘组的数据文件别名重复了,最后执行rename file之后,导致部分数据文件彻底丢失.我们通过底层碎片扫描(参考:asm disk header 彻底损坏恢复)对于该用户的数据实现完全恢复.
因为整个过程重现比较麻烦,这里测试从一个data磁盘组中有一个omf方式存储的含有两个数据文件的表空间,通过rman copy 把这个表空间的两个文件拷贝到datanew磁盘组中,但是由于粗心把两个数据文件的别名写成一样,结果导致该表空间的一个数据文件彻底丢失的测试.

创建测试表空间
在datanew磁盘组中创建omf方式管理的xifenfei表空间,含有两个数据文件,file#分别为14和15

SQL> create tablespace xifenfei datafile '+DATA' SIZE 128m;
Tablespace created.
SQL> ALTER TABLESPACE XIFENFEI ADD DATAFILE '+DATA' SIZE 128m AUTOEXTEND ON;
Tablespace altered.
SQL> SELECT FILE_NAME,FILE_ID FROM  DBA_DATA_FILES WHERE TABLESPACE_NAME='XIFENFEI';
FILE_NAME
--------------------------------------------------------------------------------
   FILE_ID
----------
+DATA/XFF/DATAFILE/xifenfei.276.961143809
        14
+DATA/XFF/DATAFILE/xifenfei.277.961143825
        15

rman copy datafile 14
通过rman copy把datafile 14拷贝到data磁盘组中,目标端为别名方式存储

RMAN> copy datafile 14 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00014 name=+DATA/XFF/DATAFILE/xifenfei.276.961143809
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082643 RECID=4 STAMP=961144006
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 27-NOV-17
[grid@localhost ~]$ asmcmd
ASMCMD> cd datanew
ASMCMD> ls
XFF/
xifenfei.dbf
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD>

这里通过asmcmd的ls命令,可以看到虽然我们存储的为datanew磁盘组的别名文件,实际上是link到asm的omf方式的文件(本质上asm中的文件都是omf方式存储,只是在使用的时候体现asm的客户端程序方式不一样,是直接asm中的omf方式,还是asm中的别名).

rman copy datafile 15
通过rman copy把datafile 15 拷贝到和datafile 14别名一样的文件了

RMAN> copy datafile 15 to '+datanew/xifenfei.dbf';
Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00015 name=+DATA/XFF/DATAFILE/xifenfei.277.961143825
output file name=+DATANEW/xifenfei.dbf tag=TAG20171127T082731 RECID=5 STAMP=961144053
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 27-NOV-17
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
                                            Y    XFF/
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  N    xifenfei.dbf => +DATANEW/XFF/DATAFILE/XIFENFEI.256.961144003
ASMCMD> cd xff
ASMCMD> ls
DATAFILE/
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI.256.961144003
ASMCMD>

这里可以看出来,在data磁盘组中,file 14被file 15覆盖掉了

rename file
把data磁盘组中的数据文件rename 到datanew磁盘组中

SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf';
Database altered.
SQL> alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf';
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+datanew/xifenfei.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01523: cannot rename data file to '+data/xifenfei.dbf' - file already part of database

这里我们可以看到,file 14 rename 成功,但是file 15 rename失败,因为在数据库中,已经有了别名的文件(数据文件的路径)

omf自动删除文件
查看原磁盘组datanew中,发现datafile 14被自动删除

ASMCMD> pwd
+DATA/XFF/DATAFILE
ASMCMD> ls -l
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSAUX.257.942061433
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    SYSTEM.256.942061393
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    UNDOTBS1.258.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    USERS.259.942061449
DATAFILE  UNPROT  COARSE   NOV 27 08:00:00  Y    XIFENFEI.277.961143825
ASMCMD>

alert日志证实数据文件被删除

2017-11-27T09:05:03.054741-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:03.114947-05:00
NOTE: Under CF enqueue, no dependency request for disk group DATANEW
Deleted Oracle managed file +DATA/XFF/DATAFILE/xifenfei.276.961143809
Completed: alter database rename file '+DATA/XFF/DATAFILE/xifenfei.276.961143809' to '+datanew/xifenfei.dbf'
2017-11-27T09:05:21.471474-05:00
alter database rename file '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to '+data/xifenfei.dbf'
ORA-1511 signalled during:alter database rename file
      '+DATA/XFF/DATAFILE/xifenfei.277.961143825' to'+datanew/xifenfei.dbf'

这里可以证实,数据文件的omf方式管理,在数据文件执行rename file的时候,会自动删除掉老的数据文件.这里悲剧已经发生,由于rman copy 覆盖了datanew磁盘组中的datafile 14,rename file又导致data磁盘组中的datafile 14被自动删除,从而使得datafile 14这个数据文件在两个磁盘组中都丢失.从常规角度来说,如果没有合适的备份该文件无法恢复.如果遭遇到oracle asm中数据文件丢失或者部分覆盖,请保护现场,联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com最大限度抢救您的数据

asm磁盘分区丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘分区丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友反馈,他们做了xx存储的双活之后,重启主机发现gi无法正常启动,分析发现所有该存储的磁盘分区信息丢失,导致asmlib无法发现磁盘(使用分区做asm disk)
类似如下错误(磁盘分区丢失)

--fdisk -l 显示部分结果
Disk /dev/mapper/datahds1: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
--ls -l /dev/mapper/   显示结果无分区信息
lrwxrwxrwx 1 root root      7 May  6 03:44 datahds1 -> ../dm-1
lrwxrwxrwx 1 root root      7 May  6 03:26 datahds2 -> ../dm-3
lrwxrwxrwx 1 root root      7 May  6 03:26 datahds3 -> ../dm-8
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds1 -> ../dm-0
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds2 -> ../dm-2
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds3 -> ../dm-4

asm日志显示

SUCCESS: diskgroup DATADG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCRHDS was mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

分析系统日志

May  6 02:23:27 db2 kernel: sdb: unknown partition table
May  6 02:23:27 db2 kernel: sde: unknown partition table
May  6 02:23:27 db2 kernel: sdc: unknown partition table
May  6 02:23:27 db2 kernel: sdf: unknown partition table
May  6 02:23:27 db2 kernel: sdd: unknown partition table
May  6 02:23:27 db2 kernel: sdj:Dev sdj: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdi: sdi1
May  6 02:23:27 db2 kernel: sdk: sdk1
May  6 02:23:27 db2 kernel: sdg: unknown partition table
May  6 02:23:27 db2 kernel: sdl: sdl1
May  6 02:23:27 db2 kernel: sdm:Dev sdm: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdo:Dev sdo: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdn:Dev sdn: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdp:Dev sdp: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sds:Dev sds: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdh:
May  6 02:23:27 db2 kernel: sdt: sdt1
May  6 02:23:27 db2 kernel: sdv:Dev sdv: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdq:Dev sdq: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sd 1:0:1:9: [sdr] Very big device. Trying to use READ CAPACITY(16).
May  6 02:23:27 db2 kernel: sdr:Dev sdr: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sd 2:0:0:9: [sdab] Very big device. Trying to use READ CAPACITY(16).
May  6 02:23:27 db2 kernel: sdab: unknown partition table
May  6 02:23:27 db2 kernel: sdac: unknown partition table
May  6 02:23:27 db2 kernel: sdw: sdw1
May  6 02:23:27 db2 kernel: sdu:Dev sdu: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdx: sdx1
May  6 02:23:27 db2 kernel: sdy: sdy1
May  6 02:23:27 db2 kernel: sdaa: sdaa1
May  6 02:23:27 db2 kernel: sdz: sdz1
May  6 02:23:27 db2 kernel: sdae: unknown partition table
May  6 02:23:27 db2 kernel: sdaf: unknown partition table
May  6 02:23:27 db2 kernel: sdag: unknown partition table
May  6 02:23:27 db2 kernel: sdai:
May  6 02:23:27 db2 kernel: sdah: unknown partition table
May  6 02:23:27 db2 kernel: sdad: unknown partition table
May  6 02:23:28 db2 mcelog: failed to prefill DIMM database from DMI data

这里错误比较明显unknown partition table,磁盘的分区信息损坏.使用fdisk无法发现分区

partprobe也无效

[root@db2 oracle]# partprobe /dev/mapper/ocrhds3
[root@db2 oracle]#
[root@db2 oracle]# ls -l /dev/mapper/ocrhds3*
lrwxrwxrwx 1 root root 7 May  6 07:30 /dev/mapper/ocrhds3 -> ../dm-4

从尚需信息看,磁盘的分区表信息应该已经损坏,现在能够做的,就是希望运气好,磁盘的分区的实际数据没有损坏

分析磁盘实际分区数据

[root@db2 ~]$ dd if=/dev/mapper/datahds1 of=/tmp/datahds1.dd bs=1024k count=50
[root@db2 ~]$ dd if=/tmp/datahds1.dd of=/tmp/xff01.dd  bs=3225 skip=1
[grid@db2 ~]$ kfed read /tmp/xff01.dd |more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33050696 ; 0x0a8: HOUR=0x8 DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.crestmp.lo:           3813740544 ; 0x0ac: USEC=0x0 MSEC=0x44 SECS=0x35 MINS=0x38
kfdhdb.mntstmp.hi:             33050701 ; 0x0b0: HOUR=0xd DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.mntstmp.lo:            411385856 ; 0x0b4: USEC=0x0 MSEC=0x150 SECS=0x8 MINS=0x6

通过上述分析,我们可以初步判断,分区磁盘的信息很可能是好的(因为asm disk header是好的,根据一般的规则从前往后覆盖,既然header是好的,后面的block被覆盖的概率非常小)

通过准备新磁盘直接把磁盘分区dd到新设备上

dd if=/dev/mapper/ocrhds1 of=/dev/mapper/ocrhdsnew1 skip=1 bs=3225
dd if=/dev/mapper/ocrhds2 of=/dev/mapper/ocrhdsnew2 skip=1 bs=3225
dd if=/dev/mapper/ocrhds3 of=/dev/mapper/ocrhdsnew3 skip=1 bs=3225
dd if=/dev/mapper/datahds1 of=/dev/mapper/datahdsnew1 skip=1 bs=3225
dd if=/dev/mapper/datahds2 of=/dev/mapper/datahdsnew2 skip=1 bs=3225
dd if=/dev/mapper/datahds3 of=/dev/mapper/datahdsnew3 skip=1 bs=3225

asmlib重新扫描磁盘

[root@db1 disks]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "HDSOCR3"
Instantiating disk "HDSDATA2"
Instantiating disk "HDSDATA1"
Instantiating disk "HDSDATA3"
Instantiating disk "HDSOCR1"
Instantiating disk "HDSOCR2"
[root@db1 disks]# ls -ltr
total 0
brw-rw---- 1 grid asmadmin  8, 160 May  6 13:49 HDSOCR3
brw-rw---- 1 grid asmadmin  8, 192 May  6 13:49 HDSDATA2
brw-rw---- 1 grid asmadmin  8, 176 May  6 13:49 HDSDATA1
brw-rw---- 1 grid asmadmin  8, 208 May  6 13:49 HDSDATA3
brw-rw---- 1 grid asmadmin  8, 128 May  6 13:49 HDSOCR1
brw-rw---- 1 grid asmadmin  8, 144 May  6 13:49 HDSOCR2

kfed验证拷贝的分区

[root@db2 tmp]# /oracle/app/11.2.0/grid_1/bin/kfed read /dev/oracleasm/disks/HDSDATA1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0

asm和数据库启动正常

[grid@db2 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3145710  2378034                0         2378034              0             N  DATADG/
MOUNTED  NORMAL  N         512   4096  1048576     15342    14416             5114            4651              0             Y  OCRHDS/
ASMCMD>
[oracle@db2 ~]$ sqlplus  / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat May 6 13:54:21 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 3.6077E+10 bytes
Fixed Size                  2260648 bytes
Variable Size            7247757656 bytes
Database Buffers         2.8723E+10 bytes
Redo Buffers              104382464 bytes
Database mounted.
Database opened.
SQL>

asm-disk-partition-lost-recovery


通过上述恢复,实现asm磁盘分区丢失数据0丢失
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Physically Addressed Metadata Redundancy on 12c ASM ( PHYS_META_REPLICATED )

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Physically Addressed Metadata Redundancy on 12c ASM ( PHYS_META_REPLICATED )

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

从版本12.1开始,ASM会对某些物理元数据做一份复制,具体的说是每个磁盘的第一个AU(0号AU)上元数据。这意味着,ASM同时维护着两份磁盘头、FST(Free Space Table)表、AT(Allocation table)表的数据。需要注意的是ASM对这些数据采用的是复制(replicate),而不是镜像(mirror)。ASM镜像(mirror)意味着把一份数据,拷贝到不同磁盘上;而物理元数据的副本位于相同的磁盘,因此使用的术语复制(replicate)。这意味着在external冗余的磁盘组中,物理元数据也会被复制。PST也是物理元数据,但是ASM是通过镜像,而不是复制来提供数据保护。因此只有在normal和high冗余的磁盘组中,PST表存在数据的冗余。物理元数据位于每块ASM磁盘的0号AU。元数据复制的特性打开后,ASM会把0号AU的内容拷贝到11号AU,然后同时维护这两份副本。创建磁盘组时如果指定或修改了一个已经存在的磁盘组的compatibility属性为12.1及以上,该特性会自动被打开。当提升ASM compatibility属性值为12.1及以上时,如果11号AU有数据,ASM将把这些数据移动到别处,然后将物理元数据复制到11号AU。从版本11.1.0.7开始,ASM在1号AU的倒数第二个块维护了一份磁盘头的副本。在版本12.1中,ASM仍然维护着这个副本数据。也就是说,现在每个ASM磁盘,有磁盘头的三个副本。
au 0中具体数据
au0


命令行创建diskgroup

[grid@localhost ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.2.0.1.0 Production on Sat Apr 22 08:13:31 2017
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> create diskgroup xifenfei  external redundancy disk '/dev/xifenfei-sdg','/dev/xifenfei-sdh';
Diskgroup created.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

查看xifenfei磁盘组属性

[grid@localhost ~]$ asmcmd lsattr -l -G XIFENFEI
Name                     Value
access_control.enabled   FALSE
access_control.umask     066
au_size                  1048576
cell.smart_scan_capable  FALSE
compatible.asm           11.2.0.2.0
compatible.rdbms         10.1.0.0.0
disk_repair_time         3.6h
idp.boundary             auto
idp.type                 dynamic
sector_size              512

这里可以看到目前compatible.asm为11.2,没有phys_meta_replicated属性

查看磁盘头信息
ASM磁盘头的fdhdb.flags条目指代了物理元数据的复制状态:
· kfdhdb.flags = 0 — 元数据没有复制
· kfdhdb.flags = 1 — 元数据已经复制完毕
· kfdhdb.flags = 2 — 元数据在复制过程中

[grid@localhost ~]$ for disk in `asmcmd lsdsk -G XIFENFEI --suppressheader`;
> do kfed read $disk | egrep "dskname|flags"; done
kfdhdb.dskname:           XIFENFEI_0000 ; 0x028: length=13
kfdhdb.flags:                         0 ; 0x0fc: 0x00000000
kfdhdb.dskname:           XIFENFEI_0001 ; 0x028: length=13
kfdhdb.flags:                         0 ; 0x0fc: 0x00000000
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg|grep kfbh.type
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg aun=11|grep kfbh.type
kfbh.type:                            8 ; 0x002: KFBTYP_CHNGDIR
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg blkn=254 aun=1|grep kfbh.type
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

这里也比较明显,在aun 11的位置没有第一个au的备份
修改compatible.asm为12.2

[grid@localhost ~]$  asmcmd setattr -G XIFENFEI compatible.asm 12.2.0.0.0
[grid@localhost ~]$ asmcmd lsattr -l -G XIFENFEI
Name                        Value
access_control.enabled      FALSE
access_control.umask        066
appliance._partnering_type  NULL
au_size                     1048576
cell.smart_scan_capable     FALSE
cell.sparse_dg              allnonsparse
compatible.asm              12.2.0.0.0
compatible.rdbms            10.1.0.0.0
content.check               FALSE
content.type                data
disk_repair_time            3.6h
failgroup_repair_time       24.0h
idp.boundary                auto
idp.type                    dynamic
logical_sector_size         512
phys_meta_replicated        true
preferred_read.enabled      FALSE
scrub_async_limit           1
scrub_metadata.enabled      FALSE
sector_size                 512
thin_provisioned            FALSE

这里可以看到修改为compatible.asm=12.2之后,出现phys_meta_replicated属性

查看au的备份

[grid@localhost ~]$ for disk in `asmcmd lsdsk -G XIFENFEI --suppressheader`;
> do kfed read $disk | egrep "dskname|flags"; done
kfdhdb.dskname:           XIFENFEI_0000 ; 0x028: length=13
kfdhdb.flags:                         1 ; 0x0fc: 0x00000001
kfdhdb.dskname:           XIFENFEI_0001 ; 0x028: length=13
kfdhdb.flags:                         1 ; 0x0fc: 0x00000001
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg aun=11|grep kfbh.type
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg blkn=254 aun=1|grep kfbh.type
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg|grep kfbh.type
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

这里就可以看到kfdhdb.flags为1,在au 11的地方也变为了磁盘头信息

模拟第一个au彻底损坏

[grid@localhost ~]$ dd if=/dev/zero of=/dev/xifenfei-sdg bs=1024k count=1 conv=notrunc
1+0 records in
1+0 records out
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000000000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

尝试mount磁盘组

SQL> alter diskgroup xifenfei mount;
alter diskgroup xifenfei mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "XIFENFEI" cannot be mounted
ORA-15040: diskgroup is incomplete

alert日志信息

SQL> alter diskgroup xifenfei mount
2017-04-22T08:30:00.889037-04:00
NOTE: cache registered group XIFENFEI 1/0xB15C368B
NOTE: cache began mount (first) of group XIFENFEI 1/0xB15C368B
NOTE: Assigning number (1,1) to disk (/dev/xifenfei-sdh)
2017-04-22T08:30:01.001544-04:00
ERROR: no read quorum in group: required 1, found 0 disks
2017-04-22T08:30:01.001737-04:00
NOTE: cache dismounting (clean) group 1/0xB15C368B (XIFENFEI)
NOTE: messaging CKPT to quiesce pins Unix process pid: 20894, image: oracle@localhost.localdomain (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0xB15C368B (XIFENFEI)
NOTE: cache ending mount (fail) of group XIFENFEI number=1 incarn=0xb15c368b
NOTE: cache deleting context for group XIFENFEI 1/0xb15c368b
2017-04-22T08:30:01.028825-04:00
GMON dismounting group 1 at 2 for pid 23, osid 20894
2017-04-22T08:30:01.029146-04:00
NOTE: Disk XIFENFEI_0001 in mode 0x8 marked for de-assignment
ERROR: diskgroup XIFENFEI was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "XIFENFEI" cannot be mounted
ORA-15040: diskgroup is incomplete
2017-04-22T08:30:01.036014-04:00
ERROR: alter diskgroup xifenfei mount

很明显由于xifenfei-sdg第一个au 已经被完全dd掉,xifenfei磁盘组无法mount,提示ORA-15040: diskgroup is incomplete

使用备份au还原

[grid@localhost ~]$ dd if=/dev/xifenfei-sdg skip=11 bs=1024k count=1 of=/tmp/sdg_header
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.048576 s, 21.6 MB/s
[grid@localhost ~]$ kfed read /tmp/sdg_header |more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3085718230 ; 0x00c: 0xb7ec52d6
kfbh.fcn.base:                       41 ; 0x010: 0x00000029
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                203423744 ; 0x020: 0x0c200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:           XIFENFEI_0000 ; 0x028: length=13
kfdhdb.grpname:                XIFENFEI ; 0x048: length=8
kfdhdb.fgname:            XIFENFEI_0000 ; 0x068: length=13
[grid@localhost ~]$ dd if=/tmp/sdg_header of=/dev/xifenfei-sdg bs=1024k count=1 conv=notrunc
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0262761 s, 39.9 MB/s
[grid@localhost ~]$ kfed read /dev/xifenfei-sdg|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3085718230 ; 0x00c: 0xb7ec52d6
kfbh.fcn.base:                       41 ; 0x010: 0x00000029
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                203423744 ; 0x020: 0x0c200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:           XIFENFEI_0000 ; 0x028: length=13
kfdhdb.grpname:                XIFENFEI ; 0x048: length=8
kfdhdb.fgname:            XIFENFEI_0000 ; 0x068: length=13

xifenfei磁盘组mount成功

[grid@localhost ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.2.0.1.0 Production on Sat Apr 22 08:34:53 2017
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> alter diskgroup xifenfei mount;
Diskgroup altered.

asm alert日志

SQL> alter diskgroup xifenfei mount
2017-04-22T08:34:59.298838-04:00
NOTE: cache registered group XIFENFEI 1/0xFA6C368E
NOTE: cache began mount (first) of group XIFENFEI 1/0xFA6C368E
NOTE: Assigning number (1,0) to disk (/dev/xifenfei-sdg)
NOTE: Assigning number (1,1) to disk (/dev/xifenfei-sdh)
2017-04-22T08:35:05.447528-04:00
NOTE: GMON heartbeating for grp 1 (XIFENFEI)
GMON querying group 1 at 5 for pid 23, osid 21195
2017-04-22T08:35:05.449557-04:00
NOTE: cache is mounting group XIFENFEI created on 2017/04/22 08:13:39
NOTE: cache opening disk 0 of grp 1: XIFENFEI_0000 path:/dev/xifenfei-sdg
NOTE: 04/22/17 08:35:04 XIFENFEI.F1X0 found on disk 0 au 2 fcn 0.0 datfmt 2
NOTE: cache opening disk 1 of grp 1: XIFENFEI_0001 path:/dev/xifenfei-sdh
2017-04-22T08:35:05.450316-04:00
NOTE: cache mounting (first) external redundancy group 1/0xFA6C368E (XIFENFEI)
NOTE: cache recovered group 1 to fcn 0.352
NOTE: redo buffer size is 256 blocks (1056768 bytes)
2017-04-22T08:35:05.504356-04:00
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (XIFENFEI)
NOTE: LGWR found thread 1 closed at ABA 2.63 lock domain=0 inc#=0 instnum=1
NOTE: LGWR mounted thread 1 for diskgroup 1 (XIFENFEI)
2017-04-22T08:35:05.555647-04:00
NOTE: LGWR opened thread 1 (XIFENFEI) at fcn 0.352 ABA 3.64 lock domain=1 inc#=0 instnum=1
  gx.incarn=4201395854 mntstmp=2017/04/22 08:35:05.510000
2017-04-22T08:35:05.556006-04:00
NOTE: cache mounting group 1/0xFA6C368E (XIFENFEI) succeeded
NOTE: cache ending mount (success) of group XIFENFEI number=1 incarn=0xfa6c368e
2017-04-22T08:35:05.596616-04:00
NOTE: Instance updated compatible.asm to 12.2.0.0.0 for grp 1 (XIFENFEI).
2017-04-22T08:35:05.599181-04:00
NOTE: Instance updated compatible.rdbms to 10.1.0.0.0 for grp 1 (XIFENFEI).
2017-04-22T08:35:05.608332-04:00
SUCCESS: diskgroup XIFENFEI was mounted
2017-04-22T08:35:05.635588-04:00
SUCCESS: alter diskgroup xifenfei mount

asm磁盘头全部损坏数据0丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm磁盘头全部损坏数据0丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到朋友反馈说他们公司的10.2.0.4(无磁盘头备份)asm 磁盘头都损坏了,确定是被人恶意dd掉了磁盘头的1k,他通过查询发过来结果如下
asm_disk_header


分析alert日志,确定磁盘组和磁盘信息
asm mount data磁盘组报错

Sun Apr 16 21:39:31 2017
NOTE: cache dismounting group 2/0x3F94036B (DATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Sun Apr 16 21:39:31 2017
ERROR: no PST quorum in group 3: required 2, found 0

data磁盘组和磁盘信息

Mon Mar 20 16:21:59 2017
NOTE: Hbeat: instance not first (grp 3)
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/raw/raw21
Mon Mar 20 16:21:59 2017
NOTE: F1X0 found on disk 2 fcn 0.47624333
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/raw/raw22
NOTE: cache opening disk 4 of grp 2: DATA_0004 path:/dev/raw/raw23
NOTE: cache opening disk 5 of grp 2: DATA_0005 path:/dev/raw/raw24
NOTE: F1X0 found on disk 5 fcn 0.47624333
NOTE: cache opening disk 6 of grp 2: DATA_0006 path:/dev/raw/raw26
NOTE: cache opening disk 7 of grp 2: DATA_0007 path:/dev/raw/raw25
NOTE: F1X0 found on disk 7 fcn 0.47624333
NOTE: cache mounting (not first) group 2/0x01B869DC (DATA)
Mon Mar 20 16:21:59 2017
kjbdomatt send to node 1
Mon Mar 20 16:21:59 2017
NOTE: attached to recovery domain 2
Mon Mar 20 16:21:59 2017
NOTE: opening chunk 2 at fcn 0.201560874 ABA
NOTE: seq=614 blk=4144
Mon Mar 20 16:21:59 2017
NOTE: cache mounting group 2/0x01B869DC (DATA) succeeded
SUCCESS: diskgroup DATA was mounted

最后一次正常mount是使用了raw21-raw26的裸设备为data磁盘组,但是这里从DATA_002开始,表明很可能最初了两个asm disk被删除,继续分析alert日志

Mon Oct 15 01:53:16 2012
 CREATE DISKGROUP DATA Normal REDUNDANCY  DISK '/dev/raw/raw6' SIZE 1144409M ,
'/dev/raw/raw7' SIZE 1144409M
Sat Dec 27 22:41:39 2014
 alter diskgroup data add disk '/dev/raw/raw21'
Sat Dec 27 22:41:54 2014
alter diskgroup data add disk '/dev/raw/raw22'
Sat Dec 27 22:42:14 2014
 alter diskgroup data add disk '/dev/raw/raw23'
Sat Dec 27 22:42:31 2014
 alter diskgroup data add disk '/dev/raw/raw24'
Sat Dec 27 22:42:51 2014
 alter diskgroup data add disk '/dev/raw/raw26'
Sat Dec 27 22:43:10 2014
alter diskgroup data add disk '/dev/raw/raw25'
Mon Dec 29 17:55:07 2014
 alter diskgroup data drop disk 'DATA_0000'
Tue Dec 30 03:14:42 2014
 alter diskgroup data drop disk 'DATA_0001'

kfed确认磁盘头损坏情况
通过kfed分析dd出来的磁盘头发现每个磁盘头都一样,第一个block损坏

[oracle@rac1 xifenfei]$ kfed read raw22
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F21AF427400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
[oracle@rac1 xifenfei]$ kfed read raw22 blkn=2|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                  2184525105 ; 0x00c: 0x82353531
kfbh.fcn.base:                 47625389 ; 0x010: 0x02d6b4ad
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                    448 ; 0x004: 0x01c0
kfdatb10.ub2pad:                      0 ; 0x006: 0x0000
kfdatb10.auinfo[0].link.next:         8 ; 0x008: 0x0008
kfdatb10.auinfo[0].link.prev:         8 ; 0x00a: 0x0008
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:           448 ; 0x00e: 0x01c0
kfdatb10.auinfo[1].link.next:        16 ; 0x010: 0x0010
kfdatb10.auinfo[1].link.prev:        16 ; 0x012: 0x0010
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:        24 ; 0x018: 0x0018
kfdatb10.auinfo[2].link.prev:        24 ; 0x01a: 0x0018
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:        32 ; 0x020: 0x0020
kfdatb10.auinfo[3].link.prev:        32 ; 0x022: 0x0020

恢复思路
确定磁盘是只被干掉了第一个block,但是由于asm 是10.2.0.4的,没有asm disk header的备份,因此也只能自己去人工kfed修复.但是考虑到该case中所有的asm disk header 全部丢失,无任何参考,完全修复比较麻烦,另外这个库也比较小,考虑修复asm disk header 关键部位,然后通过工具直接拷贝出来数据文件,在文件系统中open库的思路.主要需要恢复磁盘头基本信息(diskname,disksize,disknum,ausize,blocksize,file directory等)

通过kfed找出来file directory

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  2363360058 ; 0x00c: 0x8cde033a
kfbh.fcn.base:                 48245591 ; 0x010: 0x02e02b57
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:                   1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                 1048576 ; 0x010: 0x00100000
kfffdb.xtntcnt:                       3 ; 0x014: 0x00000003
kfffdb.xtnteof:                       3 ; 0x018: 0x00000003
kfffdb.blkSize:                    4096 ; 0x01c: 0x00001000
kfffdb.flags:                        65 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=1 A=0
kfffdb.fileType:                     15 ; 0x021: 0x0f
kfffdb.dXrs:                         19 ; 0x022: SCHE=0x1 NUMB=0x3

通过kfed找出来disk directory

kfffde[0].xptr.au:                    4 ; 0x4a0: 0x00000004
kfffde[0].xptr.disk:                  7 ; 0x4a4: 0x0007
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  41 ; 0x4a7: 0x29
kfffde[1].xptr.au:                17405 ; 0x4a8: 0x000043fd
kfffde[1].xptr.disk:                  6 ; 0x4ac: 0x0006
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk:                 146 ; 0x4af: 0x92
kfffde[2].xptr.au:               330031 ; 0x4b0: 0x0005092f
kfffde[2].xptr.disk:                  4 ; 0x4b4: 0x0004
kfffde[2].xptr.flags:                 0 ; 0x4b6: L=0 E=0 D=0 S=0
kfffde[2].xptr.chk:                  13 ; 0x4b7: 0x0d
kfddde[2].entry.incarn:               1 ; 0x3a4: A=1 NUMM=0x0
kfddde[2].entry.hash:                 2 ; 0x3a8: 0x00000002
kfddde[2].entry.refer.number:4294967295 ; 0x3ac: 0xffffffff
kfddde[2].entry.refer.incarn:         0 ; 0x3b0: A=0 NUMM=0x0
kfddde[2].dsknum:                     2 ; 0x3b4: 0x0002
kfddde[2].state:                      2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ddchgfl:                    0 ; 0x3b7: 0x00
kfddde[2].dskname:            DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname:             DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi:          33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[2].crestmp.lo:        2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfddde[2].failstmp.hi:                0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo:                0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[2].timer:                      0 ; 0x408: 0x00000000
kfddde[2].size:                 1258291 ; 0x40c: 0x00133333
kfddde[2].srRloc.super.hiStart:       0 ; 0x410: 0x00000000
kfddde[2].srRloc.super.loStart:       0 ; 0x414: 0x00000000
kfddde[2].srRloc.super.length:        0 ; 0x418: 0x00000000
kfddde[2].srRloc.incarn:              0 ; 0x41c: 0x00000000
kfddde[2].dskrprtm:                   0 ; 0x420: 0x00000000
kfddde[2].start0:                     0 ; 0x424: 0x00000000
kfddde[2].size0:                1258291 ; 0x428: 0x00133333
kfddde[2].used0:                1258229 ; 0x42c: 0x001332f5
kfddde[2].slot:                       0 ; 0x430: 0x00000000
kfddde[3].entry.incarn:               1 ; 0x564: A=1 NUMM=0x0
kfddde[3].entry.hash:                 3 ; 0x568: 0x00000003
kfddde[3].entry.refer.number:4294967295 ; 0x56c: 0xffffffff
kfddde[3].entry.refer.incarn:         0 ; 0x570: A=0 NUMM=0x0
kfddde[3].dsknum:                     3 ; 0x574: 0x0003
kfddde[3].state:                      2 ; 0x576: KFDSTA_NORMAL
kfddde[3].ddchgfl:                    0 ; 0x577: 0x00
kfddde[3].dskname:            DATA_0003 ; 0x578: length=9
kfddde[3].fgname:             DATA_0003 ; 0x598: length=9
kfddde[3].crestmp.hi:          33010550 ; 0x5b8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[3].crestmp.lo:        2811397120 ; 0x5bc: USEC=0x0 MSEC=0xa1 SECS=0x39 MINS=0x29
kfddde[3].failstmp.hi:                0 ; 0x5c0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[3].failstmp.lo:                0 ; 0x5c4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[3].timer:                      0 ; 0x5c8: 0x00000000
kfddde[3].size:                 1258291 ; 0x5cc: 0x00133333
kfddde[3].srRloc.super.hiStart:       0 ; 0x5d0: 0x00000000
kfddde[3].srRloc.super.loStart:       0 ; 0x5d4: 0x00000000
kfddde[3].srRloc.super.length:        0 ; 0x5d8: 0x00000000
kfddde[3].srRloc.incarn:              0 ; 0x5dc: 0x00000000
kfddde[3].dskrprtm:                   0 ; 0x5e0: 0x00000000
kfddde[3].start0:                     0 ; 0x5e4: 0x00000000
kfddde[3].size0:                1258291 ; 0x5e8: 0x00133333
kfddde[3].used0:                1258128 ; 0x5ec: 0x00133290
kfddde[3].slot:                       0 ; 0x5f0: 0x00000000
kfddde[4].entry.incarn:               1 ; 0x724: A=1 NUMM=0x0
kfddde[4].entry.hash:                 4 ; 0x728: 0x00000004
kfddde[4].entry.refer.number:4294967295 ; 0x72c: 0xffffffff
kfddde[4].entry.refer.incarn:         0 ; 0x730: A=0 NUMM=0x0
kfddde[4].dsknum:                     4 ; 0x734: 0x0004
kfddde[4].state:                      2 ; 0x736: KFDSTA_NORMAL
kfddde[4].ddchgfl:                    0 ; 0x737: 0x00
kfddde[4].dskname:            DATA_0004 ; 0x738: length=9
kfddde[4].fgname:             DATA_0004 ; 0x758: length=9
kfddde[4].crestmp.hi:          33010550 ; 0x778: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[4].crestmp.lo:        2834565120 ; 0x77c: USEC=0x0 MSEC=0x102 SECS=0xf MINS=0x2a
kfddde[4].failstmp.hi:                0 ; 0x780: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[4].failstmp.lo:                0 ; 0x784: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[4].timer:                      0 ; 0x788: 0x00000000
kfddde[4].size:                 1258291 ; 0x78c: 0x00133333
kfddde[4].srRloc.super.hiStart:       0 ; 0x790: 0x00000000
kfddde[4].srRloc.super.loStart:       0 ; 0x794: 0x00000000
kfddde[4].srRloc.super.length:        0 ; 0x798: 0x00000000
kfddde[4].srRloc.incarn:              0 ; 0x79c: 0x00000000
kfddde[4].dskrprtm:                   0 ; 0x7a0: 0x00000000
kfddde[4].start0:                     0 ; 0x7a4: 0x00000000
kfddde[4].size0:                1258291 ; 0x7a8: 0x00133333
kfddde[4].used0:                1258291 ; 0x7ac: 0x00133333
kfddde[4].slot:                       0 ; 0x7b0: 0x00000000
kfddde[5].entry.incarn:               1 ; 0x8e4: A=1 NUMM=0x0
kfddde[5].entry.hash:                 5 ; 0x8e8: 0x00000005
kfddde[5].entry.refer.number:4294967295 ; 0x8ec: 0xffffffff
kfddde[5].entry.refer.incarn:         0 ; 0x8f0: A=0 NUMM=0x0
kfddde[5].dsknum:                     5 ; 0x8f4: 0x0005
kfddde[5].state:                      2 ; 0x8f6: KFDSTA_NORMAL
kfddde[5].ddchgfl:                    0 ; 0x8f7: 0x00
kfddde[5].dskname:            DATA_0005 ; 0x8f8: length=9
kfddde[5].fgname:             DATA_0005 ; 0x918: length=9
kfddde[5].crestmp.hi:          33010550 ; 0x938: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[5].crestmp.lo:        2853560320 ; 0x93c: USEC=0x0 MSEC=0x178 SECS=0x21 MINS=0x2a
kfddde[5].failstmp.hi:                0 ; 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[5].failstmp.lo:                0 ; 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[5].timer:                      0 ; 0x948: 0x00000000
kfddde[5].size:                 1258291 ; 0x94c: 0x00133333
kfddde[5].srRloc.super.hiStart:       0 ; 0x950: 0x00000000
kfddde[5].srRloc.super.loStart:       0 ; 0x954: 0x00000000
kfddde[5].srRloc.super.length:        0 ; 0x958: 0x00000000
kfddde[5].srRloc.incarn:              0 ; 0x95c: 0x00000000
kfddde[5].dskrprtm:                   0 ; 0x960: 0x00000000
kfddde[5].start0:                     0 ; 0x964: 0x00000000
kfddde[5].size0:                1258291 ; 0x968: 0x00133333
kfddde[5].used0:                1258255 ; 0x96c: 0x0013330f
kfddde[5].slot:                       0 ; 0x970: 0x00000000
kfddde[6].entry.incarn:               1 ; 0xaa4: A=1 NUMM=0x0
kfddde[6].entry.hash:                 6 ; 0xaa8: 0x00000006
kfddde[6].entry.refer.number:4294967295 ; 0xaac: 0xffffffff
kfddde[6].entry.refer.incarn:         0 ; 0xab0: A=0 NUMM=0x0
kfddde[6].dsknum:                     6 ; 0xab4: 0x0006
kfddde[6].state:                      2 ; 0xab6: KFDSTA_NORMAL
kfddde[6].ddchgfl:                    0 ; 0xab7: 0x00
kfddde[6].dskname:            DATA_0006 ; 0xab8: length=9
kfddde[6].fgname:             DATA_0006 ; 0xad8: length=9
kfddde[6].crestmp.hi:          33010550 ; 0xaf8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[6].crestmp.lo:        2875645952 ; 0xafc: USEC=0x0 MSEC=0x1b8 SECS=0x36 MINS=0x2a
kfddde[6].failstmp.hi:                0 ; 0xb00: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[6].failstmp.lo:                0 ; 0xb04: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[6].timer:                      0 ; 0xb08: 0x00000000
kfddde[6].size:                 1258291 ; 0xb0c: 0x00133333
kfddde[6].srRloc.super.hiStart:       0 ; 0xb10: 0x00000000
kfddde[6].srRloc.super.loStart:       0 ; 0xb14: 0x00000000
kfddde[6].srRloc.super.length:        0 ; 0xb18: 0x00000000
kfddde[6].srRloc.incarn:              0 ; 0xb1c: 0x00000000
kfddde[6].dskrprtm:                   0 ; 0xb20: 0x00000000
kfddde[6].start0:                     0 ; 0xb24: 0x00000000
kfddde[6].size0:                1258291 ; 0xb28: 0x00133333
kfddde[6].used0:                1258247 ; 0xb2c: 0x00133307
kfddde[6].slot:                       0 ; 0xb30: 0x00000000
kfddde[7].entry.incarn:               1 ; 0xc64: A=1 NUMM=0x0
kfddde[7].entry.hash:                 7 ; 0xc68: 0x00000007
kfddde[7].entry.refer.number:4294967295 ; 0xc6c: 0xffffffff
kfddde[7].entry.refer.incarn:         0 ; 0xc70: A=0 NUMM=0x0
kfddde[7].dsknum:                     7 ; 0xc74: 0x0007
kfddde[7].state:                      2 ; 0xc76: KFDSTA_NORMAL
kfddde[7].ddchgfl:                    0 ; 0xc77: 0x00
kfddde[7].dskname:            DATA_0007 ; 0xc78: length=9
kfddde[7].fgname:             DATA_0007 ; 0xc98: length=9
kfddde[7].crestmp.hi:          33010550 ; 0xcb8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[7].crestmp.lo:        2898849792 ; 0xcbc: USEC=0x0 MSEC=0x23c SECS=0xc MINS=0x2b
kfddde[7].failstmp.hi:                0 ; 0xcc0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[7].failstmp.lo:                0 ; 0xcc4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[7].timer:                      0 ; 0xcc8: 0x00000000
kfddde[7].size:                 1258291 ; 0xccc: 0x00133333
kfddde[7].srRloc.super.hiStart:       0 ; 0xcd0: 0x00000000
kfddde[7].srRloc.super.loStart:       0 ; 0xcd4: 0x00000000
kfddde[7].srRloc.super.length:        0 ; 0xcd8: 0x00000000
kfddde[7].srRloc.incarn:              0 ; 0xcdc: 0x00000000
kfddde[7].dskrprtm:                   0 ; 0xce0: 0x00000000
kfddde[7].start0:                     0 ; 0xce4: 0x00000000
kfddde[7].size0:                1258291 ; 0xce8: 0x00133333
kfddde[7].used0:                1258209 ; 0xcec: 0x001332e1
kfddde[7].slot:                       0 ; 0xcf0: 0x00000000

结合上述信息构造类似磁盘头文件

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3123334821 ; 0x00c: 0xba2a4ea5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:     ORCLDISKVOL2 ; 0x000: length=12
kfdhdb.driver.reserved[0]:    827084630 ; 0x008: 0x314c4f56
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        2 ; 0x024: 0x0002
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfdhdb.crestmp.lo:           2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfdhdb.mntstmp.hi:             33049840 ; 0x0b0: HOUR=0x10 DAYS=0x7 MNTH=0x3 YEAR=0x7e1
kfdhdb.mntstmp.lo:           1588567040 ; 0x0b4: USEC=0x0 MSEC=0x3e7 SECS=0x2a MINS=0x17
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                 1258291 ; 0x40c: 0x00133333
kfdhdb.pmcnt:                        19 ; 0x0c8: 0x00000013
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

然后通过kfed merge分别对所有磁盘头进行重构,然后通过dul去识别拷贝文件

+DATA/XFF/spfileXFF.ora.265.869858859
+DATA/XFF/CONTROLFILE/Current.256.796701475
+DATA/XFF/ONLINELOG/group_1.257.796701475
+DATA/XFF/ONLINELOG/group_2.258.796701485
+DATA/XFF/ONLINELOG/group_3.266.796704261
+DATA/XFF/ONLINELOG/group_4.267.796704277
+DATA/XFF/ONLINELOG/group_5.1235.824131117
+DATA/XFF/ONLINELOG/group_6.1115.824200515
+DATA/XFF/ONLINELOG/group_7.1113.824200587
+DATA/XFF/ONLINELOG/group_8.1112.824200627
+DATA/XFF/ONLINELOG/group_9.1066.824201189
+DATA/XFF/ONLINELOG/group_10.1063.824201207
+DATA/XFF/ONLINELOG/group_11.1062.824201287
+DATA/XFF/ONLINELOG/group_12.1061.824201301
File 259 datafile size 2147491840, block size 8192
File 260 datafile size 32186048512, block size 8192
File 261 datafile size 6897541120, block size 8192
File 263 datafile size 27409784832, block size 8192
File 264 datafile size 34359730176, block size 8192
File 280 datafile size 31457288192, block size 8192
File 281 datafile size 31457288192, block size 8192
File 330 datafile size 5242888192, block size 8192
File 334 datafile size 20971528192, block size 8192
File 382 datafile size 20971528192, block size 8192
File 383 datafile size 20971528192, block size 8192
File 384 datafile size 31457288192, block size 8192
File 385 datafile size 31457288192, block size 8192
File 386 datafile size 31457288192, block size 8192
File 387 datafile size 4294975488, block size 8192
File 388 datafile size 4294975488, block size 8192
File 389 datafile size 4294975488, block size 8192
File 390 datafile size 4294975488, block size 8192
File 391 datafile size 4294975488, block size 8192
File 392 datafile size 4294975488, block size 8192
File 394 datafile size 31457288192, block size 8192
File 491 datafile size 20971528192, block size 8192
File 494 datafile size 20971528192, block size 8192
File 578 datafile size 31457288192, block size 8192
File 597 datafile size 20971528192, block size 8192
File 613 datafile size 4294975488, block size 8192
File 638 datafile size 31457288192, block size 8192
File 668 datafile size 16988184576, block size 8192
File 688 datafile size 20971528192, block size 8192
File 740 datafile size 31457288192, block size 8192
File 787 datafile size 31457288192, block size 8192
File 798 datafile size 31457288192, block size 8192
File 806 datafile size 31457288192, block size 8192
File 810 datafile size 31457288192, block size 8192
File 845 datafile size 31457288192, block size 8192
File 886 datafile size 31457288192, block size 8192
File 887 datafile size 31457288192, block size 8192
File 889 datafile size 31457288192, block size 8192
File 892 datafile size 31457288192, block size 8192
File 903 datafile size 31457288192, block size 8192
File 932 datafile size 31457288192, block size 8192
File 933 datafile size 3145736192, block size 8192
File 951 datafile size 20971528192, block size 8192
File 953 datafile size 31457288192, block size 8192
File 955 datafile size 31457288192, block size 8192
File 963 datafile size 31457288192, block size 8192
File 1000 datafile size 31457288192, block size 8192
File 1001 datafile size 12035563520, block size 8192
File 1031 datafile size 31457288192, block size 8192
File 1033 datafile size 31457288192, block size 8192
File 1035 datafile size 31457288192, block size 8192
File 1037 datafile size 31457288192, block size 8192
File 1045 datafile size 31457288192, block size 8192
File 1073 datafile size 4294975488, block size 8192
File 1074 datafile size 4294975488, block size 8192
File 1075 datafile size 4294975488, block size 8192
File 1076 datafile size 8589942784, block size 8192
File 1077 datafile size 31457288192, block size 8192
File 1078 datafile size 8589942784, block size 8192
File 1079 datafile size 8589942784, block size 8192
File 1080 datafile size 4294975488, block size 8192
File 1081 datafile size 8589942784, block size 8192
File 1082 datafile size 8589942784, block size 8192
File 1083 datafile size 8589942784, block size 8192
File 1084 datafile size 8589942784, block size 8192
File 1085 datafile size 32365355008, block size 8192
File 1086 datafile size 9071239168, block size 8192
File 1116 datafile size 8589942784, block size 8192
File 1133 datafile size 8589942784, block size 8192
File 1219 datafile size 31457288192, block size 8192
File 1245 datafile size 31457288192, block size 8192
File 1249 datafile size 31457288192, block size 8192
File 1251 datafile size 31457288192, block size 8192
File 1322 datafile size 4294975488, block size 8192
File 1442 datafile size 31457288192, block size 8192
File 1468 datafile size 1048584192, block size 8192
File 1508 datafile size 31457288192, block size 8192
File 1554 datafile size 4294975488, block size 8192
File 1570 datafile size 31457288192, block size 8192
File 2004 datafile size 31457288192, block size 8192
File 2005 datafile size 31457288192, block size 8192
File 2344 datafile size 31457288192, block size 8192
File 2345 datafile size 31457288192, block size 8192
File 2348 datafile size 31457288192, block size 8192
File 2617 datafile size 10737426432, block size 8192
File 2618 datafile size 21474844672, block size 8192
File 2766 datafile size 33554440192, block size 8192
File 2782 datafile size 31457288192, block size 8192
File 2784 datafile size 31457288192, block size 8192
File 2893 datafile size 31457288192, block size 8192
File 2924 datafile size 31457288192, block size 8192
File 2925 datafile size 31457288192, block size 8192
File 2926 datafile size 31457288192, block size 8192
File 2983 datafile size 31457288192, block size 8192
File 2984 datafile size 31457288192, block size 8192
File 3634 datafile size 31457288192, block size 8192
File 3909 datafile size 31457288192, block size 8192
File 3917 datafile size 31457288192, block size 8192
File 3920 datafile size 31457288192, block size 8192
File 3922 datafile size 31457288192, block size 8192

剩下的事情就比较简单了,通过把spfile,controlfile,datafile文件拷贝出来,本地启动数据库,恢复成功
asm_open_db


open_alert


如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在oracle asm的使用过程中由于操作系统层面的错误操作导致asm disk 被破坏,这里列举了几种破坏之后的kfed报错现象(KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type])
asm mount 磁盘组报错(ORA-15040 ORA-15042)

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"

asm alert日志报错(ORA-15335 ORA-15066 ORA-15196等)

ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]

kfed查看磁盘头报错
文件文件头(不光是disk header的4k,可能是连续的几个au,甚至更多)可能彻底损坏,一般kfed 读取都会看到KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]之类错误

[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
 Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“EFI PART”是分区的元数据,一般是被分区导致asm disk损坏.

[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“LVM2 001” 是逻辑卷的名字,该asm disk很可能被做为lvm管理而被破坏

[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian:                         65 ; 0x000: 0x41
kfbh.hard:                           73 ; 0x001: 0x49
kfbh.type:                           88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         32 ; 0x003: 0x20
kfbh.block.blk:              1111709260 ; 0x004: blk=1111709260
kfbh.block.obj:              1634861056 ; 0x008: file=131072
kfbh.check:                         119 ; 0x00c: 0x00000077
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077  [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000  [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430  [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030  [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161  [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037  [...Tue May  7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000  [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131  [.Tue May  7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000  [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900  [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000  [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000  [................]
        Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934  [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000  [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000  [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的“AIX LVCB..raw” 是AIX OS volume 的元数据库,也就是说,asm disk 被作为了aix os层面破坏

[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001  [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的/u01很可能表明该asm disk被文件系统覆盖

对于asm disk的各种破坏情况,如果是normal/high冗余,那么asm dg没有问题,可以考虑通过删除异常盘,然后重新加入;如果是外部冗余遭遇到asm disk 被破坏,一般asm disk 会dismount,而且无法正常mount,如果有备份的磁盘头,可以尝试还原磁盘头,mount 磁盘组,然后只读方式迁移数据;如果没有备份磁盘头或者还原之后也无法mount,可能需要通过一些额外的方式处理比如通过工具在asm dismount状态下恢复数据文件,甚至通过对asm block/oracle block碎片重组的方式恢复数据.参考相关文章:
oracle asm系列文章汇总
pvid=yes导致asm无法mount
asm disk header 彻底损坏恢复
分区无法识别导致asm diskgroup无法mount
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
asm disk误设置pvid导致asm diskgroup无法mount恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
ORA-15042: ASM disk “N” is missing from group number “M” 故障恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com