kfed恢复误删除磁盘组

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:kfed恢复误删除磁盘组

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在某些情况下,可能因为误操作,不小先drop diskgroup,这个时候千万别紧张,出现此类故障,可以通过kfed进行完美恢复(数据0丢失).如果进一步损坏了相关asm disk,那后续恢复就很麻烦了,可能需要使用dul扫描磁盘来进行抢救性恢复,而且可能导致数据丢失.
创建测试磁盘组xifenfei

[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 30 15:12:08 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option
SQL>  select name,path,header_status from v$asm_disk;
NAME                           PATH                           HEADER_STATU
------------------------------ ------------------------------ ------------
                               /dev/asm-disk3                 CANDIDATE
DATA_0000                      /dev/asm-disk1                 MEMBER
DATA_0001                      /dev/asm-disk2                 MEMBER
SQL> create diskgroup xifenfei external redundancy disk '/dev/asm-disk3';
Diskgroup created.
SQL> select name,path,header_status from v$asm_disk;
NAME                           PATH                           HEADER_STATU
------------------------------ ------------------------------ ------------
XIFENFEI_0000                  /dev/asm-disk3                 MEMBER
DATA_0000                      /dev/asm-disk1                 MEMBER
DATA_0001                      /dev/asm-disk2                 MEMBER

使用/dev/asm-disk3这个磁盘创建磁盘组xifenfei

创建表,存储在xifenfei磁盘组中

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 30 15:14:55 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> create tablespace xifenfei datafile '+xifenfei' size 100M;
Tablespace created.
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
+DATA/xifenfei/datafile/system.256.878224279
+DATA/xifenfei/datafile/sysaux.257.878224279
+DATA/xifenfei/datafile/undotbs1.258.878224279
+DATA/xifenfei/datafile/users.259.878224279
+XIFENFEI/xifenfei/datafile/xifenfei.256.878397315
SQL> create table t_xifenfei tablespace xifenfei
  2  as select * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     86259

通过在磁盘组中创建表空间,从而实现表xifenfei存放在测试磁盘组中

尝试删除磁盘组xifenfei

SQL> drop diskgroup xifenfei;
drop diskgroup xifenfei
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15053: diskgroup "XIFENFEI" contains existing files
SQL> drop diskgroup xifenfei  including contents;
drop diskgroup xifenfei  including contents
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15027: active use of diskgroup "XIFENFEI" precludes its dismount
[grid@xifenfei ~]$ asmcmd
ASMCMD> lsof
DB_Name   Instance_Name  Path
xifenfei  xifenfei       +data/xifenfei/controlfile/current.260.878224379
xifenfei  xifenfei       +data/xifenfei/datafile/sysaux.257.878224279
xifenfei  xifenfei       +data/xifenfei/datafile/system.256.878224279
xifenfei  xifenfei       +data/xifenfei/datafile/undotbs1.258.878224279
xifenfei  xifenfei       +data/xifenfei/datafile/users.259.878224279
xifenfei  xifenfei       +data/xifenfei/onlinelog/group_1.261.878224381
xifenfei  xifenfei       +data/xifenfei/onlinelog/group_2.262.878224383
xifenfei  xifenfei       +data/xifenfei/onlinelog/group_3.263.878224385
xifenfei  xifenfei       +data/xifenfei/tempfile/temp.264.878224395
xifenfei  xifenfei       +xifenfei/xifenfei/datafile/xifenfei.256.878397315

由于xifenfei磁盘组被实例使用,因此磁盘组无法删除,报ORA-15027错误
由于xifenfei磁盘组中有文件,因此磁盘组无法删除,报ORA-15053错误
如果这两个阻止你误删除磁盘组的警告依然不能救你,那我也不好多说啥了,只能向我一样继续往下

关闭数据库实例,删除磁盘组

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> drop diskgroup xifenfei;
drop diskgroup xifenfei
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15053: diskgroup "XIFENFEI" contains existing files
SQL> drop diskgroup xifenfei  including contents;
Diskgroup dropped.
SQL>  select name,path,header_status from v$asm_disk;
NAME                           PATH                           HEADER_STATU
------------------------------ ------------------------------ ------------
                               /dev/asm-disk3                 FORMER
DATA_0000                      /dev/asm-disk1                 MEMBER
DATA_0001                      /dev/asm-disk2                 MEMBER
SQL> alter diskgroup xifenfei mount;
alter diskgroup xifenfei mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "XIFENFEI" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"XIFENFEI"

磁盘组被drop之后,无法正常mount,mount之时报ORA-15063凑无

kfed恢复删除磁盘组

[grid@xifenfei ~]$ kfed read /dev/asm-disk3 >/tmp/disk3-0-0
[grid@xifenfei ~]$ kfed  read /dev/asm-disk3  blkn=1 >/tmp/disk3-0-1
[grid@xifenfei ~]$ kfed  read /dev/asm-disk3  aun=1 >/tmp/disk3-1-0
通过vi修改这些/tmp/disk3-*中的部分值
[grid@xifenfei ~]$ kfed merge /dev/asm-disk3 text=/tmp/disk3-0-0
[grid@xifenfei ~]$ kfed merge /dev/asm-disk3  blkn=1 text=/tmp/disk3-0-1
[grid@xifenfei ~]$ kfed merge /dev/asm-disk3 aun=1 text=/tmp/disk3-1-0

查询修复后的asm disk

SQL> col path for a30
SQL> set lines 150
SQL> select name,path,header_status from v$asm_disk;
NAME                           PATH                           HEADER_STATU
------------------------------ ------------------------------ ------------
                               /dev/asm-disk3                 MEMBER
DATA_0000                      /dev/asm-disk1                 MEMBER
DATA_0001                      /dev/asm-disk2                 MEMBER

尝试mount xifenfei 磁盘组

SQL> alter diskgroup xifenfei mount;
Diskgroup altered.
SQL> select name,path,header_status from v$asm_disk;
NAME                           PATH                           HEADER_STATU
------------------------------ ------------------------------ ------------
XIFENFEI_0000                  /dev/asm-disk3                 MEMBER
DATA_0000                      /dev/asm-disk1                 MEMBER
DATA_0001                      /dev/asm-disk2                 MEMBER

测试恢复后磁盘组

SQL> startup
ORACLE instance started.
Total System Global Area  952020992 bytes
Fixed Size                  2258960 bytes
Variable Size             306186224 bytes
Database Buffers          637534208 bytes
Redo Buffers                6041600 bytes
Database mounted.
Database opened.
SQL>  select count(*) from t_xifenfei;
  COUNT(*)
----------
     86259

这里证明,当磁盘组被误删除后,立即停止进一步损坏,可以通过kfed进行完美恢复
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

dbms_diskgroup拷贝block/datafile

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:dbms_diskgroup拷贝block/datafile

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于asm而言,如果我们要操作里面的数据文件,虽然从11.2开始有asmcmd的cp可能进行拷贝,但是如果想对数据文件中的某个block进行拷贝出来asm到文件系统(或者拷贝某个文件系统的block到asm中),还是比较麻烦的事情(请见:bbed修改ASM中数据)。其实oracle官方提供了dbms_diskgroup这个包,可以通过sqlplus直接操作asm里面的block/datafile,非常方便,这里简单列举几个例子:

dbms_diskgroup获取asm中文件属性

SQL> declare
  2  v_filename varchar2(4000);
  3  v_filetype number;
  4  v_filesize number;
  5  v_lbks number;
  6  v_typename varchar2(4000);
  7  begin
  8  dbms_output.enable(5000);
  9  v_filename := '&file_name';
 10  dbms_diskgroup.getfileattr(v_filename,v_filetype,v_filesize,v_lbks);
 11  select decode(v_filetype,1,'Control File',2,'Data File',3,'Online Log File',4,'Archive Log',5,'Trace File',6,'Temporary File',
 12  7,'Not Used',8,'Not Used',9,'Backup Piece',10,'Incremental Backup Piece',11,'Archive Backup Piece',12,'Data File Copy',
 13  13,'Spfile',14,'Disaster Recovery Configuration',15,'Storage Manager Disk',16,'Change Tracking File',17,'Flashback Log File',
 14  18,'DataPump Dump File',19,'Cross Platform Converted File',20,'Autobackup',21,'Any OS file',22,'Block Dump File',
 15  23,'CSS Voting File',24,'CRS') into v_typename from dual;
 16  dbms_output.put_line('File: '||v_filename); dbms_output.new_line;
 17  dbms_output.put_line('Type: '||v_filetype||' '||v_typename); dbms_output.new_line;
 18  dbms_output.put_line('Size (Logical Block Size): '||v_filesize); dbms_output.new_line;
 19  dbms_output.put_line('Logical Block Size: '||v_lbks); dbms_output.new_line;
 20  end;
 21  /
Enter value for file_name: +DATA/xifenfei/datafile/system.256.878224279
old   9: v_filename := '&file_name';
new   9: v_filename := '+DATA/xifenfei/datafile/system.256.878224279';
File: +DATA/xifenfei/datafile/system.256.878224279
Type: 12 Data File Copy
Size (Logical Block Size): 94720
Logical Block Size: 8192
PL/SQL procedure successfully completed.

创建测试表t_xifenfei

SQL> create table t_xifenfei tablespace users
  2  as select 'www.xifenfei.com' xifenfei from dual;
Table created.
SQL> select rowid,xifenfei from t_xifenfei;
ROWID              XIFENFEI
------------------ ----------------
AAAVU3AAEAAAACrAAA www.xifenfei.com
SQL> select
  2   dbms_rowid.rowid_relative_fno(rowid) rel_fno,
  3  dbms_rowid.rowid_block_number(rowid )block_no
  4  from t_xifenfei;
   REL_FNO   BLOCK_NO
---------- ----------
         4        171
SQL> alter system checkpoint;
System altered.
SQL> /
System altered.
SQL> alter system switch logfile;
System altered.
SQL> select name from v$datafile where file#=4;
NAME
--------------------------------------------------------------------------------
+DATA/xifenfei/datafile/users.259.878224279

dbms_diskgroup拷贝asm datafile 4 block 171

SQL> declare
  2  v_AsmFilename varchar2(4000);
  3  v_FsFilename varchar2(4000);
  4  v_offstart number;
  5  v_numblks number;
  6  v_filetype number;
  7  v_filesize number;
  8  v_lbks number;
  9  v_typename varchar2(4000);
 10  v_pblksize number;
 11  v_handle number;
 12  begin
 13  dbms_output.enable(500000);
 14  v_AsmFilename := '&ASM_File_Name';
 15  v_offstart := '&block_to_extract';
 16  v_numblks := '&number_of_blocks_to_extract';
 17  v_FsFilename := '&FileSystem_File_Name';
 18  dbms_diskgroup.getfileattr(v_AsmFilename,v_filetype,v_filesize,v_lbks);
 19  dbms_diskgroup.open(v_AsmFilename,'r',v_filetype,v_lbks,v_handle,v_pblksize,v_filesize);
 20  dbms_diskgroup.close(v_handle);
 21  select decode(v_filetype,1,'Control File',2,'Data File',3,'Online Log File',4,'Archive Log',5,'Trace File',6,'Temporary File',
 22  7,'Not Used',8,'Not Used',9,'Backup Piece',10,'Incremental Backup Piece',11,'Archive Backup Piece',12,'Data File Copy',
 23  13,'Spfile',14,'Disaster Recovery Configuration',15,'Storage Manager Disk',16,'Change Tracking File',17,'Flashback Log File',
 24  18,'DataPump Dump File',19,'Cross Platform Converted File',20,'Autobackup',21,'Any OS file',22,'Block Dump File',
 25  23,'CSS Voting File',24,'CRS') into v_typename from dual;
 26  dbms_output.put_line('File: '||v_AsmFilename); dbms_output.new_line;
 27  dbms_output.put_line('Type: '||v_filetype||' '||v_typename); dbms_output.new_line;
 28  dbms_output.put_line('Size (in logical blocks): '||v_filesize); dbms_output.new_line;
 29  dbms_output.put_line('Logical Block Size: '||v_lbks); dbms_output.new_line;
 30  dbms_output.put_line('Physical Block Size: '||v_pblksize); dbms_output.new_line;
 31  dbms_diskgroup.patchfile(v_AsmFilename,v_filetype,v_lbks,v_offstart,0,v_numblks,v_FsFilename,v_filetype,1,1);
 32  end;
 33  /
Enter value for asm_file_name: +DATA/xifenfei/datafile/users.259.878224279
old  14: v_AsmFilename := '&ASM_File_Name';
new  14: v_AsmFilename := '+DATA/xifenfei/datafile/users.259.878224279';
Enter value for block_to_extract: 171
old  15: v_offstart := '&block_to_extract';
new  15: v_offstart := '171';
Enter value for number_of_blocks_to_extract: 1
old  16: v_numblks := '&number_of_blocks_to_extract';
new  16: v_numblks := '1';
Enter value for filesystem_file_name: /tmp/xifenfei.dbf
old  17: v_FsFilename := '&FileSystem_File_Name';
new  17: v_FsFilename := '/tmp/xifenfei.dbf';
File: +DATA/xifenfei/datafile/users.259.878224279
Type: 12 Data File Copy
Size (in logical blocks): 640
Logical Block Size: 8192
Physical Block Size: 512
PL/SQL procedure successfully completed.
[grid@xifenfei ~]$ ls -l /tmp/xifenfei.dbf
-rw-r----- 1 grid oinstall 16384 Apr 28 15:55 /tmp/xifenfei.dbf

这里注意拷贝出来的block size 为8192,由于默认写了block 0信息,因此这里显示大小为2*block size=16384

bbed修改拷贝出来block内容

SQL> select dump('xifenfei',16) from dual;
DUMP('XIFENFEI',16)
-------------------------------------
Typ=96 Len=8: 78,69,66,65,6e,66,65,69
SQL> select dump('XIFENFEI',16) from dual;
DUMP('XIFENFEI',16)
-------------------------------------
Typ=96 Len=8: 58,49,46,45,4e,46,45,49
[oracle@xifenfei tmp]$ bbed filename='/tmp/xifenfei.dbf' blocksize=8192
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Tue Apr 28 16:24:35 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> show
        FILE#           0
        BLOCK#          1
        OFFSET          0
        DBA             0x00000000 (0 0,1)
        FILENAME        /tmp/xifenfei.dbf
        BIFILE          bifile.bbd
        LISTFILE
        BLOCKSIZE       8192
        MODE            Browse
        EDIT            Unrecoverable
        IBASE           Dec
        OBASE           Dec
        WIDTH           80
        COUNT           512
        LOGFILE         log.bbd
        SPOOL           No
BBED> map
 File: /tmp/xifenfei.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 96 bytes                     @20
 struct kdbh, 14 bytes                      @124
 struct kdbt[1], 4 bytes                    @138
 sb2 kdbr[1]                                @142
 ub1 freespace[8024]                        @144
 ub1 rowdata[20]                            @8168
 ub4 tailchk                                @8188
BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0]                              @8168     0x2c
BBED> d /v offset 8168
 File: /tmp/xifenfei.dbf (0)
 Block: 1       Offsets: 8168 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c000110 7777772e 78696665 6e666569 l ,...www.xifenfei
 2e636f6d 020624bc                   l .com..$.
 <16 bytes per line>
BBED> r /x c
BBED-00200: invalid keyword (r)
BBED> x /rc
rowdata[0]                                  @8168
----------
flag@8168: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8169: 0x00
cols@8170:    1
col   0[16] @8171: www.xifenfei.com
BBED> set mode edit
        MODE            Edit
BBED> set offset 8171
        OFFSET          8171
BBED> set count 32
        COUNT           32
BBED> d
 File: /tmp/xifenfei.dbf (0)
 Block: 1                Offsets: 8171 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 10777777 2e786966 656e6665 692e636f 6d020624 bc
 <32 bytes per line>
BBED>
BBED> set offset +5
        OFFSET          8176
BBED> d
 File: /tmp/xifenfei.dbf (0)
 Block: 1                Offsets: 8176 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 78696665 6e666569 2e636f6d 020624bc
 <32 bytes per line>
BBED> m /x 58494645
 File: /tmp/xifenfei.dbf (0)
 Block: 1                Offsets: 8176 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 58494645 6e666569 2e636f6d 020624bc
 <32 bytes per line>
BBED> set offset +4
        OFFSET          8180
BBED> d
 File: /tmp/xifenfei.dbf (0)
 Block: 1                Offsets: 8180 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 6e666569 2e636f6d 020624bc
 <32 bytes per line>
BBED> m /x 4e464549
 File: /tmp/xifenfei.dbf (0)
 Block: 1                Offsets: 8180 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 4e464549 2e636f6d 020624bc
 <32 bytes per line>
BBED> d /v offset 8168
 File: /tmp/xifenfei.dbf (0)
 Block: 1       Offsets: 8168 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c000110 7777772e 58494645 4e464549 l ,...www.XIFENFEI
 2e636f6d 020624bc                   l .com..$.
 <16 bytes per line>
BBED> x /rc
rowdata[0]                                  @8168
----------
flag@8168: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8169: 0x00
cols@8170:    1
col   0[16] @8171: www.XIFENFEI.com
BBED> sum apply
Check value for File 0, Block 1:
current = 0x3060, required = 0x3060

这里通过bbed把拷贝出来的datafile 4 block 171中的www.xifenfei.com修改为www.XIFENFEI.com

dbms_diskgroup拷贝os block to asm datafile 4 block 171

SQL> declare
  2  v_FsFileName varchar2(4000);
  3  v_AsmFileName varchar2(4000);
  4  v_FsFileType number;
  5  v_AsmFileType number;
  6  v_offstart number;
  7  v_filesize number;
  8  v_lbks number;
  9  v_typename varchar2(4000);
 10  v_handle number;
 11  error number;
 12  txt varchar2(4000);
 13  begin
 14  dbms_output.enable(500000);
 15  v_FsFileName := '&file_with_patched_block';
 16  v_AsmFileName := '&file_to_patch_in_ASM';
 17  v_offstart := '&block_to_patch';
 18  dbms_diskgroup.getfileattr(v_AsmFileName,v_AsmFileType,v_filesize,v_lbks);
 19  select decode(v_AsmFileType,1,'Control File',2,'Data File',3,'Online Log File',4,'Archive Log',5,'Trace File',6,'Temporary File',
 20  7,'Not Used',8,'Not Used',9,'Backup Piece',10,'Incremental Backup Piece',11,'Archive Backup Piece',12,'Data File Copy',
 21  13,'Spfile',14,'Disaster Recovery Configuration',15,'Storage Manager Disk',16,'Change Tracking File',
 22  17,'Flashback Log File',
 23  18,'DataPump Dump File',19,'Cross Platform Converted File',20,'Autobackup',21,'Any OS file',22,'Block Dump File',
 24  23,'CSS Voting File',24,'CRS') into v_typename from dual;
 25  dbms_output.put_line('File: '||v_AsmFileName); dbms_output.new_line;
 26  dbms_output.put_line('Type: '||v_AsmFileType||' '||v_typename); dbms_output.new_line;
 27  dbms_output.put_line('Size: '||v_filesize); dbms_output.new_line;
 28  dbms_output.put_line('Logical Block Size: '||v_lbks); dbms_output.new_line;
 29  dbms_diskgroup.patchfile(v_FsFileName,12,v_lbks,1,0,1,v_AsmFileName,v_AsmFileType,v_offstart,0);
 30  end;
 31  /
Enter value for file_with_patched_block: /tmp/xifenfei.dbf
old  15: v_FsFileName := '&file_with_patched_block';
new  15: v_FsFileName := '/tmp/xifenfei.dbf';
Enter value for file_to_patch_in_asm: +DATA/xifenfei/datafile/users.259.878224279
old  16: v_AsmFileName := '&file_to_patch_in_ASM';
new  16: v_AsmFileName := '+DATA/xifenfei/datafile/users.259.878224279';
Enter value for block_to_patch: 171
old  17: v_offstart := '&block_to_patch';
new  17: v_offstart := '171';
File: +DATA/xifenfei/datafile/users.259.878224279
Type: 12 Data File Copy
Size: 640
Logical Block Size: 8192
PL/SQL procedure successfully completed.

验证修改block是否正确

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area  952020992 bytes
Fixed Size                  2258960 bytes
Variable Size             306186224 bytes
Database Buffers          637534208 bytes
Redo Buffers                6041600 bytes
Database mounted.
Database opened.
SQL> select rowid,xifenfei from t_xifenfei;
ROWID              XIFENFEI
------------------ ----------------
AAAVU3AAEAAAACrAAA www.XIFENFEI.com

dbms_diskgroup拷贝asm datafile to os

SQL> declare
  2  v_AsmFileName varchar2(4000);
  3  v_FsFileName varchar2(4000);
  4  v_filetype number;
  5  v_filesize number;
  6  v_lbks number;
  7  v_typename varchar2(4000);
  8  v_pblksize number;
  9  v_handle number;
 10  begin
 11  dbms_output.enable(500000);
 12  v_AsmFileName := '&ASM_file_name';
 13  v_FsFileName := '&FileSystem_file_name';
 14  dbms_diskgroup.getfileattr(v_AsmFileName,v_filetype,v_filesize,v_lbks);
 15  dbms_diskgroup.open(v_AsmFileName,'r',v_filetype,v_lbks,v_handle,v_pblksize,v_filesize);
 16  dbms_diskgroup.close(v_handle);
 17  select decode(v_filetype,1,'Control File',2,'Data File',3,'Online Log File',4,'Archive Log',5,'Trace File',6,'Temporary File',
 18  7,'Not Used',8,'Not Used',9,'Backup Piece',10,'Incremental Backup Piece',11,'Archive Backup Piece',12,'Data File Copy',
 19  13,'Spfile',14,'Disaster Recovery Configuration',15,'Storage Manager Disk',16,'Change Tracking File',17,'Flashback Log File',
 20  18,'DataPump Dump File',19,'Cross Platform Converted File',20,'Autobackup',21,'Any OS file',22,'Block Dump File',
 21  23,'CSS Voting File',24,'CRS') into v_typename from dual;
 22  dbms_output.put_line('File: '||v_AsmFileName); dbms_output.new_line;
 23  dbms_output.put_line('Type: '||v_filetype||' '||v_typename); dbms_output.new_line;
 24  dbms_output.put_line('Size (in logical blocks): '||v_filesize); dbms_output.new_line;
 25  dbms_output.put_line('Logical Block Size: '||v_lbks); dbms_output.new_line;
 26  dbms_output.put_line('Physical Block Size: '||v_pblksize); dbms_output.new_line;
 27  dbms_diskgroup.patchfile(v_AsmFileName,v_filetype,v_lbks,1,0,v_filesize,v_FsFileName,2,1,1);
 28  end;
 29  /
Enter value for asm_file_name: +DATA/xifenfei/datafile/users.259.878224279
old  12: v_AsmFileName := '&ASM_file_name';
new  12: v_AsmFileName := '+DATA/xifenfei/datafile/users.259.878224279';
Enter value for filesystem_file_name: /tmp/users01.dbf
old  13: v_FsFileName := '&FileSystem_file_name';
new  13: v_FsFileName := '/tmp/users01.dbf';
File: +DATA/xifenfei/datafile/users.259.878224279
Type: 12 Data File Copy
Size (in logical blocks): 640
Logical Block Size: 8192
Physical Block Size: 512
PL/SQL procedure successfully completed.
[grid@xifenfei ~]$ ls -l /tmp/users01.dbf
-rw-r----- 1 grid oinstall 5251072 Apr 28 16:39 /tmp/users01.dbf

通过上述几个简单例子说明:dbms_diskgroup可以看asm file的属性,可以拷贝asm中的datafile中的某个block到os,也可以从os拷贝到asm,可以从asm中直接拷贝文件文件到os等功能

asm disk误设置pvid导致asm diskgroup无法mount恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm disk误设置pvid导致asm diskgroup无法mount恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我说他们把以前存储到AIX直连的存储切换为含光纤交换机的存储网络后,RAC无法启动,让我给予支持.通过分析是由于换链路之后开始磁盘顺序不对,维护人员对其asm disk 设置了pvid,导致asm 磁盘组无法正常mount,从而使得含votedisk的dg的asm disk无法正常访问,从而RAC的cssd进程无法启动,同样数据文件的磁盘组也无法mount,通过kfed修复成功,实现数据0丢失.
平台版本信息(2节点RAC)

$ sqlplus -v
SQL*Plus: Release 11.2.0.4.0 Production
$ uname -a
AIX db2 1 7 00F9733E4C00

GI日志报错信息

2014-12-20 16:44:08.769:
[ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-12-20 16:44:11.775:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:26.791:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
、Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:41.812:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以看出来是由于RAC启动过程中无法获得votedisk使得其无法正常启动,通过分析日志找出来votedisk相关磁盘

2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以知道rhdisk4,5,6为votedisk对应磁盘,使用kfed查看磁盘头信息

$kfed read /dev/rhdisk4
kfbh.endian:                        201 ; 0x000: 0xc9
kfbh.hard:                          194 ; 0x001: 0xc2
kfbh.type:                          212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                        193 ; 0x003: 0xc1
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
1102BEE00 C9C2D4C1 00000000 00000000 00000000  [................]
1102BEE10 00000000 00000000 00000000 00000000  [................]
        Repeat 6 times
1102BEE80 00F9733D 67553E0A 00000000 00000000  [..s=gU>.........]
1102BEE90 00000000 00000000 00000000 00000000  [................]
  Repeat 246 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]
$kfed read /dev/rhdisk4 blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3883664132 ; 0x00c: 0xe77c0304
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                          23 ; 0x006: 0x0017
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000
kfdfse[0].fse:                      119 ; 0x018: FREE=0x7 FRAG=0x7
kfdfse[1].fse:                       16 ; 0x019: FREE=0x0 FRAG=0x1
…………
$kfed read /dev/rhdisk4 blkn=510
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     254 ; 0x004: blk=254
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3460116983 ; 0x00c: 0xce3d31f7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                CRS_0000 ; 0x028: length=8
kfdhdb.grpname:                     CRS ; 0x048: length=3
kfdhdb.fgname:                 CRS_0000 ; 0x068: length=8
…………

由上述分析可以基本上确定是asm disk header 被破坏,进一步分析破坏原因

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/dev#]ls -l rhdisk*
crw-------    2 root     system       24,  1 Oct 18 11:45 rhdisk0
crw-------    1 root     system       24,  3 Oct 18 13:27 rhdisk1
crw-------    1 root     system       24,  5 Dec 20 20:02 rhdisk10
crw-------    1 root     system       24,  2 Oct 18 13:32 rhdisk2
crw-------    1 root     system       24,  0 Oct 18 13:32 rhdisk3
crw-rw----    1 grid     asmadmin     24,  8 Dec 20 20:02 rhdisk4
crw-rw----    1 grid     asmadmin     24,  9 Dec 20 20:02 rhdisk5
crw-rw----    1 grid     asmadmin     24, 10 Dec 20 20:02 rhdisk6
crw-rw----    1 grid     asmadmin     24,  4 Dec 20 20:02 rhdisk7
crw-rw----    1 grid     asmadmin     24,  6 Dec 20 20:02 rhdisk8
crw-rw----    1 grid     asmadmin     24,  7 Dec 20 20:02 rhdisk9

从这里基本上可以看出来,是由于磁盘头被重写了pvid,导致asm disk header 被破坏.进一步分析asm log,确定哪些磁盘被用作asm disk

SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY  DISK '/dev/rhdisk4',
'/dev/rhdisk5',
'/dev/rhdisk6' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (1,0) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk6)
NOTE: initializing header on grp 1 disk CRS_0000
NOTE: initializing header on grp 1 disk CRS_0001
NOTE: initializing header on grp 1 disk CRS_0002
SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk9' SIZE 614400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (2,0) to disk (/dev/rhdisk9)
NOTE: initializing header on grp 2 disk DATA_0000
SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk8' SIZE 204800M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (3,0) to disk (/dev/rhdisk8)
NOTE: initializing header on grp 3 disk FBA_0000
SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk7' SIZE 102400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (4,0) to disk (/dev/rhdisk7)
NOTE: initializing header on grp 4 disk ARCH_0000

这里可以确定asm disk为rhdisk[4-9],通过kfed分析全部和rhdisk4一样的问题,也符合lspv查询出来的结果,使用kfed repair修复asm disk header后

SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> alter diskgroup fba mount;
Diskgroup altered.
SQL> alter diskgroup arch mount;
Diskgroup altered.
SQL> alter diskgroup crs mount;
Diskgroup altered.
SQL> select group_number,disk_number,path from v$asm_disk;
GROUP_NUMBER DISK_NUMBER PATH
------------ ----------- --------------------------------------------------
           2           0 /dev/rhdisk4
           2           1 /dev/rhdisk5
           2           2 /dev/rhdisk6
           1           0 /dev/rhdisk7
           4           0 /dev/rhdisk8
           3           0 /dev/rhdisk9
6 rows selected.
SQL> select group_number,name from v$asm_diskgroup;
GROUP_NUMBER NAME
------------ ------------------------------------------------------------
           1 ARCH
           2 CRS
           3 DATA
           4 FBA

这里证明通过kfed对磁盘头的修复,asm磁盘组已经全部mount成功,GI状态也恢复正常

[db2/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1

这里忽略了一个问题,在修复磁盘头之前没有清除pvid,导致磁盘头修复后,pvid依然存储在odm中

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None

通过分析发现fba磁盘组中无任何记录,使用该磁盘组进行直接清除pvid测试

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup fba dismount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ exit
You have mail in /usr/spool/mail/root
[db2/#]chdev -l hdisk8 -a pv=clear
hdisk8 changed
[db2/#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          none                                None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/#]su - grid
$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup fba mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

通过测试直接清除pvid asm 磁盘头依然工作正常,关闭GI,使用chdev清除hdisk[4-9]所有pvid,启动GI一切正常

[db1/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1
[db1/#]lspv
hdisk0          00f9733df7c7a9db                    rootvg          active
hdisk1          00f9733d21dad8fe                    rootvg          active
hdisk2          00f9733d21dbd08b                    appvg           active
hdisk3          00f9733d21dbd2ab                    appvg           active
hdisk4          none                                None
hdisk5          none                                None
hdisk6          none                                None
hdisk7          none                                None
hdisk8          none                                None
hdisk9          none                                None
hdisk10         none                                None

至此设置pvid导致asm disk header损坏的asm 恢复正常,实现数据0丢失。
温馨提示:aix asm disk磁盘中不能设置pvid,否则将会导致asm disk header 损坏,无法正常mount
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

asm disk header 彻底损坏恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm disk header 彻底损坏恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在asm 磁盘组不能mount的情况下,如果是磁盘头的少数部分损坏,或者是asm disk header存在,可以通过kfed修复,或者使用备份的磁盘头信息去恢复从而实现磁盘组mount来恢复数据库.如果没有备份也无法修复可以尝试使用amdu,dul来实现对不能mount的磁盘组进行恢复.在极端情况下(比如磁盘组完全丢失),amdu/dul都无论为力的情况下,可以考虑使用扫描磁盘找出来datafile 的方法求救数据的最后稻草.本实验大概的模拟了asm disk 前10M完全损坏的情况下数据库恢复
测试准备
创建新表空间,创建T_XIFENFEI测试表

SQL> create tablespace xifenfei datafile '+XIFENFEI' SIZE 50m;
Tablespace created.
SQL> CREATE TABLE T_XIFENFEI TABLESPACE XIFENFEI
  2  AS SELECT * FROM DBA_OBJECTS;
Table created.
SQL> SELECT COUNT(*) FROM T_XIFENFEI;
  COUNT(*)
----------
     50031
SQL> select ts#,rfile#,bytes/1024/1024,blocks,name from v$datafile;
       TS#     RFILE# BYTES/1024/1024     BLOCKS NAME
---------- ---------- --------------- ---------- --------------------------------------------------
         0          1             480      61440 +XIFENFEI/asm10g/datafile/system.256.845260203
         1          2              25       3200 +XIFENFEI/asm10g/datafile/undotbs1.258.845260205
         2          3             250      32000 +XIFENFEI/asm10g/datafile/sysaux.257.845260203
         4          4               5        640 +XIFENFEI/asm10g/datafile/users.259.845260205
         6          5              50       6400 +XIFENFEI/asm10g/datafile/xifenfei.266.845262139
SQL> select GROUP_NUMBER,DISK_NUMBER,STATE,TOTAL_MB,FREE_MB,NAME,path from  v$asm_disk;
GROUP_NUMBER DISK_NUMBER STATE      TOTAL_MB    FREE_MB NAME                 PATH
------------ ----------- -------- ---------- ---------- -------------------- ------------------
           1           0 NORMAL         2048          0 XIFENFEI_0000        /dev/raw/raw1
           1           1 NORMAL          784          0 XIFENFEI_0001        /dev/raw/raw2
           1           2 NORMAL         7059          0 XIFENFEI_0002        /dev/raw/raw3
--关闭数据库
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
--关闭ASM
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown

查看裸设备对应磁盘

[oracle@xifenfei dul]$ more /etc/sysconfig/rawdevices
/dev/raw/raw1   /dev/sdc
/dev/raw/raw2   /dev/sdd1
/dev/raw/raw3   /dev/sdd2

dd磁盘头
dd asm disk 前面10M,彻底破坏asm disk

[oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=10 conv=notrunc
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.175424 seconds, 59.8 MB/s
[oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=10 conv=notrunc
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.11584 seconds, 90.5 MB/s
[oracle@xifenfei ~]$ dd if=/dev/zero of=/dev/raw/raw3 bs=1M count=10 conv=notrunc
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.353435 seconds, 29.7 MB/s

kfed查看磁盘
确定所有asm disk header完全被破坏

[oracle@xifenfei dul]$ kfed read /dev/raw/raw1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
[oracle@xifenfei dul]$ kfed read /dev/raw/raw2
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
[oracle@xifenfei dul]$ kfed read /dev/raw/raw3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

amdu查看asm 磁盘

[oracle@xifenfei ~]$ amdu -diskstring '/dev/raw/raw*'
amdu_2014_04_18_23_17_17/
[oracle@xifenfei ~]$ cd amdu_2014_04_18_23_17_17
[oracle@xifenfei amdu_2014_04_18_23_17_17]$ ls
report.txt
[oracle@xifenfei amdu_2014_04_18_23_17_17]$ more report.txt
-*-amdu-*-
…………
--------------------------------- Operations ---------------------------------
------------------------------- Disk Selection -------------------------------
 -diskstring '/dev/raw/raw*'
------------------------------ Reading Control -------------------------------
------------------------------- Output Control -------------------------------
********************************* DISCOVERY **********************************
----------------------------- DISK REPORT N0001 ------------------------------
                Disk Path: /dev/raw/raw1
           Unique Disk ID:
               Disk Label:
     Physical Sector Size: 512 bytes
                Disk Size: 65536 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
----------------------------- DISK REPORT N0002 ------------------------------
                Disk Path: /dev/raw/raw2
           Unique Disk ID:
               Disk Label:
     Physical Sector Size: 512 bytes
                Disk Size: 65536 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
----------------------------- DISK REPORT N0003 ------------------------------
                Disk Path: /dev/raw/raw3
           Unique Disk ID:
               Disk Label:
     Physical Sector Size: 512 bytes
                Disk Size: 65536 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
******************************* END OF REPORT ********************************

通过这里证明,当asm disk header 损坏严重之时,amdu无法识别,更加无法恢复相关数据库

dul查看完全损坏asm disk header
测试在asm disk header完全损坏情况下,dul是否还能够实现asm磁盘组中抽取数据,同理amdu也无法正常工作.

[oracle@xifenfei dul]$ ./dul
Data UnLoader: 10.2.0.5.28 - Internal Only - on Sat Apr 19 04:02:02 2014
with 64-bit io functions
Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
DUL: Warning: block 0 is not a disk header block
DUL: Error: Block is not in use
DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw1
DUL: Warning: block 0 is not a disk header block
DUL: Error: Block is not in use
DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw2
DUL: Warning: block 0 is not a disk header block
DUL: Error: Block is not in use
DUL: Error: Block type mismatch ( seen 0 expect 1) when parsing block 0 of disk /dev/raw/raw3

这里可以看出来,当asm disk header完全异常,dul也无法识别出来asm磁盘组(该情况下dul无法正常操作)

通过工具扫描磁盘抽取数据块

CPFL> scan disk  /dev/raw/raw1
Scanning  disk /dev/raw/raw1, at 2014-04-19 04:05:11
Completed  disk /dev/raw/raw1, at 2014-04-19 04:05:56
CPFL> scan  disk  /dev/raw/raw2
Scanning  disk /dev/raw/raw2, at 2014-04-19 04:05:56
Completed  disk /dev/raw/raw2, at 2014-04-19 04:06:15
CPFL> scan  disk  /dev/raw/raw3
Scanning  disk /dev/raw/raw3, at 2014-04-19 04:06:15
Completed  disk /dev/raw/raw3, at 2014-04-19 04:07:44
CPFL> list datafiles
 Tablespace: SYSTEM    File:    1  Blocks:      61440
 Tablespace: UNDOTBS1  File:    2  Blocks:       3200
 Tablespace: SYSAUX    File:    3  Blocks:      32000
 Tablespace: USERS     File:    4  Blocks:        640
 Tablespace: XIFENFEI  File:    5  Blocks:       6400
CPFL> copy datafile 1 to /u01/oracle/oradata/datafile/1.dbf
copy datafile start: 2014-04-19 04:10:35
copy datafile 1 have blocks 61440
copy datafile completed: 2014-04-19 04:11:18
CPFL> copy datafile 2  to /u01/oracle/oradata/datafile/2.dbf
copy datafile start: 2014-04-19 04:11:52
copy datafile 2 have blocks 3200
copy datafile completed: 2014-04-19 04:11:54
CPFL>  copy datafile 3  to /u01/oracle/oradata/datafile/3.dbf
copy datafile start: 2014-04-19 04:12:03
copy datafile 3 have blocks 32000
copy datafile completed: 2014-04-19 04:12:27
CPFL>  copy datafile 4  to /u01/oracle/oradata/datafile/4.dbf
copy datafile start: 2014-04-19 04:13:07
copy datafile 4 have blocks 640
copy datafile completed: 2014-04-19 04:13:08
CPFL> copy datafile 5 to /u01/oracle/oradata/datafile/5.dbf
copy datafile start: 2014-04-19 04:13:18
copy datafile 5 have blocks 6400
copy datafile completed: 2014-04-19 04:13:19

查看使用工具抽取数据文件

[oracle@xifenfei datafile]$ ls -l
total 830320
-rw-r--r-- 1 oracle oinstall 503324672 Apr 19 04:34 1.dbf
-rw-r--r-- 1 oracle oinstall  26222592 Apr 19 04:34 2.dbf
-rw-r--r-- 1 oracle oinstall 262152192 Apr 19 04:34 3.dbf
-rw-r--r-- 1 oracle oinstall   5251072 Apr 19 04:34 4.dbf
-rw-r--r-- 1 oracle oinstall  52436992 Apr 19 04:34 5.dbf

dul验证抽取文件

[oracle@xifenfei dul]$ ./dul
Data UnLoader: 10.2.0.5.28 - Internal Only - on Sat Apr 19 06:56:09 2014
with 64-bit io functions
Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
DUL: Warning: Recreating file "dul.log"
Found db_id = 181793355
Found db_name = ASM10G
DUL> show datafiles;
ts# rf# start   blocks offs open  err file name
  0   1     0    61440    0    1    0 /u01/oracle/oradata/datafile/1.dbf
  1   2     0     3200    0    1    0 /u01/oracle/oradata/datafile/2.dbf
  2   3     0    32000    0    1    0 /u01/oracle/oradata/datafile/3.dbf
  4   4     0      640    0    1    0 /u01/oracle/oradata/datafile/4.dbf
  6   5     0     6400    0    1    0 /u01/oracle/oradata/datafile/5.dbf
DUL> bootstrap;
Probing file = 1, block = 377
. unloading table                BOOTSTRAP$
DUL: Warning: block number is non zero but marked deferred trying to process it anyhow
      57 rows unloaded
DUL: Warning: Dictionary cache DC_BOOTSTRAP is empty
Reading BOOTSTRAP.dat 57 entries loaded
Parsing Bootstrap$ contents
DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 10
 OBJ$: segobjno 18, file 1 block 121
 TAB$: segobjno 2, tabno 1, file 1  block 25
 COL$: segobjno 2, tabno 5, file 1  block 25
 USER$: segobjno 10, tabno 1, file 1  block 89
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$   51171 rows unloaded
. unloading table                      TAB$    1576 rows unloaded
. unloading table                      COL$   55264 rows unloaded
. unloading table                     USER$      59 rows unloaded
Reading USER.dat 59 entries loaded
Reading OBJ.dat 51171 entries loaded and sorted 51171 entries
Reading TAB.dat 1576 entries loaded
Reading COL.dat 55264 entries loaded and sorted 55264 entries
Reading BOOTSTRAP.dat 57 entries loaded
DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 10
 OBJ$: segobjno 18, file 1 block 121
 TAB$: segobjno 2, tabno 1, file 1  block 25
 COL$: segobjno 2, tabno 5, file 1  block 25
 USER$: segobjno 10, tabno 1, file 1  block 89
 TABPART$: segobjno 266, file 1 block 2121
 INDPART$: segobjno 271, file 1 block 2161
 TABCOMPART$: segobjno 288, file 1 block 2297
 INDCOMPART$: segobjno 293, file 1 block 2345
 TABSUBPART$: segobjno 278, file 1 block 2217
 INDSUBPART$: segobjno 283, file 1 block 2257
 IND$: segobjno 2, tabno 3, file 1  block 25
 ICOL$: segobjno 2, tabno 4, file 1  block 25
 LOB$: segobjno 2, tabno 6, file 1  block 25
 COLTYPE$: segobjno 2, tabno 7, file 1  block 25
 TYPE$: segobjno 181, tabno 1, file 1  block 1297
 COLLECTION$: segobjno 181, tabno 2, file 1  block 1297
 ATTRIBUTE$: segobjno 181, tabno 3, file 1  block 1297
 LOBFRAG$: segobjno 299, file 1 block 2393
 LOBCOMPPART$: segobjno 302, file 1 block 2425
 UNDO$: segobjno 15, file 1 block 105
 TS$: segobjno 6, tabno 2, file 1  block 57
 PROPS$: segobjno 96, file 1 block 721
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$
DUL: Warning: Recreating file "OBJ.ctl"
   51171 rows unloaded
. unloading table                      TAB$
DUL: Warning: Recreating file "TAB.ctl"
    1576 rows unloaded
. unloading table                      COL$
DUL: Warning: Recreating file "COL.ctl"
   55264 rows unloaded
. unloading table                     USER$
DUL: Warning: Recreating file "USER.ctl"
      59 rows unloaded
. unloading table                  TABPART$      72 rows unloaded
. unloading table                  INDPART$      80 rows unloaded
. unloading table               TABCOMPART$       0 rows unloaded
. unloading table               INDCOMPART$       0 rows unloaded
. unloading table               TABSUBPART$       0 rows unloaded
. unloading table               INDSUBPART$       0 rows unloaded
. unloading table                      IND$    2231 rows unloaded
. unloading table                     ICOL$    3650 rows unloaded
. unloading table                      LOB$     530 rows unloaded
. unloading table                  COLTYPE$    1701 rows unloaded
. unloading table                     TYPE$    1945 rows unloaded
. unloading table               COLLECTION$     555 rows unloaded
. unloading table                ATTRIBUTE$    7275 rows unloaded
. unloading table                  LOBFRAG$       1 row  unloaded
. unloading table              LOBCOMPPART$       0 rows unloaded
. unloading table                     UNDO$      21 rows unloaded
. unloading table                       TS$       7 rows unloaded
. unloading table                    PROPS$      28 rows unloaded
Reading USER.dat 59 entries loaded
Reading OBJ.dat 51171 entries loaded and sorted 51171 entries
Reading TAB.dat 1576 entries loaded
Reading COL.dat 55264 entries loaded and sorted 55264 entries
Reading TABPART.dat 72 entries loaded and sorted 72 entries
Reading TABCOMPART.dat 0 entries loaded and sorted 0 entries
Reading TABSUBPART.dat 0 entries loaded and sorted 0 entries
Reading INDPART.dat 80 entries loaded and sorted 80 entries
Reading INDCOMPART.dat 0 entries loaded and sorted 0 entries
Reading INDSUBPART.dat 0 entries loaded and sorted 0 entries
Reading IND.dat 2231 entries loaded
Reading LOB.dat 530 entries loaded
Reading ICOL.dat 3650 entries loaded
Reading COLTYPE.dat 1701 entries loaded
Reading TYPE.dat 1945 entries loaded
Reading ATTRIBUTE.dat 7275 entries loaded
Reading COLLECTION.dat 555 entries loaded
Reading BOOTSTRAP.dat 57 entries loaded
Reading LOBFRAG.dat 1 entries loaded and sorted 1 entries
Reading LOBCOMPPART.dat 0 entries loaded and sorted 0 entries
Reading UNDO.dat 21 entries loaded
Reading TS.dat 7 entries loaded
Reading PROPS.dat 28 entries loaded
Database character set is ZHS16GBK
Database national character set is AL16UTF16
DUL> unload table sys.t_xifenfei;
. unloading table                T_XIFENFEI   50031 rows unloaded

通过这里可以发现,我们创建测试数据为50031条,dul读取抽取出来数据文件中对应表数据条数也为50031条;证明:在asm disk header完全损坏情况下,amdu,dul无法直接恢复asm里面数据库,但是可以通过工具扫描数据文件,找出来磁盘中的datafile block实现完整恢复数据[只要你的asm中的数据没有覆盖,都可以通过该方法恢复]

如果你在使用这些思路进行恢复遇到突发情况不能自行解决,请联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:
Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com

ORACLE 12C ASM 新特性:共享密码文件

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORACLE 12C ASM 新特性:共享密码文件

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在ORACLE 12C之前大家都知道密码文件是存放在?/dbs或者?/database中,如果要修改修改sysdba权限的用户密码时候,会去修改密码文件,而在rac数据库的sys密码文件是存在各个节点中,这个时候修改sysdba权限的密码就需要在两个节点都要做同样的操作,而对于数据库来说本身是只要在一个节点上修改即可,因为密码是记录在user$中,就是因为密码文件非共享且在各个节点中都有,因此需要在各个节点均要执行修改密码命令,确保密码文件被正常修改。因为rac 密码文件非共享的机制存在,导致修改sysdba权限密码繁琐,有些时候甚至有节点忘记修改,导致需要使用密码文件操作数据库的时候不能正常进行,DG传输日志异常等故障。在ORACLE 12C中为了解决这个问题,引入了密码文件可以存入ASM新特性,从而使得密码文件存储在ASM中实现所有节点共享,从而解决该问题.
ASM存储密码文件前提条件 COMPATIBLE.ASM>= 12.1
查询ASM信息

SQL>  select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0
SQL> select NAME,COMPATIBILITY from v$asm_diskgroup;
NAME                           COMPATIBILITY
------------------------------ ------------------------------------------------------------
DATA                           12.1.0.0.0

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

这里db的password file为空,即表示使用默认值,也就是为$ORACLE_HOME/dbs/orapwxifenfei

创建密码文件存储在ASM中

--创建db新密码文件
[oracle@xifenfei ~]$ orapwd file='+data/CDB/orapwdxifenfei' dbuniquename='cdb'
Enter password for SYS:
----输入sys用户密码
--创建asm新密码文件
orapwd file='+data/ASM/orapwasm' asm=y
----asm=y 表示创建的密码文件为asm的
--使用老密码文件创建db/asm新密码文件
orapwd input_file='/oraclegrid/dbs/orapwasm' file='+data/ASM/orapwasm' [asm=y]
----input_file 表示使用老的密码文件创建新的存储在ASM中的密码文件

查看ASM中密码文件

ASMCMD> showversion
ASM version         : 12.1.0.1.0
ASMCMD> pwd
+data/cdb
ASMCMD>  ls -l orapwdxifenfei
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   MAY 31 19:00:00  N    orapwdxifenfei => +DATA/CDB/PASSWORD/pwdcdb.290.816897265

配置crs中password file项

[grid@xifenfei ~]$ srvctl modify database -db cdb -pwfile  +data/CDB/orapwdxifenfei

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file: +data/CDB/orapwdxifenfei
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

至此数据库启动使用密码ASM中的密码文件完成,补充说明,该方式配置在ASM中的密码文件,只能是通过crs方式启动db才会生效,如果手工使用sqlplus启动数据库不会使用该密码文件,还是使用默认密码文件。这里也就提醒大家操作规范:在RAC环境(包含单节点的GI环境)中,对数据库的启动关闭操作强烈建议使用crs相关命令来完成,而不推荐使用sqlplus命令

监控asm disk磁盘性能

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:监控asm disk磁盘性能

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

使用ASM的朋友估计都有一个困惑,ASM就是一个黑盒子,怎么才能够做到类似如裸设备或者文件系统一样,通过系统的命令(iostat)来监控其磁盘IO的运行性能.其实ORACLE在设计ASM的过程中,也就考虑到了这个需求,把磁盘相关的情况都记录到了ASM相关视图中v$asm_disk和v$asm_disk_stat(这两个视图功能相同,只是查询v$asm_disk需要每次访问磁盘头获取数据,v$asm_disk_stat是磁盘头存储在内存中的数据,查询v$asm_disk_stat对磁盘影响非常小),所以我们可以通过查询v$asm_disk_stat中的数据,然后做减法就可以获得asm disk某个时间段的磁盘io性能情况.ORACLE提供了相关工具叫做asmiostat用来监控,具体可以参考ASMIOSTAT Script to collect iostats for ASM disks [ID 437996.1]

确保TIMED_STATISTICS=TRUE
虽然是默认值,多检查无错,因为到该值为false之时READ_TIME/WRITE_TIME为0

[grid@xifenfei tmp]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Fri Feb 1 08:29:01 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option
SQL> show parameter TIMED_STATISTICS
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
timed_statistics                     boolean     TRUE

asmiostat使用

[grid@xifenfei tmp]$ ./asmiostat.sh help=y
Invalid parameter: <interval> must be > 0; <count> must be >= 0
./asmiostat.sh [-s ASM ORACLE_SID] [-h ASM ORACLE_HOME] [-g diskgroup] [<interval>] [<count>]
Output:
  DiskPath - Path to ASM disk
  DiskName - ASM disk name
  Gr       - ASM disk group number
  Dsk      - ASM disk number
  Reads    - Reads
  Writes   - Writes
  AvRdTm   - Average read time (in msec)
  AvWrTm   - Average write time (in msec)
  KBRd     - Kilobytes read
  KBWr     - Kilobytes written
  AvRdSz   - Average read size (in bytes)
  AvWrSz   - Average write size (in bytes)
  RdEr     - Read errors
  WrEr     - Write errors

相关值说明

  DiskPath - Path to ASM disk
  DiskName - ASM disk name
  Gr       - ASM disk group number
  Dsk      - ASM disk number
  Reads    - 指定时间内I/O读请求次数
  Writes   - 指定时间内I/O写请求次数
  AvRdTm   - 平均每次I/O读请求所需时间 (in msec)
  AvWrTm   - 平均每次I/O写请求所需时间 (in msec)
  KBRd     - 指定时间内读操作的量(KB)
  KBWr     - 指定时间内写操作的量(KB)
  AvRdSz   - 平均每次I/O读请求得到的数据量(B)
  AvWrSz   - 平均每次I/O写请求得到的数据量(B)
  RdEr     - 指定时间内I/O读请求错误次数
  WrEr     - 指定时间内I/O写请求错误次数

asmiostat效果展示

[grid@xifenfei tmp]$ ./asmiostat.sh -s $ORACLE_SID -h $ORACLE_HOME -g DATA 1 3
Date: Fri Feb  1 08:31:45 CST 2013    Interval: 1 secs    Disk Group: DATA
DiskPath - DiskName                      Gr Dsk    Reads   Writes AvRdTm AvWrTm     KBRd     KBWr  AvRdSz  AvWrSz RdEr WrEr
/dev/sdb - DATA_0000                      1   0        0        0    0.0    0.0        0        0       0       0    0    0
Date: Fri Feb  1 08:31:47 CST 2013    Interval: 1 secs    Disk Group: DATA
DiskPath - DiskName                      Gr Dsk    Reads   Writes AvRdTm AvWrTm     KBRd     KBWr  AvRdSz  AvWrSz RdEr WrEr
/dev/sdb - DATA_0000                      1   0        4        3    0.6 1006.1        0        0       0       0    0    0
Date: Fri Feb  1 08:31:49 CST 2013    Interval: 1 secs    Disk Group: DATA
DiskPath - DiskName                      Gr Dsk    Reads   Writes AvRdTm AvWrTm     KBRd     KBWr  AvRdSz  AvWrSz RdEr WrEr
/dev/sdb - DATA_0000                      1   0        8        2    1.3    1.5        0        0       0       0    0    0

asmiostat下载

ASM中磁盘组权限设置

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ASM中磁盘组权限设置

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

aix平台11gr2单库使用使用grid和oracle用户分别部署gi和db,在添加磁盘的时候,使用设置磁盘所属用户和组为grid与oinstall,设置权限为755.添加磁盘成功后,数据库直接crash.
asm添加磁盘操作

SQL>  alter diskgroup DATA add disk '/dev/rhdisk15'
NOTE: Assigning number (2,7) to disk (/dev/rhdisk15)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0007
NOTE: requesting all-instance disk validation for group=2
Wed Apr 03 22:09:03 2013
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:03 2013
GMON updating group 2 at 21 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 22 for pid 13, osid 20643916
NOTE: cache opening disk 7 of grp 2: DWDATAGRP_0007 path:/dev/rhdisk15
GMON querying group 2 at 23 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS:  alter diskgroup DATA add disk '/dev/rhdisk15'
Starting background process ARB0
Wed Apr 03 22:09:07 2013
ARB0 started with pid=22, OS id=14155890
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:09:19 2013
SQL>  alter diskgroup DATA add disk '/dev/rhdisk11'
Wed Apr 03 22:09:20 2013
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 2/0xa026f7ec (DATA)
NOTE: Assigning number (2,8) to disk (/dev/rhdisk11)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0008
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xa026f7ec (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Wed Apr 03 22:09:23 2013
GMON updating group 2 at 24 for pid 17, osid 22610284
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0xa026f7ec (DATA)
GMON querying group 2 at 25 for pid 13, osid 20643916
NOTE: cache opening disk 8 of grp 2: DATA_0008 path:/dev/rhdisk11
GMON querying group 2 at 26 for pid 13, osid 20643916
SUCCESS: refreshed membership for 2/0xa026f7ec (DATA)
NOTE: starting rebalance of group 2/0xa026f7ec (DATA) at power 1
SUCCESS:  alter diskgroup DATA add disk '/dev/rhdisk11'
Starting background process ARB0
Wed Apr 03 22:09:26 2013
ARB0 started with pid=22, OS id=22872116
NOTE: assigning ARB0 to group 2/0xa026f7ec (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
Wed Apr 03 22:14:41 2013
NOTE: ASM client xifenfei:xifenfei disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/diag/asm/+asm/+ASM/trace/+ASM_ora_15073468.trc
Wed Apr 03 22:16:53 2013
NOTE: client xifenfei:xifenfei registered, osid 20709378, mbr 0x0
Wed Apr 03 22:20:33 2013
NOTE: client xifenfei:xifenfei deregistered

这里可看到增加磁盘操作正常并且开始做rebalance,但是也看到关于client xifenfei异常断开连接(本质就是数据库crash)

crash时的alert日志

Wed Apr 03 22:00:00 2013
Setting Resource Manager plan SCHEDULER[0x318B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Wed Apr 03 22:00:00 2013
Starting background process VKRM
Wed Apr 03 22:00:00 2013
VKRM started with pid=31, OS id=22413426
Wed Apr 03 22:09:06 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:06 2013
SUCCESS: disk DATA_0007 (7.2092304189) added to diskgroup DATA
Wed Apr 03 22:09:26 2013
ORA-15025: could not open disk "/dev/rhdisk15"
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
Wed Apr 03 22:09:26 2013
SUCCESS: disk DATA_0008 (8.2092304190) added to diskgroup DATA
Wed Apr 03 22:14:40 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 1 logical extent 0 of file 261 in
group 2 on disk 7 allocation unit 464
KCF: read, write or open error, block=0x6a online=1
        file=1 '+DATA/xifenfei/datafile/system.261.788373447'
        error=15081 txt: ''
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_dbw0_17367438.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 106)
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
DBW0 (ospid: 17367438): terminating the instance due to error 63999

这里可以看到数据库异常crash是因为/dev/rhdisk15没有权限去操作该文件,导致dbw0进程异常,从而出现该数据库crash

尝试重启数据库(asm重启正常)

SQL> startup
ORACLE instance started.
Total System Global Area 1.2827E+10 bytes
Fixed Size                  2233480 bytes
Variable Size            1711278968 bytes
Database Buffers         1.1073E+10 bytes
Redo Buffers               40894464 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DATA/xifenfei/datafile/system.261.788373447'

这里提示file 1需要恢复,查看alert日志,出现以下错误

Completed: ALTER DATABASE   MOUNT
Wed Apr 03 22:17:02 2013
ALTER DATABASE OPEN
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 261 in
group 2 on disk 8 allocation unit 462
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:690 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:918 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 263 in
group 2 on disk 8 allocation unit 918
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 262 in
group 2 on disk 8 allocation unit 690
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 3: '+DATA/xifenfei/datafile/undotbs1.263.788373475'
ORA-01114: IO error writing block to file 3 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_11534798.trc:
ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.262.788373463'
ORA-01114: IO error writing block to file 2 (block # 1)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

recover database 操作

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01201: file 1 header failed to write correctly
Wed Apr 03 22:18:49 2013
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Wed Apr 03 22:18:50 2013
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306
WARNING: Write Failed. group:2 disk:8 AU:462 offset:16384 size:16384
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_12714126.trc:
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 3
Additional information: 4
Additional information: 4194306

依然是这里的提示依然是因为磁盘无读写权限从而出现数据库无法写数据文件问题,修改刚刚加入的磁盘文件权限问为660(4读2写1执行),表明与oinstall相同组的oracle用户对该磁盘也有读写权限.
这个事故是一个很简单,而且随着11g中asm使用grid和oracle用户的客户越来越多,相关的事故也越来越多,因为大多数使用人习惯直接给某个文件授权为755,而在这样的grid和oracle分开安装的系统中,将出现增加磁盘后,数据库crash,而且不能起来(因为oracle用户对磁盘只有读权限,无写权限),一种比较好的规范:在11gr2的asm系统中(grid和oracle用户),建议设置磁盘为grid.oinstall,权限设置为660

asmlib异常报ORA-00600[kfklLibFetchNext00]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asmlib异常报ORA-00600[kfklLibFetchNext00]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个朋友的历史库出现故障,在linux 4的平台上asm的10.2.0.1的单库,asm使用asmlib来处理。
asm不能正常mount磁盘组,可以看到asmdisk,alert日志报ORA-00600[kfklLibFetchNext00]
操作系统内核是:2.6.9-78
oracleasmlib是:2.0.2-1
asm磁盘组mount失败

--以前故障
SQL> ALTER DISKGROUP ALL MOUNT
Thu Sep  6 14:23:16 2012
NOTE: cache registered group DGARC number=1 incarn=0x2bf96274
NOTE: cache registered group DGDATA number=2 incarn=0x2c196275
NOTE: cache registered group DGSYS number=3 incarn=0x2c196276
Thu Sep  6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep  6 14:23:16 2012
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_10204.trc:
ORA-15183: ASMLIB initialization error [/opt/oracle/extapi/64/asm/orcl/1/libasm.so]
ORA-15183: ASMLIB initialization error [driver/agent not installed]
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 1: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 1/0x2BF96274 (DGARC)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 2: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 2/0x2C196275 (DGDATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Sep  6 14:23:16 2012
ERROR: no PST quorum in group 3: required 2, found 0
Thu Sep  6 14:23:16 2012
NOTE: cache dismounting group 3/0x2C196276 (DGSYS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted
--现在故障
Thu Jan 24 13:49:45 2013
SQL> ALTER DISKGROUP ALL MOUNT
Thu Jan 24 13:49:45 2013
NOTE: cache registered group DGARC number=1 incarn=0xf388cee9
NOTE: cache registered group DGDATA number=2 incarn=0xf3a8ceea
NOTE: cache registered group DGSYS number=3 incarn=0xf3a8ceeb
Thu Jan 24 13:49:45 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00],
[18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm_rbal_13449.trc:
ORA-00600: internal error code, arguments: [kfklLibFetchNext00],
[18446744073709551614], [0], [], [], [], [], []
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 1: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 1/0xF388CEE9 (DGARC)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGARC was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 2: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 2/0xF3A8CEEA (DGDATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGDATA was not mounted
Thu Jan 24 13:49:46 2013
ERROR: no PST quorum in group 3: required 2, found 0
Thu Jan 24 13:49:46 2013
NOTE: cache dismounting group 3/0xF3A8CEEB (DGSYS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGSYS was not mounted
Shutting down instance: further logons disabled

trace文件信息

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000001 ?
kgesinv()+33         call     kgerinv()            006469D40 ? 0064E1C58 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000001 ?
kgesinw()+166        call     kgesinv()            006469D40 ? 0064E1C58 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000001 ?
kfklLibScanNext()+2  call     kgesinw()            006469D40 ? 000000000 ?
39                                                 000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfkLibFetchNext()+3  call     kfklLibScanNext()    0064DDD70 ? 7FBFFFDCD0 ?
43                                                 000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfuitrnInit()+524    call     kfkLibFetchNext()    006469D40 ? 2A971DFF90 ?
                                                   000000001 ? 000000000 ?
                                                   FFFFFFFFFFFFFFFE ?
                                                   000000000 ?
kfkLibIterInit()+18  call     kfuitrnInit()        006469D40 ? 2A971DFCB0 ?
0                                                  2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfkLoadAllLibs()+36  call     kfkLibIterInit()     000000000 ? 00646C7E0 ?
3                                                  2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfkDiscoverString()  call     kfkLoadAllLibs()     000000000 ? 00646C7E0 ?
+107                                               2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
Cannot find symbol
Cannot find symbol
Cannot find symbol
kfdDiscoverString()  call     kfkDiscoverString()  067A53768 ? 00646C7E0 ?
+28                                                2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfdDiscoverShallow(  call     kfdDiscoverString()  067A53768 ? 000000000 ?
)+315                                              2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
kfgbDriver()+1174    call     kfdDiscoverShallow(  000000180 ? 000000000 ?
                              )                    2A971DFF90 ? 000000009 ?
                                                   000000009 ? 000000000 ?
ksbabs()+564         call     kfgbDriver()         7FBFFFE5C0 ? 000000048 ?
                                                   000000000 ? 000000009 ?
                                                   000000009 ? 000000000 ?
ksbrdp()+727         call     ksbabs()             7FBFFFE5C0 ? 000000048 ?
                                                   000000000 ? 000000009 ?
                                                   000000009 ? 000000000 ?
opirip()+616         call     ksbrdp()             7FBFFFE5C0 ? 000000048 ?
                                                   000000001 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF6C8 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF6C8 ? 06002C770 ?
                                                   000000009 ? 000000000 ?
opimai_real()+317    call     sou2o()              7FBFFFF6A0 ? 000000032 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF730 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
<0x3c9fb1c40b>       call     main()               000000003 ? 7FBFFFF730 ?
                                                   000000004 ? 7FBFFFF6C8 ?
                                                   000000009 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

因为客户的库是一个历史库,基本上不怎么使用,在2012年启动asm就出现了ORA-15183错误,然后在2013年重启机器后,再次启动asm就出现了ORA-00600[kfklLibFetchNext00]错误,通过2012年的错误提示,我们大概可以判断出来该问题和ASMLIB有关系,查询mos发现429945.1,发现Call Stack Trace完全一致,可以定位是该问题(如果想深入分析,可以通过strace继续分析)

ORA-600: [kfklLibFetchNext00], [18446744073709551614], [0] when mounting diskgroup in ASM

Applies to:
Linux OS - Version: 2.0.1-1 and later   [Release: RHEL4 and later ]
Information in this document applies to any platform.
Linux Kernel - Version: 2.0.1
Symptoms
 3 RAC db.
2 nodes are up and functioning except for 1 node - ASM did not come back up after
the reboot eventhough all disks show available from asmlib's perspective:
Changes
 All that was done with resources were stopped on Node1 and an extra LUN added.
 A reboot was then performed.
Cause
 The cause of the issue is libasm.o corruption
Ran the following to confirm that disks are ok:
/dev/oracleasm listdisks
/usr/sbin/asmtool -I -l /dev/oracleasm -n /dev/sdg1 -a label
/usr/sbin/oracleasm-discover 'ORCL:*'
dd if=/dev/sdg1 bs=8192 count=1 | od -c
==> output checked out fine
.
kfod asm_diskstring='ORCL:*'
==> this failed on Node1
KFOD-00600: file not found; argument [610][kfklLibFetchNext00] even though libasm.o exists
You might see the following call stack as well
----- Call Stack Trace -----
kfklLibScanNext
kfkLibFetchNext
kfuitrnInit
kfkLibIterInit
kfkLoadAllLibs
kfkDiscoverString
kfdDiscoverString
kfdDiscoverShallow
kfgbDriver
strace showed
 Node1-failing
-------
stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0777, st_size=19344, ...}) = 0
 getdents64(4, /* 0 entries */, 4096) = 0 <<<<
 close(4) = 0
 open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1
 ENOENT (No such file or directory)
 open("/opt/oracle/product/10.2.0/db_1/rdbms/mesg/kfodus.msb", O_RDONLY) = -1
 ENOENT (No such file or directory)
 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9750d000
write(1, "KFOD-00600: file not found; argu"..., 69) = 69
Node2-working
 -----
 stat("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
 open("/opt/oracle/extapi/64/asm/orcl/1/libasm.so", O_RDONLY) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\23\0"..., 832) = 832
fstat(4, {st_mode=S_IFREG|0755, st_size=19344, ...}) = 0
mmap(NULL, 1066104, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) 0x2a9750d000

通过MOS的描述,可以明确定位到问题是:libasm.o异常导致

解决方案

To implement the solution, reinstall the ASMlib RPM
>rpm -Uvh oracleasmlib-2.0.0-1
This replaces the /opt/oracle/extapi/64/asm/orcl/1/libasm.so

How to Get the Contents of an Spfile on ASM when ASM/GRID is down

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:How to Get the Contents of an Spfile on ASM when ASM/GRID is down

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在11g中asm的spfile文件是存放在asm中的,如果asm不能正常启动是否可以获得其spfile信息.这里通过gpnptool来获得spfile文件信息,给大家提供了在11gr2的rac是怎么利用asm 中的spfile启动asm的思路
asm spfile信息

[grid@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Fri Dec 21 01:41:31 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create pfile='/tmp/pfile' from spfile;
File created.
SQL> !more /tmp/pfile
+ASM1.__oracle_base='/u01/app/gridbase'#ORACLE_BASE set from in memory value
+ASM2.asm_diskgroups='XIFENFEI'#Manual Mount
+ASM1.asm_diskgroups='XIFENFEI'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/gridbase'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

关闭集群(asm已关闭)

[root@rac1 ~]# crsctl stop crs
[root@rac1 ~]# ps -ef|grep pmon
root      8768  6372  0 02:53 pts/1    00:00:00 grep pmon
[root@rac1 ~]# crsctl stat res
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

gpnptool命令获取asm disk信息

[root@rac1 ~]# gpnptool get -o-
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile"
xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="4"
ClusterUId="885339054e904f1dbfa646b41d7a0edb" ClusterName="rac-cluster" PALocation="">
<gpnp:Network-Profile>
<gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
<gpnp:Network id="net2" IP="10.10.1.0" Adapter="eth1" Use="cluster_interconnect"/>
</gpnp:HostNetwork>
</gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
--重点关注信息(asm disk 信息)
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/oracleasm/disks/*"
SPFile="+DATA/rac-cluster/asmparameterfile/registry.253.776955291"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<ds:Reference URI="">
<ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
<ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#">
<InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/>
</ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<ds:DigestValue>T2Q3r+5sER2Rp0VfeqzYh461f2s=</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>
LwcQEtlsPGfywzdYJrOqiTp4cZNFGB/S9Ts8OCvYOGf/Z8HDT2yN5p2nCuxArUfW+KzaPzPHHihpRVaTcAY31nJ2Rcf2vMqYp4e++shliQXC8mg
1oGxQGifkjZwA4pTTEK5MBmr4FTZnR3VArZjjVfJdsmOMfyH4YeSMU5HPjdA=
</ds:SignatureValue>
</ds:Signature>
</gpnp:GPnP-Profile>
Success.
Error CLSGPNP_NO_DAEMON getting profile.

获得asm spfile信息
通过kfed找磁盘中的kfdhdb.sp|ausize来获得asm spfile相关信息

[root@rac1 ~]# ls /dev/oracleasm/disks/
VOL1  VOL2  VOL3  VOL4
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL1 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       22 ; 0x0f4: 0x00000016
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL2 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL3 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
[root@rac1 ~]# kfed dev=/dev/oracleasm/disks/VOL4 op=READ | egrep "kfdhdb.sp|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000

这里可以看出来asm spfile信息在磁盘VOL1中,spfile从第22个au开始,1个au(1M).

获得asm spfile 内容

[root@rac1 ~]# dd if=/dev/oracleasm/disks/VOL1 bs=1M skip=22 count=1 > /tmp/spfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 1.47474 seconds, 711 kB/s
[root@rac1 ~]# strings /tmp/spfile
+ASM1.__oracle_base='/u01/app/gridbase'#ORACLE_BASE set from in memory value
+ASM2.asm_diskgroups='XIFENFEI'#Manual Mount
+ASM1.asm_diskgroups='XIFENFEI'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/gridbase'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

通过对比发现,在asm实例未正常启动的情况下,也可以通过其他方面来获得asm spfile文件.本实验只是对于spfile在asm中位置的定位(大家去猜测11gr2的rac是怎么利用asm 中的spfile启动asm的思路),实际生产环境中请勿模仿,gpnptool命令有较大风险

使用asm disk header 自动备份信息恢复异常asm disk header

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:使用asm disk header 自动备份信息恢复异常asm disk header

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过参考kamus的Where is the backup of ASM disk header block,发现从10.2.0.5开始的asm确实存在自动备份asm disk header功能.有了这个功能对于那些不备份asm disk header的同学,提供了一层保证,也增加了asm的安全性.
对于10.2.0.5.0以及以后版本,不管au size是多少,asm disk header自动备份存储的位置是第2个au的倒数第2个block.
计算方法:AU中包含的block num[AU_SIZE/block_size]*2-2[因为从第一个块从0计数],通过该方法计算结论为:
1M AU在510
2M AU在1022
4M AU在2046
8M AU在4094
16M AU在8190
32M AU在16382
64M AU在32766
1.对比备份asm disk header

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei.com"  from dual;
xifenfei.com
-------------------
2012-06-17 09:41:19
SQL>  select group_number,DISK_NUMBER,PATH,HEADER_STATUS
   2  from v$asm_disk where group_number<>0;
GROUP_NUMBER DISK_NUMBER PATH            HEADER_STATU
------------ ----------- --------------- ------------
           1           1 /dev/raw/raw2   MEMBER
           1           0 /dev/raw/raw1   MEMBER
SQL> select group_number,name,BLOCK_SIZE,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;
GROUP_NUMBER NAME                           BLOCK_SIZE ALLOCATION_UNIT_SIZE
------------ ------------------------------ ---------- --------------------
           1 DATA                                 4096              1048576
rac1->  kfed read /dev/raw/raw1 blknum=510|>/tmp/xifenfei.510
rac1->  kfed read /dev/raw/raw1 blknum=0|>/tmp/xifenfei.0
rac1-> ll /tmp/xifenfei*
-rw-r--r--  1 oracle oinstall 6606 Jun 14 04:11 /tmp/xifenfei.0
-rw-r--r--  1 oracle oinstall 6606 Jun 14 04:12 /tmp/xifenfei.510
rac1-> diff /tmp/xifenfei.510 /tmp/xifenfei.0
--通过对比发现两者无不同记录返回,证明他们记录内容完全相同

2.尝试破坏asm disk header

rac1-> dd if=/dev/zero of=/dev/raw/raw1 bs=4096 count=1
1+0 records in
1+0 records out
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
SQL> select group_number,DISK_NUMBER,PATH,HEADER_STATUS
   2 from v$asm_disk where group_number<>0;
GROUP_NUMBER DISK_NUMBER PATH            HEADER_STATU
------------ ----------- --------------- ------------
           1           1 /dev/raw/raw2   MEMBER
           1           0 /dev/raw/raw1   CANDIDATE
SQL> alter diskgroup  data dismount;
Diskgroup altered.
SQL> alter diskgroup  data mount;
alter diskgroup  data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

3.使用kfed repair修改损坏asm disk header

rac1-> kfed  repair '/dev/raw/raw1'
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                   883602253 ; 0x00c: 0x34aab34d
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
…………
SQL> alter diskgroup  data mount;
Diskgroup altered.

4.使用kfed merge恢复asm disk header

rac1-> dd if=/dev/zero of=/dev/raw/raw1 bs=4096 count=1
1+0 records in
1+0 records out
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
SQL> alter diskgroup  data dismount;
Diskgroup altered.
SQL> alter diskgroup  data mount;
alter diskgroup  data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
rac1->  kfed merge /dev/raw/raw1 /tmp/xifenfei.510
SQL> alter diskgroup  data mount;
Diskgroup altered.

通过试验证明在10.2.0.5及其以后版本中,对于备份的asm disk header我们可以通过使用kfed repair和kfed merge来恢复.