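Contact: Mobile/WeChat (+86 17813235971) QQ (107644445)
Title: Recovering an Exadata disk group that cannot mount after disk failure (Oracle Exadata disk group recovery)
Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]
An Oracle Exadata customer lost a second disk on a cell node while a disk replacement was still in progress. As a result, the datac1 disk group (configured with normal redundancy) could no longer be mounted.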
Thu Jul 20 22:01:21 2023
SQL> alter diskgroup datac1 mount force
NOTE: cache registered group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache began mount (first) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: Assigning number (1,35) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_11_dm01celadm03)
NOTE: Assigning number (1,31) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_07_dm01celadm03)
NOTE: Assigning number (1,24) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_00_dm01celadm03)
NOTE: Assigning number (1,25) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_01_dm01celadm03)
NOTE: Assigning number (1,27) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_03_dm01celadm03)
NOTE: Assigning number (1,33) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_09_dm01celadm03)
NOTE: Assigning number (1,30) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_06_dm01celadm03)
NOTE: Assigning number (1,28) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_04_dm01celadm03)
NOTE: Assigning number (1,26) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_02_dm01celadm03)
NOTE: Assigning number (1,1) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_08_dm01celadm03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_10_dm01celadm03)
NOTE: Assigning number (1,29) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_05_dm01celadm03)
NOTE: Assigning number (1,3) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_07_dm01celadm02)
NOTE: Assigning number (1,4) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_06_dm01celadm02)
NOTE: Assigning number (1,5) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_00_dm01celadm02)
NOTE: Assigning number (1,6) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_10_dm01celadm02)
NOTE: Assigning number (1,7) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_08_dm01celadm02)
NOTE: Assigning number (1,8) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_03_dm01celadm02)
NOTE: Assigning number (1,9) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_11_dm01celadm02)
NOTE: Assigning number (1,10) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_01_dm01celadm02)
NOTE: Assigning number (1,11) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_04_dm01celadm02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_05_dm01celadm02)
NOTE: Assigning number (1,43) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_02_dm01celadm02)
NOTE: Assigning number (1,36) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_07_dm01celadm01)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_09_dm01celadm01)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_11_dm01celadm01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_08_dm01celadm01)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_00_dm01celadm01)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_03_dm01celadm01)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_06_dm01celadm01)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_05_dm01celadm01)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_01_dm01celadm01)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_02_dm01celadm01)
NOTE: Assigning number (1,47) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_10_dm01celadm01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_04_dm01celadm01)
Thu Jul 20 22:01:28 2023
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 450 for pid 30, osid 171838
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,39) to disk ()
GMON querying group 1 at 451 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open)
NOTE: process _user171838_+asm1 (171838) initiating offline of disk 39.3915945266 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 39/0xe9689532, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 452 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open)
ERROR: Disk 39 cannot be offlined, since all the disks [39, 32] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline for disk in mode 0x7f failed.
NOTE: cache dismounting (not clean) group 1/0x0728AD12 (DATAC1)
NOTE: messaging CKPT to quiesce pins Unix process pid: 171838, image: oracle@dm01dbadm01.gyzq.cn (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x0728AD12 (DATAC1)
NOTE: cache ending mount (fail) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache deleting context for group DATAC1 1/0x0728ad12
NOTE: cache closing disk 32 of grp 1: (not open)
GMON dismounting group 1 at 453 for pid 30, osid 171838
NOTE: Disk DATAC1_CD_08_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk in mode 0x1 marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM01 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATAC1 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "39" in group "DATAC1" may result in a data loss
ORA-15042: ASM disk "39" is missing from group number "1"
ORA-15042: ASM disk "32" is missing from group number "1"
ERROR: alter diskgroup datac1 mount force
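A mount log like the one above is long, and the two facts that matter (which disk numbers came back without a path, and which disks ORA-15042 reports as missing) are easy to overlook. The following is a minimal Python sketch for pulling them out of an ASM alert log excerpt. The function name `summarize_mount_log` and the shortened `SAMPLE_LOG` are illustrative assumptions, not part of any Oracle tool; the regular expressions are written only against the message formats visible in this log.

```python
import re
from collections import Counter

# Patterns based solely on the message formats shown in the log above.
ASSIGN_RE = re.compile(r"Assigning number \((\d+),(\d+)\) to disk \(([^)]*)\)")
MISSING_RE = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing from group number "(\d+)"')

def summarize_mount_log(text):
    """Summarize disk discovery from an ASM mount log excerpt (hypothetical helper)."""
    named = {}      # disk number -> grid disk name (last path component)
    unnamed = []    # disk numbers assigned with an empty path "()"
    for _grp, dsk, path in ASSIGN_RE.findall(text):
        if path:
            named[int(dsk)] = path.rsplit("/", 1)[-1]
        else:
            unnamed.append(int(dsk))
    # Disk numbers reported as missing by ORA-15042, de-duplicated and sorted.
    missing = sorted({int(dsk) for dsk, _grp in MISSING_RE.findall(text)})
    # Count discovered disks per cell, using the suffix after the last "_".
    per_cell = Counter(name.rsplit("_", 1)[-1] for name in named.values())
    return {
        "named": named,
        "unnamed": sorted(unnamed),
        "missing": missing,
        "per_cell": dict(per_cell),
    }

# Shortened excerpt of the log above, used only for illustration.
SAMPLE_LOG = """\
NOTE: Assigning number (1,35) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_11_dm01celadm03)
NOTE: Assigning number (1,3) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_07_dm01celadm02)
NOTE: Assigning number (1,0) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_08_dm01celadm01)
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,39) to disk ()
ORA-15042: ASM disk "39" is missing from group number "1"
ORA-15042: ASM disk "32" is missing from group number "1"
"""

if __name__ == "__main__":
    summary = summarize_mount_log(SAMPLE_LOG)
    print("disks without a path:", summary["unnamed"])
    print("disks reported missing:", summary["missing"])
    print("discovered disks per cell:", summary["per_cell"])
```

Because the patterns are matched against the whole text rather than line by line, the sketch should also work on a log whose line breaks were lost when it was copied.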
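The root cause: ASM disk 32 had already failed and was still being replaced (the rebalance had not completed) when ASM disk 39 failed as well. Some of the data on these two disks is mirrored between them, so with both unavailable the disk group could not be mounted.
Checking the celldisk and griddisk status on the cell nodes confirmed that the underlying disks were damaged.
In this situation, both copies of some normal-redundancy data were partially lost, so the data could not be recovered directly. Recovery was instead done at the underlying disk level (see an earlier Oracle Exadata recovery case: "Oracle Exadata坏盘导致磁盘组无法mount恢复", a disk group that could not mount because of a bad Exadata disk). The data was then recovered fairly smoothly, with zero loss of business data.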
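To illustrate why losing disk 39 on top of disk 32 is fatal under normal redundancy, here is a simplified Python model. It assumes each extent has exactly two copies on two different disks and treats an extent as lost once both disks holding it are offline. The extent names and the `extent_map` layout are invented for illustration; real ASM tracks this through its partnership and status table (PST) and disk partnering, which this sketch does not reproduce.

```python
def lost_extents(extent_map, offline_disks):
    """Return the extents whose copies all sit on offline disks.

    extent_map: dict mapping an extent label to the tuple of disk numbers
                holding its copies (two copies for normal redundancy).
    offline_disks: iterable of disk numbers that are unavailable.
    """
    offline = set(offline_disks)
    return [ext for ext, disks in extent_map.items() if set(disks) <= offline]

# Hypothetical layout: only ext_a is mirrored between disks 32 and 39.
extent_map = {
    "ext_a": (32, 39),
    "ext_b": (32, 5),
    "ext_c": (39, 7),
}

if __name__ == "__main__":
    # With only disk 32 offline, every extent still has a surviving copy.
    print(lost_extents(extent_map, [32]))
    # With disks 32 and 39 both offline, ext_a has no surviving copy.
    print(lost_extents(extent_map, [32, 39]))
```

In this model a single offline disk leaves every extent readable, while the second failure removes the only remaining copy of any extent shared by the two disks. That is the condition the log reports with "all the disks [39, 32] with mirrored data would be offline".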
SQL> alter diskgroup datac1 mount;
Diskgroup altered.
SQL> alter diskgroup datac1 check all;
Diskgroup altered.
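During the actual recovery, the customer had already made a number of attempts of their own, such as mirroring onto a new disk and inserting it, and trying to force the disk group online and drop the failed disks. These actions damaged the original state of the environment and made the recovery more difficult. Even so, the damage was worked around by various means and the expected result was achieved (zero loss of business data).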