Handling a disk-drop failure in the OCR disk group

Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]

Some fault caused disk OCR_0001 in the CRS OCR disk group to go offline, and the number of voting disks dropped from 3 to 2:

WARNING: Write Failed. group:3 disk:1 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 1.3915948466 in group 3 failed. [4]
Mon Jun 14 15:31:11 2021
NOTE: process _b000_+asm1 (21889) initiating offline of disk 1.3915948466 (OCR_0001) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 14 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 1047812201 to 1 disk(s) in group 3
WARNING: Disk OCR_0001 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 16 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: OCR_0001
NOTE: PST update grp = 3 completed successfully 
Mon Jun 14 15:31:13 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jun 14 15:34:08 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
Mon Jun 14 15:34:10 2021
GMON updating for reconfiguration, group 3 at 17 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) OCR_0001
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group 3 PST updated.
Mon Jun 14 15:34:10 2021
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
GMON querying group 3 at 18 for pid 18, osid 8900
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) _DROPPED_0001_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
SUCCESS: alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */
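
To spot the moment the voting file count changes without reading the whole alert log, the log can be scanned for its "Found N voting file(s)" lines. The following is a minimal Python sketch and was not part of the original troubleshooting; voting_file_changes is a hypothetical helper name, and the timestamp pattern assumes the alert log format shown above.

```python
import re

# Matches the ASM alert log line ". Found 3 voting file(s)."
FOUND_RE = re.compile(r"Found (\d+) voting file\(s\)")
# Matches timestamps such as "Mon Jun 14 15:31:13 2021" (format assumed from the log above)
STAMP_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")


def voting_file_changes(lines):
    """Return (timestamp, old_count, new_count) for each change in the voting file count."""
    changes = []
    stamp = None      # most recent timestamp line seen
    previous = None   # voting file count from the previous "Found" line
    for raw in lines:
        line = raw.strip()
        if STAMP_RE.match(line):
            stamp = line
            continue
        m = FOUND_RE.search(line)
        if m:
            count = int(m.group(1))
            if previous is not None and count != previous:
                changes.append((stamp, previous, count))
            previous = count
    return changes


# Excerpt copied from the alert log above
sample = """\
Mon Jun 14 15:31:13 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
""".splitlines()

changes = voting_file_changes(sample)
print(changes)
```

The timestamp attached to a change is the last one printed before the "Found" line, so it is only as precise as the alert log's own timestamps.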

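After the rebalance that followed the first disk drop had completed, a second disk dropped. The OCR disk group itself stayed healthy, but with only one disk left in it, the voting file could not be relocated to another disk inside the OCR disk group: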

Tue Jun 15 04:41:42 2021
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
Tue Jun 15 04:41:42 2021
NOTE: process _b000_+asm1 (58548) initiating offline of disk 0.3915948465 (OCR_0000) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 23 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 3615961191 to 1 disk(s) in group 3
WARNING: Disk OCR_0000 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 24 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 25 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: cache closing disk 0 of grp 3: OCR_0000
NOTE: PST update grp = 3 completed successfully 
Tue Jun 15 04:41:44 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:24 2021
GMON updating for reconfiguration, group 3 at 26 for pid 28, osid 58548
NOTE: cache closing disk 0 of grp 3: (not open) OCR_0000
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group 3 PST updated.
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 27 for pid 18, osid 8900
NOTE: cache closing disk 0 of grp 3: (not open) _DROPPED_0000_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
NOTE: starting rebalance of group 3/0x7258515c (OCR) at power 1
SUCCESS: alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */
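
The repeated "Failed voting file relocation" lines are the key signal in this second incident. A small Python sketch that tallies relocation outcomes per disk group is shown here as an illustration only; relocation_outcomes is a hypothetical name, and the message wording is taken from the log above rather than from any Oracle reference.

```python
import re
from collections import Counter

# Message wording copied from the alert log excerpt above
RELOC_RE = re.compile(r"NOTE: (Successful|Failed) voting file relocation on diskgroup (\w+)")


def relocation_outcomes(lines):
    """Count Successful/Failed voting file relocations, keyed by (diskgroup, outcome)."""
    counts = Counter()
    for line in lines:
        m = RELOC_RE.search(line)
        if m:
            outcome, diskgroup = m.group(1), m.group(2)
            counts[(diskgroup, outcome)] += 1
    return counts


# Excerpt copied from the alert log above
sample = """\
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
""".splitlines()

counts = relocation_outcomes(sample)
print(counts)
```

A run of failures with no success in between, as in this log, suggests the disk group has no spare disk left to host the voting file.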

Query the OCR disk group information at this point:
[Screenshot: OCR disk group query output]


It is clear that only one disk remains in the OCR disk group. Next, query the voting disk information:

node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).
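
The mismatch between the voting disk list and the disk group membership can also be checked mechanically. Below is a minimal Python sketch that parses the crsctl output above and reports voting disks that are not current disk group members. The function names are hypothetical, and the member list passed in (/dev/emcpowere as the only remaining disk) is an assumption inferred from the rest of this post, since the query output itself survives only as a screenshot.

```python
import re

# One data row of `crsctl query css votedisk`, e.g.
#  1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
ROW_RE = re.compile(r"^\s*\d+\.\s+(\S+)\s+([0-9a-f]+)\s+\((\S+)\)\s+\[(\w+)\]")


def parse_votedisks(text):
    """Parse the data rows of the votedisk listing into dictionaries."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            state, fuid, path, group = m.groups()
            rows.append({"state": state, "fuid": fuid, "path": path, "group": group})
    return rows


def orphaned_votedisks(votedisks, diskgroup_paths):
    """Return paths of voting disks that are not current members of the disk group."""
    members = set(diskgroup_paths)
    return [v["path"] for v in votedisks if v["path"] not in members]


# Output copied from the crsctl command above
crsctl_output = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).
"""

votedisks = parse_votedisks(crsctl_output)
# Assumption: /dev/emcpowere is the single disk still in the OCR disk group
orphans = orphaned_votedisks(votedisks, ["/dev/emcpowere"])
print(orphans)
```

In practice the member list would come from a query against the ASM instance or from ASMCMD lsdsk, as used later in this post.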

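Of the two voting disks, one belongs to the OCR disk group and the other is a disk that the OCR disk group has already dropped. Try adding the previously offlined disk back to the OCR disk group: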

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc';
alter diskgroup OCR add  disk '/dev/emcpowerc'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/emcpowerc' belongs to diskgroup "OCR"


SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force  
  2  ;
alter diskgroup OCR add  disk '/dev/emcpowerc' force
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15191
Session ID: 1613 Serial number: 7

Check the alert log:

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,4) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: initializing header on grp 3 disk OCR_0004
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: Attempting voting file relocation on diskgroup OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc  (incident=311185):
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_311185/+ASM1_rbal_12207_i311185.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: ORA-600 thrown in RBAL for group number 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
RBAL (ospid: 12207): terminating the instance due to error 488
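
When triaging a crash like this one, it helps to pull out just the ORA-00600 first argument and the process that terminated the instance. The following Python sketch does that for the log lines above; summarize_crash is a hypothetical helper, and both patterns are assumptions derived from the message wording in this excerpt only.

```python
import re

# Captures the first bracketed argument of an ORA-00600 line
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
# Captures process name, OS pid and error number from the termination line
TERM_RE = re.compile(r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)")


def summarize_crash(lines):
    """Return (distinct ORA-00600 first arguments, termination details or None)."""
    first_args = []
    termination = None
    for raw in lines:
        line = raw.strip()
        m = ORA600_RE.search(line)
        if m and m.group(1) not in first_args:
            first_args.append(m.group(1))
        t = TERM_RE.match(line)
        if t:
            termination = {
                "process": t.group(1),
                "ospid": int(t.group(2)),
                "error": int(t.group(3)),
            }
    return first_args, termination


# Excerpt copied from the alert log above
sample = """\
ERROR: ORA-600 thrown in RBAL for group number 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
RBAL (ospid: 12207): terminating the instance due to error 488
""".splitlines()

first_args, termination = summarize_crash(sample)
print(first_args, termination)
```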

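The add failed with ORA-600 [kfdvfGetCurrent_baddsk]. The votedisk query above shows that emcpowerc, although offline in the OCR disk group, is still registered as a voting disk, so it cannot be added back to the disk group. As a workaround, add a different disk first: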

SQL> alter diskgroup OCR add failgroup OCR_0001 disk '/dev/emcpowerd' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
Located 2 voting disk(s).
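
The ordering behind this workaround can be written down as a tiny Python sketch: disks that are still listed as voting disks are added last, after the voting file has had a chance to move to a freshly added disk. plan_add_order is a hypothetical helper, and the rule it encodes is only what was observed in this one case on 11.2.0.4, not a documented Oracle procedure.

```python
def plan_add_order(candidates, votedisk_paths):
    """Order candidate disks so that those still listed as voting disks are added last."""
    voting = set(votedisk_paths)
    free = [d for d in candidates if d not in voting]   # safe to add first
    held = [d for d in candidates if d in voting]       # add after the voting file has moved
    return free + held


# Candidates and voting disk paths taken from this case
order = plan_add_order(
    ["/dev/emcpowerc", "/dev/emcpowerd"],
    ["/dev/emcpowerc", "/dev/emcpowere"],
)
print(order)
```

The voting disk list should be queried again after each add, because the plan is only valid for the state it was computed from.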

Once emcpowerd had been added successfully, emcpowerc was no longer a voting disk; emcpowerd had taken its place. Now add emcpowerc again:

SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
 3. ONLINE   4f6201f808dc4ff3bf928b14eae0d4a6 (/dev/emcpowerc) [OCR]
Located 3 voting disk(s).

ASMCMD> lsdsk -G ocr
Path
/dev/emcpowerc
/dev/emcpowerd
/dev/emcpowere

The alert log for the add of emcpowerc:

SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,0) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
NOTE: initializing header on grp 3 disk OCR_0000
NOTE: requesting all-instance disk validation for group=3
Mon Jan 24 17:47:42 2022
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
Mon Jan 24 17:47:48 2022
GMON updating for reconfiguration, group 3 at 20 for pid 30, osid 16978
NOTE: group 3 PST updated.
NOTE: initiating PST update: grp = 3
GMON updating group 3 at 21 for pid 30, osid 16978
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0005 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 2)
NOTE: PST update grp = 3 completed successfully 
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 22 for pid 18, osid 15952
NOTE: cache opening disk 0 of grp 3: OCR_0000 path:/dev/emcpowerc
Mon Jan 24 17:47:53 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 23 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:47:53 2022
SUCCESS: alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force
NOTE: starting rebalance of group 3/0x725dccb9 (OCR) at power 1
Starting background process ARB0
Mon Jan 24 17:47:53 2022
ARB0 started with pid=31, OS id=17092 
NOTE: assigning ARB0 to group 3/0x725dccb9 (OCR) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 0:2 for diskgroup 3 (OCR)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: voting file allocation on grp 3 disk OCR_0000
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jan 24 17:47:57 2022
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:03 2022
GMON querying group 3 at 24 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:06 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).

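The number of voting disks went from 2 back to 3, and the OCR disk group is back to its normal 3 disks. This completes the handling of the OCR disk-drop failure.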