Handling a dropped-disk failure in the OCR disk group


A fault took disk OCR_0001 of the OCR disk group (used by CRS) offline, and the voting disks went from 3 down to 2:

WARNING: Write Failed. group:3 disk:1 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 1.3915948466 in group 3 failed. [4]
Mon Jun 14 15:31:11 2021
NOTE: process _b000_+asm1 (21889) initiating offline of disk 1.3915948466 (OCR_0001) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 14 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 1047812201 to 1 disk(s) in group 3
WARNING: Disk OCR_0001 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 16 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: OCR_0001
NOTE: PST update grp = 3 completed successfully 
Mon Jun 14 15:31:13 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jun 14 15:34:08 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
Mon Jun 14 15:34:10 2021
GMON updating for reconfiguration, group 3 at 17 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) OCR_0001
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group 3 PST updated.
Mon Jun 14 15:34:10 2021
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
GMON querying group 3 at 18 for pid 18, osid 8900
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) _DROPPED_0001_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
SUCCESS: alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */

After the rebalance from the first disk loss completed, a second disk dropped. The OCR disk group itself stayed mounted, but with only one disk left the voting file could not be refreshed onto another disk within the OCR disk group:

Tue Jun 15 04:41:42 2021
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
Tue Jun 15 04:41:42 2021
NOTE: process _b000_+asm1 (58548) initiating offline of disk 0.3915948465 (OCR_0000) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 23 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 3615961191 to 1 disk(s) in group 3
WARNING: Disk OCR_0000 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 24 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 25 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: cache closing disk 0 of grp 3: OCR_0000
NOTE: PST update grp = 3 completed successfully 
Tue Jun 15 04:41:44 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:24 2021
GMON updating for reconfiguration, group 3 at 26 for pid 28, osid 58548
NOTE: cache closing disk 0 of grp 3: (not open) OCR_0000
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group 3 PST updated.
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 27 for pid 18, osid 8900
NOTE: cache closing disk 0 of grp 3: (not open) _DROPPED_0000_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
NOTE: starting rebalance of group 3/0x7258515c (OCR) at power 1
SUCCESS: alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */

Query the OCR disk group information at this point:
(screenshot: query output showing the OCR disk group's remaining disk)
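
The original post shows this query only as a screenshot. An equivalent check can be run against the ASM instance using the standard v$asm views; the following is a sketch (run as the grid user, ORACLE_SID=+ASM1 assumed):

# list the disks still belonging to the OCR disk group
sqlplus -S / as sysasm <<'EOF'
SELECT g.name group_name, d.name disk_name, d.path, d.mode_status
  FROM v$asm_diskgroup g
  JOIN v$asm_disk d ON d.group_number = g.group_number
 WHERE g.name = 'OCR';
EOF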


It is clear that only one disk remains in the OCR disk group. Query the voting disks:

node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).

Of the two voting disks, one belongs to the OCR disk group while the other (/dev/emcpowerc) is the disk the OCR disk group has already dropped. Try adding the previously offlined disk back into the OCR disk group:

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc';
alter diskgroup OCR add  disk '/dev/emcpowerc'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/emcpowerc' belongs to diskgroup "OCR"


SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force  
  2  ;
alter diskgroup OCR add  disk '/dev/emcpowerc' force
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15191
Session ID: 1613 Serial number: 7

Check the ASM alert log:

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,4) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: initializing header on grp 3 disk OCR_0004
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: Attempting voting file relocation on diskgroup OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc  (incident=311185):
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_311185/+ASM1_rbal_12207_i311185.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: ORA-600 thrown in RBAL for group number 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
RBAL (ospid: 12207): terminating the instance due to error 488

The add failed with ORA-600 [kfdvfGetCurrent_baddsk]. From the votedisk query above, emcpowerc is offline in the OCR disk group yet still serves as a voting disk, which is why it cannot be added back into the group. As a workaround, add a different disk first:

SQL> alter diskgroup OCR add failgroup OCR_0001 disk '/dev/emcpowerd' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
Located 2 voting disk(s).

With emcpowerd added successfully, emcpowerc is no longer a voting disk (emcpowerd took over that role), so emcpowerc can now be added:

SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
 3. ONLINE   4f6201f808dc4ff3bf928b14eae0d4a6 (/dev/emcpowerc) [OCR]
Located 3 voting disk(s).

ASMCMD> lsdsk -G ocr
Path
/dev/emcpowerc
/dev/emcpowerd
/dev/emcpowere
SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,0) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
NOTE: initializing header on grp 3 disk OCR_0000
NOTE: requesting all-instance disk validation for group=3
Mon Jan 24 17:47:42 2022
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
Mon Jan 24 17:47:48 2022
GMON updating for reconfiguration, group 3 at 20 for pid 30, osid 16978
NOTE: group 3 PST updated.
NOTE: initiating PST update: grp = 3
GMON updating group 3 at 21 for pid 30, osid 16978
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0005 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 2)
NOTE: PST update grp = 3 completed successfully 
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 22 for pid 18, osid 15952
NOTE: cache opening disk 0 of grp 3: OCR_0000 path:/dev/emcpowerc
Mon Jan 24 17:47:53 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 23 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:47:53 2022
SUCCESS: alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force
NOTE: starting rebalance of group 3/0x725dccb9 (OCR) at power 1
Starting background process ARB0
Mon Jan 24 17:47:53 2022
ARB0 started with pid=31, OS id=17092 
NOTE: assigning ARB0 to group 3/0x725dccb9 (OCR) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 0:2 for diskgroup 3 (OCR)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: voting file allocation on grp 3 disk OCR_0000
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jan 24 17:47:57 2022
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:03 2022
GMON querying group 3 at 24 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:06 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).

The voting disks went from 2 back to 3, and the OCR disk group is back to its normal 3 disks; the handling of the dropped-disk failure on the OCR disk group is complete.

Analysis of a "CRS-1013: The OCR location in an ASM disk group is inaccessible" failure


A friend told me his cluster had suddenly gone abnormal and asked me to take a look. The cluster alert log (messages localized in Chinese):

2021-03-14 21:02:15.517 [OHASD(31771)]CRS-8500: Oracle Clusterware OHASD 进程以操作系统进程 ID 31771 开头
2021-03-14 21:02:15.561 [OHASD(31771)]CRS-0714: Oracle Clusterware 发行版 12.1.0.2.0。
2021-03-14 21:02:15.619 [OHASD(31771)]CRS-2112: 已在节点 rac1 上启动 OLR 服务。
2021-03-14 21:02:15.791 [OHASD(31771)]CRS-1301: 已在节点 rac1 上启动 Oracle 高可用性服务。
2021-03-14 21:02:15.910 [OHASD(31771)]CRS-8017: 位置:/etc/oracle/lastgasp具有2个重新启动指导日志文件,0个已发布,0个出现错误
2021-03-14 21:02:16.789 [CSSDAGENT(32015)]CRS-8500: Oracle Clusterware CSSDAGENT 进程以操作系统进程 ID 32015 开头
2021-03-14 21:02:16.868 [CSSDMONITOR(32017)]CRS-8500: Oracle Clusterware CSSDMONITOR 进程以操作系统进程 ID 32017 开头
2021-03-14 21:02:17.751 [ORAROOTAGENT(32008)]CRS-8500: Oracle Clusterware ORAROOTAGENT 进程以操作系统进程 ID 32008 开头
2021-03-14 21:02:17.916 [ORAAGENT(32012)]CRS-8500: Oracle Clusterware ORAAGENT 进程以操作系统进程 ID 32012 开头
2021-03-14 21:02:18.604 [ORAAGENT(32012)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" 
(位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:02:18.969 [ORAAGENT(32117)]CRS-8500: Oracle Clusterware ORAAGENT 进程以操作系统进程 ID 32117 开头
2021-03-14 21:02:19.050 [MDNSD(32130)]CRS-8500: Oracle Clusterware MDNSD 进程以操作系统进程 ID 32130 开头
2021-03-14 21:02:19.117 [EVMD(32132)]CRS-8500: Oracle Clusterware EVMD 进程以操作系统进程 ID 32132 开头
2021-03-14 21:02:20.078 [GPNPD(32151)]CRS-8500: Oracle Clusterware GPNPD 进程以操作系统进程 ID 32151 开头
2021-03-14 21:02:21.145 [GIPCD(32172)]CRS-8500: Oracle Clusterware GIPCD 进程以操作系统进程 ID 32172 开头
2021-03-14 21:02:21.163 [GPNPD(32151)]CRS-2328: 已在节点 rac1 上启动 GPNPD。
2021-03-14 21:02:22.172 [ORAROOTAGENT(32181)]CRS-8500: Oracle Clusterware ORAROOTAGENT 进程以操作系统进程 ID 32181 开头
2021-03-14 21:02:22.339 [CLSECHO(32204)]CRS-10001: 14-Mar-21 21:02 ACFS-9391: 正在检查现有 ADVM/ACFS 安装。
2021-03-14 21:02:22.580 [CLSECHO(32209)]CRS-10001: 14-Mar-21 21:02 ACFS-9392: 正在验证操作系统的 ADVM/ACFS 安装文件。
2021-03-14 21:02:22.598 [CLSECHO(32211)]CRS-10001: 14-Mar-21 21:02 ACFS-9393: 正在验证 ASM 管理员设置。
2021-03-14 21:02:22.646 [CLSECHO(32216)]CRS-10001: 14-Mar-21 21:02 ACFS-9308: 正在加载已安装的 ADVM/ACFS 驱动程序。
2021-03-14 21:02:22.678 [CLSECHO(32219)]CRS-10001: 14-Mar-21 21:02 ACFS-9154: 正在加载 'oracleoks.ko' 驱动程序。
2021-03-14 21:02:22.809 [CLSECHO(32234)]CRS-10001: 14-Mar-21 21:02 ACFS-9154: 正在加载 'oracleadvm.ko' 驱动程序。
2021-03-14 21:02:22.892 [CLSECHO(32290)]CRS-10001: 14-Mar-21 21:02 ACFS-9154: 正在加载 'oracleacfs.ko' 驱动程序。
2021-03-14 21:02:23.054 [CLSECHO(32334)]CRS-10001: 14-Mar-21 21:02 ACFS-9327: 正在验证 ADVM/ACFS 设备。
2021-03-14 21:02:23.079 [CLSECHO(32336)]CRS-10001: 14-Mar-21 21:02 ACFS-9156: 正在检测控制设备 '/dev/asm/.asm_ctl_spec'。
2021-03-14 21:02:23.108 [CLSECHO(32340)]CRS-10001: 14-Mar-21 21:02 ACFS-9156: 正在检测控制设备 '/dev/ofsctl'。
2021-03-14 21:02:23.263 [CLSECHO(32346)]CRS-10001: 14-Mar-21 21:02 ACFS-9322: 已完成
2021-03-14 21:02:28.571 [CSSDMONITOR(32409)]CRS-8500: Oracle Clusterware CSSDMONITOR 进程以操作系统进程 ID 32409 开头
2021-03-14 21:02:28.756 [CSSDAGENT(32425)]CRS-8500: Oracle Clusterware CSSDAGENT 进程以操作系统进程 ID 32425 开头
2021-03-14 21:02:28.975 [OCSSD(32436)]CRS-8500: Oracle Clusterware OCSSD 进程以操作系统进程 ID 32436 开头
2021-03-14 21:02:30.072 [OCSSD(32436)]CRS-1713: CSSD 守护程序已在 hub 模式下启动
2021-03-14 21:02:46.185 [OCSSD(32436)]CRS-1707: 节点 rac1 (编号为 1) 的租约获取已完成
2021-03-14 21:02:47.337 [OCSSD(32436)]CRS-1605: CSSD 表决文件联机: ORCL:OCR3; 详细资料见
 /u01/app/gridbase/diag/crs/rac1/crs/trace/ocssd.trc。
2021-03-14 21:02:47.357 [OCSSD(32436)]CRS-1605: CSSD 表决文件联机: ORCL:OCR2; 详细资料见 
/u01/app/gridbase/diag/crs/rac1/crs/trace/ocssd.trc。
2021-03-14 21:02:47.365 [OCSSD(32436)]CRS-1605: CSSD 表决文件联机: ORCL:OCR1; 详细资料见 
/u01/app/gridbase/diag/crs/rac1/crs/trace/ocssd.trc。
2021-03-14 21:02:48.781 [OCSSD(32436)]CRS-1601: CSSD 重新配置完毕。活动节点为 rac1 rac2 。
2021-03-14 21:02:50.971 [OCTSSD(32591)]CRS-8500: Oracle Clusterware OCTSSD 进程以操作系统进程 ID 32591 开头
2021-03-14 21:02:51.938 [OCTSSD(32591)]CRS-2403: 主机 rac1 上的集群时间同步服务处于观察程序模式。
2021-03-14 21:02:52.140 [OCTSSD(32591)]CRS-2407: 新的集群时间同步服务引用节点为主机 rac2。
2021-03-14 21:02:52.140 [OCTSSD(32591)]CRS-2401: 已在主机 rac1 上启动了集群时间同步服务。
2021-03-14 21:02:52.167 [OCTSSD(32591)]CRS-2409: 主机 rac1 上的时钟与集群标准时间不同步。
由于集群时间同步服务正在以观察程序模式运行, 所以未采取任何操作。
2021-03-14 21:02:59.284 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" (
位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:01.486 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" (
位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:01.514 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" (
位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:18.163 [OCTSSD(32591)]CRS-2407: 新的集群时间同步服务引用节点为主机 rac1。
2021-03-14 21:03:19.406 [OCSSD(32436)]CRS-1625: 节点 rac2 (编号为 2) 已关闭
2021-03-14 21:03:19.419 [OCSSD(32436)]CRS-1601: CSSD 重新配置完毕。活动节点为 rac1 。
2021-03-14 21:03:24.916 [OSYSMOND(318)]CRS-8500: Oracle Clusterware OSYSMOND 进程以操作系统进程 ID 318 开头
2021-03-14 21:03:26.558 [CRSD(325)]CRS-8500: Oracle Clusterware CRSD 进程以操作系统进程 ID 325 开头
2021-03-14 21:03:27.750 [CRSD(325)]CRS-1012: 已在节点 rac1 上启动 OCR 服务。
2021-03-14 21:03:27.807 [CRSD(325)]CRS-1201: 已在节点 rac1 上启动 CRSD。
2021-03-14 21:03:28.470 [ORAAGENT(1027)]CRS-8500: Oracle Clusterware ORAAGENT 进程以操作系统进程 ID 1027 开头
2021-03-14 21:03:28.499 [ORAROOTAGENT(1031)]CRS-8500: Oracle Clusterware ORAROOTAGENT 进程以操作系统进程 ID 1031 开头
2021-03-14 21:03:28.515 [ORAAGENT(1036)]CRS-8500: Oracle Clusterware ORAAGENT 进程以操作系统进程 ID 1036 开头
2021-03-14 21:03:28.666 [ORAAGENT(1036)]CRS-5011: 检查资源 "oracledb" 失败: 详细资料见 "(:CLSN00007:)"
 (位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/crsd_oraagent_oracle.trc")
2021-03-14 21:03:30.649 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" 
(位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:30.718 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" 
(位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:30.722 [CRSD(325)]CRS-1024: 由于此节点上的 ASM 实例未处于活动状态, 此节点上的集群就绪服务终止。详细信息见 
(:PROCR00009:) (位于 /u01/app/gridbase/diag/crs/rac1/crs/trace/crsd.trc)。
2021-03-14 21:03:30.736 [ORAROOTAGENT(1031)]CRS-5822: 代理 '/u01/app/grid/12.1.0/bin/orarootagent_root' 已从服务器断开连接。
详细资料见 (:CRSAGF00117:) {0:3:3} (位于 /u01/app/gridbase/diag/crs/rac1/crs/trace/crsd_orarootagent_root.trc)。
2021-03-14 21:03:30.736 [ORAAGENT(1027)]CRS-5822: 代理 '/u01/app/grid/12.1.0/bin/oraagent_grid' 已从服务器断开连接。
详细资料见 (:CRSAGF00117:) {0:1:3} (位于 /u01/app/gridbase/diag/crs/rac1/crs/trace/crsd_oraagent_grid.trc)。
2021-03-14 21:03:30.793 [CRSD(1157)]CRS-8500: Oracle Clusterware CRSD 进程以操作系统进程 ID 1157 开头
2021-03-14 21:03:31.457 [OLOGGERD(1162)]CRS-8500: Oracle Clusterware OLOGGERD 进程以操作系统进程 ID 1162 开头
2021-03-14 21:03:31.798 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" 
(位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:31.823 [ORAAGENT(32117)]CRS-5011: 检查资源 "ora.asm" 失败: 详细资料见 "(:CLSN00006:)" 
(位于 "/u01/app/gridbase/diag/crs/rac1/crs/trace/ohasd_oraagent_grid.trc")
2021-03-14 21:03:40.234 [CRSD(1157)]CRS-1013: ASM 磁盘组中的 OCR 位置不可访问。
详细资料见 /u01/app/gridbase/diag/crs/rac1/crs/trace/crsd.trc。
2021-03-14 21:03:40.238 [CRSD(1157)]CRS-0804: 由于 Oracle 集群注册表错误 [PROC-26: 访问物理存储时出错
ORA-15077: 找不到提供所需磁盘组的 ASM 实例
], 集群就绪服务中止。详细资料见 (:CRSD00111:) (位于 /u01/app/gridbase/diag/crs/rac1/crs/trace/crsd.trc)。

Looking at the whole cluster startup, cssd and crsd both came up, and only a moment later did CRS lose access to the OCR disk group and abort. Since crsd started at all, the OCR disk group must have mounted successfully at first; the later errors show it then became inaccessible. Based on experience, and on the repeated ora.asm check failures, the ASM instance itself was the likely problem, and in this situation analyzing the ASM alert log is the best approach.
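
A quick way to pull the key events out of it (a sketch; the alert log path follows the standard ADR layout of the trace paths quoted above):

grep -n -E 'ORA-00600|terminating the instance|mounted' \
  /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

The relevant portion of the ASM alert log: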

Sun Mar 14 21:03:24 2021
NOTE: Instance updated compatible.asm to 12.1.0.0.0 for grp 1
Sun Mar 14 21:03:24 2021
SUCCESS: diskgroup ARCHLOG was mounted
Sun Mar 14 21:03:24 2021
NOTE: Instance updated compatible.asm to 12.1.0.0.0 for grp 2
Sun Mar 14 21:03:24 2021
SUCCESS: diskgroup DATA was mounted
Sun Mar 14 21:03:24 2021
NOTE: Instance updated compatible.asm to 12.1.0.0.0 for grp 3
Sun Mar 14 21:03:24 2021
SUCCESS: diskgroup OCR was mounted
Sun Mar 14 21:03:24 2021
NOTE: Instance updated compatible.asm to 12.1.0.0.0 for grp 5
Sun Mar 14 21:03:24 2021
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:9:3} */
Sun Mar 14 21:03:24 2021
WARNING: failed to online diskgroup resource ora.ARCHLOG.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Errors in file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_32721.trc  (incident=123423):
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/gridbase/diag/asm/+asm/+ASM1/incident/incdir_123423/+ASM1_rbal_32721_i123423.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Mar 14 21:03:25 2021
ERROR: An unrecoverable error has been identified in ASM metadata.
Sun Mar 14 21:03:27 2021
NOTE: [crsd.bin@rac1.schic.org (TNS V1-V3) 325] opening OCR file +OCR.255.4294967295
Starting background process ASMB
Sun Mar 14 21:03:27 2021
ASMB started with pid=28, OS id=932 
Sun Mar 14 21:03:27 2021
NOTE: ASMB registering with ASM instance as Standard client 0xffffffffffffffff (reg:3401595347) (new connection)
Sun Mar 14 21:03:27 2021
NOTE: Standard client +ASM1:+ASM:racscan registered, osid 934, mbr 0x0, asmb 932 (reg:3401595347)
Sun Mar 14 21:03:27 2021
NOTE: ASMB connected to ASM instance +ASM1 osid: 934 (Flex mode; client id 0xffffffffffffffff)
Sun Mar 14 21:03:28 2021
NOTE: AMDU dump of disk group ARCHLOG initiated at /u01/app/gridbase/diag/asm/+asm/+ASM1/incident/incdir_123423
ERROR: ORA-600 in COD recovery for diskgroup 1/0x730955f2 (ARCHLOG)
ERROR: ORA-600 thrown in RBAL for group number 1
Sun Mar 14 21:03:30 2021
Errors in file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_32721.trc:
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []
Sun Mar 14 21:03:30 2021
Errors in file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_32721.trc:
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []
USER (ospid: 32721): terminating the instance due to error 488
Sun Mar 14 21:03:30 2021
System state dump requested by (instance=1, osid=32721 (RBAL)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_diag_32691_20210314210330.trc
Sun Mar 14 21:03:30 2021
Dumping diagnostic data in directory=[cdmp_20210314210330], requested by (instance=1, osid=32721 (RBAL)), s
ummary=[abnormal instance termination].
Sun Mar 14 21:03:30 2021
Instance terminated by USER, pid = 32721

Sure enough, the log shows the OCR disk group mounting successfully first; then the ASM instance crashed on an ORA-00600 [kfdAuDealloc2] raised against the ARCHLOG disk group. With the ASM instance gone, CRS could no longer reach the OCR disk group, producing the "CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage / ORA-15077: could not locate ASM instance serving a required diskgroup]" message above. The next step is to work out why ARCHLOG threw this error.

SQL> /* ASMCMD */alter diskgroup /*ASMCMD*/ "DATA" drop file '+DATA/xff/XIFENFEI.270.1040985885' 
Sun Mar 14 20:46:46 2021
SUCCESS: /* ASMCMD */alter diskgroup /*ASMCMD*/ "DATA" drop file '+DATA/xff/XIFENFEI.270.1040985885'
Sun Mar 14 20:49:24 2021
NOTE: Dropping directory '+archlog/oracledb/archivelog/2021_03_11' recursively
Sun Mar 14 20:49:24 2021
Errors in file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_15281.trc  (incident=114015):
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/gridbase/diag/asm/+asm/+ASM1/incident/incdir_114015/+ASM1_ora_15281_i114015.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Mar 14 20:49:24 2021
ERROR: An unrecoverable error has been identified in ASM metadata.
NOTE: AMDU dump of disk group ARCHLOG initiated at /u01/app/gridbase/diag/asm/+asm/+ASM1/incident/incdir_114015
Sun Mar 14 20:49:28 2021
Errors in file /u01/app/gridbase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_15281.trc  (incident=114016):
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kfdAuDealloc2], [ARCHLOG], [213], [410], [0], [], [], [], [], [], [], []

This database has some history: a few days earlier, a storage cache failure had forced a restore from backup into a new disk group, leaving the old disk groups unused. On this day the operations staff were evidently cleaning the unneeded files out of the old disk groups; after the archived logs under +archlog/oracledb/archivelog/2021_03_11 were removed, ASM hit the exception while recursively dropping that directory (most likely because the earlier storage cache incident had already damaged the ARCHLOG disk group's metadata). The chain of events is now clear: ARCHLOG's metadata was corrupt; deleting its files triggered the failure while dropping the now-empty directory; that crashed the whole ASM instance; and the crash took CRS down with it. The fix was simple: since ARCHLOG was no longer needed at all, dd over its disk headers so the disk group is never mounted at startup again.
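
A minimal sketch of that last step, assuming the ARCHLOG member disk is /dev/emcpowerf (a hypothetical device name; list the real member paths from v$asm_disk first) and taking a header backup before wiping:

# back up the first 1MB of the member disk (covers the ASM disk header)
dd if=/dev/emcpowerf of=/tmp/emcpowerf.hdr.bak bs=1M count=1
# zero it so ASM no longer discovers or mounts the ARCHLOG disk group
dd if=/dev/zero of=/dev/emcpowerf bs=1M count=1

Afterwards ora.ARCHLOG.dg simply stays OFFLINE, as the resource status shows: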

[grid@rac1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHLOG.dg
               ONLINE  OFFLINE      rac1                     STABLE
               ONLINE  OFFLINE      rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER1.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.NEWDATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.OCR.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.asm
               ONLINE  ONLINE       rac1                     Started,STABLE
               ONLINE  ONLINE       rac2                     Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.MGMTLSNR
      1        ONLINE  OFFLINE      rac2                     169.254.86.142 7.7.7
                                                             .1,STARTING
ora.cvu
      1        ONLINE  ONLINE       rac1                     STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.oc4j
      1        ONLINE  OFFLINE      rac1                     STARTING
ora.xff.db
      1        ONLINE  OFFLINE      rac1                     STARTING
      2        ONLINE  OFFLINE      rac2                     STARTING
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
[grid@rac1 ~]$ 

has a disk HB, but no network HB: caused by traceroute being blocked


A customer reported that one node of their cluster was abnormal; a check showed the crs processes were down.
(screenshot: crs process state)


Restarting crs showed the cssd process failing to start, reporting "has a disk HB, but no network HB":
(screenshot: no-network-hb)


So the private network was clearly at fault. Further analysis showed the nodes could ping each other over the private network, yet traceroute to the peer node failed.
(screenshots: traceroute-not-work, traceroute-not-work2)
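
The screenshots amount to checks along these lines (a sketch; the private-network IP is hypothetical):

ping -c 3 192.168.100.2        # succeeds: ICMP passes between the nodes
traceroute 192.168.100.2       # fails: the default UDP probes never complete
traceroute -I 192.168.100.2    # optional retest with ICMP probes

ICMP getting through while traceroute's UDP probes are dropped is consistent with the security software filtering non-ICMP traffic, which would also block the UDP-based CSS network heartbeat.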


The customer mentioned that security software had been installed recently; after they stopped the security software, traceroute worked again.


The cluster then started normally as well:

[root@his01 cssd]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.FRA.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.LISTENER.lsnr
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.OCRVOTE.dg
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.asm
               ONLINE  ONLINE       his01                    Started
               ONLINE  ONLINE       his02                    Started
ora.gsd
               OFFLINE OFFLINE      his01
               OFFLINE OFFLINE      his02
ora.net1.network
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.ons
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
ora.registry.acfs
               ONLINE  ONLINE       his01
               ONLINE  ONLINE       his02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       his02
ora.cvu
      1        ONLINE  ONLINE       his02
ora.his01.vip
      1        ONLINE  ONLINE       his01
ora.his02.vip
      1        ONLINE  ONLINE       his02
ora.oc4j
      1        ONLINE  ONLINE       his02
ora.orcl.db
      1        ONLINE  ONLINE       his01                    Open
      2        ONLINE  ONLINE       his02                    Open
ora.scan1.vip
      1        ONLINE  ONLINE       his02

12.1: manually changing the OS time crashes the database


A customer's 12.1.0.1 RAC database restarted unexpectedly, and they asked for help finding the cause.
The database alert log reported ORA-15064:

Mon Apr 15 15:06:26 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Apr 15 15:41:26 2019
NOTE: ASMB terminating
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
System state dump requested by (instance=1, osid=61426 (ASMB)), summary=[abnormal instance termination].
Mon Apr 15 15:41:26 2019
USER (ospid: 61426): terminating the instance due to error 15064
Mon Apr 15 15:41:26 2019
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_61287.trc
Mon Apr 15 15:41:27 2019
opiodr aborting process unknown ospid (1171) as a result of ORA-1092
Mon Apr 15 15:41:27 2019
ORA-1092 : opitsk aborting process

From this it is clear that the ASMB process failed, the database lost its connection to the ASM instance, and the database instance crashed.

Analyzing the ASM alert log:

Mon Apr 15 15:41:26 2019
WARNING: client [+ASM1:+ASM] not responsive for 2069s; state=0x1. pid 23155
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: client [orcl1:orcl] not responsive for 2069s; state=0x1. killing pid 61436
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [orcl1:orcl] after 2069 seconds (mbr 2)
WARNING: client [-MGMTDB:_mgmtdb] not responsive for 2070s; state=0x1. killing pid 24026
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [-MGMTDB:_mgmtdb] after 2070 seconds (mbr 1)
Mon Apr 15 15:41:26 2019
NOTE: cleaned up ASM client -MGMTDB:_mgmtdb
NOTE: cleaned up ASM client orcl1:orcl
Mon Apr 15 15:41:43 2019
NOTE: Standard client -MGMTDB:_mgmtdb registered, osid 183707, mbr 0x1 (reg:1371965153)
Mon Apr 15 15:42:16 2019
NOTE: Standard client orcl1:orcl registered, osid 184063, mbr 0x2 (reg:2088418628)
Mon Apr 15 15:44:30 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.

The ASM log closely matches the MOS note "GEN0 terminating the ASM instance due to error 15082" (Doc ID 2096988.1). The customer confirmed they had adjusted the clock through ntp, so this is almost certainly Oracle Bug 19032250 (fixed in 12.1.0.2), triggered when ntp moves the time by too large a step (manually setting the time can trigger the same problem).

Recommendations for changing the time on a RAC cluster (a sketch of case 1 follows the list):
1. If the clock is behind: shut down the databases and the cluster, move the time forward, then start the cluster and the databases.
2. If the clock is ahead: shut down the databases and the cluster, wait until the true time has passed the moment the cluster and databases were shut down, then set the clock back and start the cluster and the databases.
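
A sketch of case 1 (clock behind; the command names are standard, but the database name and timestamp are hypothetical, and each step runs as the appropriate user on every node):

srvctl stop database -d orcl           # stop the database first
crsctl stop cluster -all               # stop the clusterware stack on all nodes
date -s '2019-04-15 16:00:00'          # set the correct time on every node
crsctl start cluster -all
srvctl start database -d orcl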

A side effect of a direct-connect private network: one node down keeps HAIP from starting on the other node


This case is a two-node 11.2.0.4 RAC whose private network is directly cabled between the nodes. One node's host failed and would not boot; when the clusterware on the surviving node started, HAIP failed to come up:

# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
     1        ONLINE  ONLINE       xifenfei1                  Started
ora.cluster_interconnect.haip                                                      >>>>  OFFLINE
     1        ONLINE  OFFLINE
ora.crf
     1        ONLINE  ONLINE       xifenfei1
ora.crsd
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.cssd
     1        ONLINE  ONLINE       xifenfei1
ora.cssdmonitor
     1        ONLINE  ONLINE       xifenfei1
ora.ctssd
     1        ONLINE  ONLINE       xifenfei1                  OBSERVER
ora.diskmon
     1        OFFLINE OFFLINE
ora.drivers.acfs
     1        ONLINE  ONLINE       xifenfei1
ora.evmd
     1        ONLINE  INTERMEDIATE xifenfei1
ora.gipcd
     1        ONLINE  ONLINE       xifenfei1
ora.gpnpd
     1        ONLINE  ONLINE       xifenfei1
ora.mdnsd
     1        ONLINE  ONLINE       xifenfei1

The alert<hostname>.log:

2018-09-02 10:38:56.767:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.

The orarootagent_root log:

2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[   CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces

The gipcd.log:

2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0

The ohasd.log:

2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [   CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [   CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

Checking the private network showed that the link on the private NIC (eth1) was down; because the private network is a direct cable and the other machine could not boot, the link could never come up:

[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no   ====> NIC link is down
[root@xifenfei1 rules.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1     ---------> note: RUNNING flag present
          RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16926236 (16.1 MiB)  TX bytes:24269882 (23.1 MiB)
          Memory:91160000-91180000
eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1      ---------> note: RUNNING flag missing
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:91140000-91160000

For the MOS description of HAIP failing to start because of a NIC link problem, see: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1). This case is a direct consequence of cabling the 11.2 cluster private network point-to-point (a direct-connect private network is strongly discouraged).

oracle rac 12.2: root.sh reports CLSRSC-400


While installing oracle rac 12.2 on Red Hat 7.3, step 14 of the root.sh script failed with the following error and the installation could not continue:
CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl' execution failed
OS version information:

[grid@xifenfei01 ~]$ more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
[grid@xifenfei01 ~]$ uname -a
Linux xifenfei01 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The root.sh run that failed:

[root@xifenfei01 ~]# /u01/app/grid/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/grid/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/grid/oraInventory to oinstall.
The execution of the script is complete.
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_09-52-55AM.log
2017/06/11 09:53:00 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 09:53:00 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 09:53:30 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 09:53:30 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 09:53:31 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 09:53:32 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 09:53:37 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 09:53:51 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 09:54:00 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 09:54:15 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 09:54:44 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 09:54:48 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl'execution failed

The key error messages:
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib -I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl' execution failed
Searching MOS turns up: ACFS Drivers Install reports CLSRSC-400: A system reboot is required to continue installing (Doc ID 2025056.1). Starting with 12c GI, ACFS is installed by default, and since ACFS does not support the Red Hat 7.3 kernel, the error above is raised:

[grid@xifenfei01 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: '3.10.0-514.el7.x86_64'
ACFS-9201: Not Supported

Workaround
Stop crs, kill any processes that refuse to stop, then rerun root.sh:

[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        OFFLINE OFFLINE                               STABLE
ora.crf
      1        OFFLINE OFFLINE                               STABLE
ora.crsd
      1        OFFLINE OFFLINE                               STABLE
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.cssdmonitor
      1        OFFLINE OFFLINE                               STABLE
ora.ctssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        OFFLINE OFFLINE                               STABLE
ora.gipcd
      1        OFFLINE OFFLINE                               STABLE
ora.gpnpd
      1        OFFLINE OFFLINE                               STABLE
ora.mdnsd
      1        OFFLINE OFFLINE                               STABLE
ora.storage
      1        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei02' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenfei02 ~]# ps -ef|grep d.bin
root      29155  11754  0 10:46 pts/0    00:00:00 grep --color=auto d.bin
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_10-33-57AM.log
2017/06/11 10:33:59 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 10:33:59 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 10:34:01 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 10:34:01 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 10:34:03 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 10:34:53 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 10:34:54 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 10:35:09 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 10:35:31 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 10:35:33 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:35:57 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2017/06/11 10:36:01 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
Disk groups created successfully. Check /u01/app/grid/grid_bash/cfgtoollogs/asmca/asmca-170611AM103637.log for details.
2017/06/11 10:37:40 CLSRSC-482: Running command: '/u01/app/grid/product/12.2.0/grid/bin/ocrconfig -upgrade grid oinstall'
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk 49af246c7d2e4f5dbf0d9ea09cc047d5.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   49af246c7d2e4f5dbf0d9ea09cc047d5 (/dev/mapper/data1) [DATA]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crsd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crf' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.storage' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.asm' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2677: Stop of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenfei01'
CRS-2677: Stop of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2017/06/11 10:38:40 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.ctssd' on 'xifenfei01'
CRS-2676: Start of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: xifenfei01
CRS-6016: Resource auto-start has completed for server xifenfei01
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:40:23 CLSRSC-343: Successfully started Oracle Clusterware stack
2017/06/11 10:40:23 CLSRSC-594: Executing installation step 18 of 19: 'ConfigNode'.
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'xifenfei01'
CRS-2676: Start of 'ora.DATA.dg' on 'xifenfei01' succeeded
2017/06/11 10:42:19 CLSRSC-594: Executing installation step 19 of 19: 'PostConfig'.
2017/06/11 10:43:16 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

The remaining nodes were handled in the same way, and the installation finally completed with ACFS skipped.

[grid@xifenfei01 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.chad
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.ons
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei01               169.254.20.214 192.1
                                                             68.1.20 192.168.2.20
                                                             ,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei01               Started,STABLE
      2        ONLINE  ONLINE       xifenfei02               Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       xifenfei01               Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei01.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei02.vip
      1        ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------

The latest official fix for this scenario is documented under: CLSRSC-400: A system reboot is required to continue installing.
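
To check afterwards whether the node can support ACFS at all, the acfsdriverstate utility shipped in the Grid home can be queried; a quick sketch (assuming the grid user's $ORACLE_HOME points at the Grid home):

[grid@xifenfei01 ~]$ $ORACLE_HOME/bin/acfsdriverstate supported
[grid@xifenfei01 ~]$ $ORACLE_HOME/bin/acfsdriverstate installed
[grid@xifenfei01 ~]$ $ORACLE_HOME/bin/acfsdriverstate loaded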

Handling an oversized crfclust.bdb file


The filesystem holding the Grid home (/u01) is nearly full:

[root@wldb01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdw2              40G   22G   17G  58% /
tmpfs                  16G  219M   16G   2% /dev/shm
/dev/sdw1              50G   46G  1.5G  97% /u01

Use find to locate the large files:

[root@wldb01 grid]# find ./ -type f -size +1024M
./crf/db/wldb01/crfclust.bdb

The crfclust.bdb file is the main Cluster Health Monitor (CHM) repository file. Its default size is capped at about 1 GB, but on some platforms and versions bugs can make it grow far beyond that. See:
Oracle Cluster Health Monitor (CHM) using large amount of space (more than default) (Doc ID 1343105.1)
Bug 20186278 - crfclust.bdb Becomes Huge Size Due to Sudden Retention Change (Doc ID 20186278.8)
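
Deleting the .bdb files (shown below) is one option; the repository retention can also be inspected and shrunk with oclumon so the file never grows this large again. A sketch using the 11.2 syntax, where the size argument is a retention period in seconds (verify against 'oclumon manage -h' on your release):

[grid@wldb01 ~]$ /u01/app/11.2.0/grid/bin/oclumon manage -get repsize
[grid@wldb01 ~]$ /u01/app/11.2.0/grid/bin/oclumon manage -repos resize 61200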

[grid@wldb01 ~]$ /u01/app/11.2.0/grid/bin/oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/wldb01
 Done
[root@wldb01 ~]# cd  /u01/app/11.2.0/grid/crf/db/wldb01
[root@wldb01 wldb01]# du -sh
24G     .
[root@wldb01 wldb01]# ls -lhtr
total 24G
-rw-r-----. 1 root root  16M Mar 25 21:38 log.0000047847
-rw-r-----. 1 root root 8.0K Mar 25 21:38 repdhosts.bdb
-rw-r-----. 1 root root  24K Mar 25 21:39 __db.001
-rw-r--r--. 1 root root 115M Mar 25 21:39 wldb01.ldb
-rw-r-----. 1 root root 8.0K Mar 25 21:40 crfconn.bdb
-rw-r-----. 1 root root 329M Mar 25 21:52 crfts.bdb
-rw-r-----. 1 root root 508M Mar 25 21:53 crfloclts.bdb
-rw-r-----. 1 root root  22G Mar 25 21:53 crfclust.bdb
-rw-r-----. 1 root root 392K Mar 25 21:53 __db.002
-rw-r-----. 1 root root  16M Mar 25 21:53 log.0000047848
-rw-r-----. 1 root root 504M Mar 25 21:53 crfhosts.bdb
-rw-r-----. 1 root root 650M Mar 25 21:53 crfcpu.bdb
-rw-r-----. 1 root root 534M Mar 25 21:53 crfalert.bdb
-rw-r-----. 1 root root  56K Mar 25 21:53 __db.006
-rw-r-----. 1 root root 1.2M Mar 25 21:53 __db.005
-rw-r-----. 1 root root 2.1M Mar 25 21:53 __db.004
-rw-r-----. 1 root root 2.6M Mar 25 21:53 __db.003

Clean up the .bdb files

[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'wldb01'
CRS-2677: Stop of 'ora.crf' on 'wldb01' succeeded
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       wldb01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       wldb01
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        ONLINE  ONLINE       wldb01
ora.cssd
      1        ONLINE  ONLINE       wldb01
ora.cssdmonitor
      1        ONLINE  ONLINE       wldb01
ora.ctssd
      1        ONLINE  ONLINE       wldb01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       wldb01
ora.evmd
      1        ONLINE  ONLINE       wldb01
ora.gipcd
      1        ONLINE  ONLINE       wldb01
ora.gpnpd
      1        ONLINE  ONLINE       wldb01
ora.mdnsd
      1        ONLINE  ONLINE       wldb01
[root@wldb01 wldb01]# rm -rf *.bdb
[root@wldb01 wldb01]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdw2              40G   22G   17G  58% /
tmpfs                  16G  219M   16G   2% /dev/shm
/dev/sdw1              50G   22G   26G  46% /u01
[root@wldb01 wldb01]# du -sh
53M     .
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'wldb01'
CRS-2676: Start of 'ora.crf' on 'wldb01' succeeded
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       wldb01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       wldb01
ora.crf
      1        ONLINE  ONLINE       wldb01
ora.crsd
      1        ONLINE  ONLINE       wldb01
ora.cssd
      1        ONLINE  ONLINE       wldb01
ora.cssdmonitor
      1        ONLINE  ONLINE       wldb01
ora.ctssd
      1        ONLINE  ONLINE       wldb01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       wldb01
ora.evmd
      1        ONLINE  ONLINE       wldb01
ora.gipcd
      1        ONLINE  ONLINE       wldb01
ora.gpnpd
      1        ONLINE  ONLINE       wldb01
ora.mdnsd
      1        ONLINE  ONLINE       wldb01
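
ora.crf and its repository are per-node, so the same cleanup must be repeated on every node. A minimal sketch, assuming passwordless root SSH, a second node named wldb02 (hypothetical), and the same Grid home path on each node:

for h in wldb01 wldb02; do
  ssh $h '/u01/app/11.2.0/grid/bin/crsctl stop res ora.crf -init &&
          rm -f /u01/app/11.2.0/grid/crf/db/$(hostname -s)/*.bdb &&
          /u01/app/11.2.0/grid/bin/crsctl start res ora.crf -init'
done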

Changing the SCAN IP on 11.2 RAC


In some situations, typically client/server applications migrated from an old standalone system to 11.2 RAC, updating the IP address in every client is too much work and often simply unrealistic. In such cases it is usually easier to set the SCAN IP to the old address, so the application keeps connecting to the same IP without per-client reconfiguration. The following is a short demonstration of changing the SCAN IP within the same subnet: the SCAN name xff-scan is moved from 192.168.137.245 to 192.168.137.248.
Check the current SCAN IP configuration

[root-www.xifenfei.com@xff1 ~]# ping xff-scan
PING xff-scan (192.168.137.245) 56(84) bytes of data.
^C
--- xff-scan ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1738ms
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xff1
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        ONLINE  ONLINE       xff1
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
[root-www.xifenfei.com@xff1 ~]# srvctl config scan
SCAN name: xff-scan, Network: 1/192.168.137.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xff-scan/192.168.137.245

Change the SCAN IP

[root-www.xifenfei.com@xff1 ~]# srvctl stop scan_listener
[root-www.xifenfei.com@xff1 ~]# srvctl stop scan
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        OFFLINE OFFLINE
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
--If the SCAN is resolved via DNS, update the SCAN record in DNS; if it lives in /etc/hosts, remember to change it on every node
[root-www.xifenfei.com@xff1 ~]# ping xff-scan
PING xff-scan (192.168.137.248) 56(84) bytes of data.
^C
--- xff-scan ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1738ms
[root-www.xifenfei.com@xff1 ~]# srvctl modify scan -n xff-scan
[root-www.xifenfei.com@xff1 ~]# srvctl config scan
SCAN name: xff-scan, Network: 1/192.168.137.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xff-scan/192.168.137.248
[root-www.xifenfei.com@xff1 ~]# srvctl start scan
[root-www.xifenfei.com@xff1 ~]# srvctl start scan_listener
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xff2
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        ONLINE  ONLINE       xff2
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
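
When the SCAN is resolved through /etc/hosts rather than DNS, the edit mentioned in the transcript above looks like this on every node (entries are illustrative, matching this demo):

# /etc/hosts fragment on xff1 and xff2
# 192.168.137.245   xff-scan      <- old entry, remove
192.168.137.248   xff-scan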

Check the SCAN listener status after the change

xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:02:32
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 3 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
The listener supports no services
The command completed successfully
[root-www.xifenfei.com@xff2 ~]# su - oracle
xff2:/home/oracle> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Mar 12 17:01:11 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> alter system register;
System altered.
SQL> /
System altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
xff2:/home/oracle> exit
logout
[root-www.xifenfei.com@xff2 ~]# su - grid
xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:01:24
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 2 min. 18 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
The listener supports no services
The command completed successfully

Notice that after the SCAN IP change, the SCAN listener did not pick up any dynamic service registrations from the instances, not even after manually running alter system register;.
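
Before resorting to a restart, it is worth confirming that remote_listener still points at the SCAN; a sketch (the EZConnect form scan-name:port is the 11.2 default, values follow this demo):

xff2:/home/oracle> sqlplus / as sysdba
SQL> show parameter remote_listener
SQL> alter system set remote_listener='xff-scan:1521' sid='*' scope=both;
SQL> alter system register;

In this demo, however, it was the restart below that finally resolved it.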

Restarting the database resolved the dynamic registration problem after the SCAN IP change

[root-www.xifenfei.com@xff2 ~]# su - oracle
xff2:/home/oracle> srvctl stop database -d xffdb
xff2:/home/oracle> srvctl start database -d xffdb
xff2:/home/oracle> exit
logout
[root-www.xifenfei.com@xff2 ~]# su - grid
xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:06:17
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 7 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
Services Summary...
Service "xffdb" has 2 instance(s).
  Instance "xffdb1", status READY, has 1 handler(s) for this service...
  Instance "xffdb2", status READY, has 1 handler(s) for this service...
Service "xffdbXDB" has 2 instance(s).
  Instance "xffdb1", status READY, has 1 handler(s) for this service...
  Instance "xffdb2", status READY, has 1 handler(s) for this service...
The command completed successfully

ora.crf resource failing repeatedly: temporarily stopping and disabling it


A check found that the CRS alert log of an 11.2.0.3 RAC running on Windows 2008 was filled with large numbers of CRS-2765 errors like these:

2015-09-04 00:12:10.431
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:16:46.047
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:21:21.479
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:25:57.365
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.

The crfmond.log shows matching entries:

2015-09-04 00:07:35.607: [    GPNP][19080] clsgpnp_getCachedProfileEx: [at clsgpnp.c:613] Result: (26)
 CLSGPNP_NO_PROFILE. Can't get offline GPnP service profile: local gpnpd is up and running. Use getProfile instead.
2015-09-04 00:07:35.607: [    GPNP][19080] clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result:
(26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2015-09-04 00:07:35.732: [ CRFMOND][19080]Sysmond coming up...
2015-09-04 00:07:35.732: [ CRFMOND][19080]Failed to load init file ret=1
2015-09-04 00:07:35.732: [ CRFMOND][19080]OSD error: op="scrfosm_loadInitFile" loc="read fail1"
other="crfhome="D:\app\11.2.0\grid" and gipath="D:\app\11.2.0\grid\crf\admin\crf.ora"" dep="2"
2015-09-04 00:07:37.095: [ COMMCRS][19696]clsc_send_msg: (00000000058C98E0) NS err (12571, 12560), transport (533, 57, 0)
[  clsdmc][19676]Fail to connect (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61022)) with status 9
[  clsdmt][19712]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61022))
2015-09-04 00:07:37.201: [  clsdmt][19712]PID for the Process [19672], connkey 5
2015-09-04 00:07:37.201: [  clsdmt][19712]Creating PID [19672] file for home D:\app\11.2.0\grid
host rac2 bin osysmond to D:\app\11.2.0\grid\osysmond\init\
2015-09-04 00:07:37.202: [  clsdmt][19712]Writing PID [19672] to the file [D:\app\11.2.0\grid\osysmond\init\rac2.pid]
2015-09-04 00:07:37.734: [ CRFMOND][19676]mond_init: clsdms init successful
[   CLWAL][19676]clsw_Initialize: OLR initlevel [70000]
2015-09-04 00:12:10.050: [    GPNP][19676] clsgpnp_getCachedProfileEx: [at clsgpnp.c:613] Result: (26)
CLSGPNP_NO_PROFILE. Can't get offline GPnP service profile: local gpnpd is up and running. Use getProfile instead.
2015-09-04 00:12:10.051: [    GPNP][19676] clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result:
(26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2015-09-04 00:12:10.197: [ CRFMOND][19676]Sysmond coming up...
2015-09-04 00:12:10.197: [ CRFMOND][19676]Failed to load init file ret=1
2015-09-04 00:12:10.197: [ CRFMOND][19676]OSD error: op="scrfosm_loadInitFile" loc="read fail1"
other="crfhome="D:\app\11.2.0\grid" and gipath="D:\app\11.2.0\grid\crf\admin\crf.ora"" dep="2"
2015-09-04 00:12:11.557: [ COMMCRS][18376]clsc_send_msg: (00000000059498E0) NS err (12571, 12560), transport (533, 57, 0)

A search on MOS turned up a matching note: Windows: CRS-2765: Resource 'ora.crf' has failed on server (Doc ID 1480263.1). According to the note, the problem is caused by unpublished bug 14010695, and the recommended fix is to apply the latest PSU; applying a PSU, however, requires a downtime window. As a stopgap I decided to disable the ora.crf resource instead. Before disabling it, let's review what the resource does, to confirm it can safely be disabled.

What ora.crf does
The resource implements Cluster Health Monitor (CHM), an Oracle-supplied tool that automatically collects operating-system resource usage (CPU, memory, swap, processes, I/O, network), sampling once per second. This data is very helpful when diagnosing node reboots, hangs, instance evictions and performance problems, and it can flag high load or memory anomalies early, before they turn into something worse. CHM is installed automatically with:
Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)
Since ora.crf only collects diagnostic data, and only exists from 11.2.0.2 onward, it can safely be stopped and disabled.

Stop the ora.crf resource

C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     Started
ora.crf
      1        ONLINE  ONLINE       rac2
ora.crsd
      1        ONLINE  ONLINE       rac2
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  ONLINE       rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2
C:\Users\Administrator>crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     Started
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        ONLINE  ONLINE       rac2
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  ONLINE       rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2

Disable the ora.crf resource

C:\Users\Administrator>crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE
C:\Users\Administrator>crsctl modify resource "ora.crf" -attr "AUTO_START=0" -init
C:\Users\Administrator>crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE
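
Once the PSU is eventually applied, the resource can be re-enabled and restarted the same way (AUTO_START=1 restores the default autostart behaviour):

C:\Users\Administrator>crsctl modify resource "ora.crf" -attr "AUTO_START=1" -init
C:\Users\Administrator>crsctl start res ora.crf -init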

root.sh fails on a server with many CPUs: ASM reports ORA-04031


A reader reported that while installing 11.2.0.3 RAC on Linux 6.5 they hit a failure: root.sh errored out on the very first node. They asked for help.
[screenshot: root.sh-asm-fail]


Based on that output, check the asmca log:

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread...
[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true
[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

This shows that the ASM instance could not be logged on to (ORA-01012 and ORA-03113). With those errors in mind, examine the ASM alert log:

Reconfiguration complete
Fri Jul 24 12:49:29 2015
LCK0 started with pid=22, OS id=46913
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
LCK0 (ospid: 46913): terminating the instance due to error 4031
Fri Jul 24 12:49:35 2015
ORA-1092 : opitsk aborting process
Instance terminated by LCK0, pid = 46913

Digging further into the ASM log turns up the familiar ASM ORA-4031: when root.sh started ASM with the default parameter file, the shared pool was too small (per Oracle best practice, memory_target should be set to 1536M or higher). This matches Bug 14292825 "ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low", which according to the official description is fixed in 11.2.0.4.
[screenshot: BUG-14292825]


The ASM alert log confirms the instance started with the default parameter settings:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	RAC01
Release:	2.6.32-358.el6.x86_64
Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:	x86_64
Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
System parameters with non-default values:
  large_pool_size          = 16M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.10.10.31
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jul 24 12:49:27 2015

Check the CPU count via /proc/cpuinfo:

processor	: 191
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
stepping	: 7
cpu MHz		: 1200.000
cache size	: 24576 KB
physical id	: 7
siblings	: 24
core id		: 13
cpu cores	: 12
apicid		: 251
initial apicid	: 251
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
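
Processor numbering is zero-based, so the entry "processor : 191" means 192 logical CPUs; a quick way to count them:

[root@RAC01 ~]# grep -c ^processor /proc/cpuinfo
192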

According to How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1), the number of shared pool subpools is derived from the CPU count and capped at 7; with 192 CPUs this instance therefore gets the full 7 subpools. Each subpool needs at least 512 MB of memory, so the shared pool needs at least 7 x 512 MB = 3.5 GB, while the default is only a few hundred MB, nowhere near enough.


More CPUs mean more shared pool subpools, which inflates the shared pool requirement. The root cause of this incident can now be summarised:
Because the server has many CPUs, a much larger shared pool is required, yet 11.2.0.3 assigns ASM very little memory by default. ASM therefore ran out of shared pool while starting (a small default plus a large demand makes the ORA-04031 unsurprising); since ASM could not start during root.sh, root.sh failed.
Workaround: temporarily disable some of the CPUs, re-run root.sh, raise the ASM memory parameters, then re-enable the CPUs.
A note of thanks: colleagues in ACS had hit this before, which is why I could react so quickly. Those with access can also review SRs 3-10479952701 and 3-7976215751.
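
A sketch of that workaround on Linux, assuming sysfs CPU hotplug support and the Grid home from the log above; the CPU range and memory values are illustrative only:

# 1. Take most logical CPUs offline so ASM starts with fewer shared pool subpools (cpu0 cannot be offlined):
for n in $(seq 32 191); do echo 0 > /sys/devices/system/cpu/cpu$n/online; done
# 2. Re-run root.sh on the failed node:
/u01/app/11.2.0/grid/root.sh
# 3. With ASM up, raise its memory targets to at least the best-practice minimum:
su - grid -c "sqlplus / as sysasm" <<'EOF'
alter system set memory_max_target=2048M scope=spfile sid='*';
alter system set memory_target=1536M scope=spfile sid='*';
EOF
# 4. Bring the CPUs back online; the new memory settings take effect at the next ASM restart:
for n in $(seq 32 191); do echo 1 > /sys/devices/system/cpu/cpu$n/online; done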