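Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; I reserve the right to pursue legal liability otherwise.]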
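A fiber link failure at a customer site made the voting disks unavailable, which caused the host to reboot. After the reboot, the cluster did not start normally.
Operating system and CRS version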
[root@rac1 ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@rac1 ~]# sqlplus -v
SQL*Plus: Release 11.2.0.4.0 Production
Starting CRS manually hung for a while and then failed with an error
[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Check which processes have started
[grid@rac1 ~]$ ps -ef|grep d.bin
root      7043     1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root      8311     1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid     10984 10954  0 12:10 pts/2    00:00:00 grep d.bin
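The ps listing above contains only `ohasd.bin reboot` processes and none of the other clusterware daemons. A minimal sketch of how that check could be scripted is shown here; `count_ohasd_reboot` is a hypothetical helper name, not part of Grid Infrastructure, and it only counts lines of `ps -ef` style text that end in `ohasd.bin reboot`.

```shell
#!/bin/sh
# Hypothetical helper: count "ohasd.bin reboot" processes in `ps -ef`
# style text read from stdin. The "grep d.bin" line itself is not counted
# because it does not end in "ohasd.bin reboot".
count_ohasd_reboot() {
    awk '/\/ohasd\.bin reboot[[:space:]]*$/ { n++ } END { print n + 0 }'
}

# Usage on a live system (assumption: run as a user allowed to see the processes):
#   ps -ef | count_ohasd_reboot
```

A count above zero proves nothing by itself; it is only a quick way to confirm the same picture as the manual `ps -ef|grep d.bin` check.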
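Based on experience, this failure is most likely Bug 17229230: DURING REBOOT, "OHASD.BIN REBOOT" REMAINS SLEEPING. As a temporary workaround, start CRS in one session and then run the following command in a second session: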
/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
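The workaround keeps a reader attached to `npohasd` while CRS starts, and the reader is stopped by hand afterwards. The sketch below wraps that pattern under a few assumptions: `read_pipe_for` is a hypothetical helper, the fixed wait in seconds is an arbitrary choice rather than anything the bug note prescribes, and the path is expected to be a named pipe (the function refuses anything else).

```shell
#!/bin/sh
# Hypothetical helper: read from a named pipe in the background,
# then stop the reader after a fixed number of seconds.
read_pipe_for() {
    pipe=$1
    seconds=$2
    # Refuse anything that is not a named pipe (FIFO).
    [ -p "$pipe" ] || { echo "not a named pipe: $pipe" >&2; return 1; }
    dd if="$pipe" of=/dev/null bs=1024 count=1 2>/dev/null &
    dd_pid=$!
    sleep "$seconds"
    # dd may already have exited after reading one block; ignore that case.
    kill "$dd_pid" 2>/dev/null || true
    wait "$dd_pid" 2>/dev/null || true
    return 0
}

# Usage as root in the second session (the 60-second wait is an assumption):
#   read_pipe_for /var/tmp/.oracle/npohasd 60
```

Running the plain `dd` command and stopping it with Ctrl-C once the stack is up, as in this post, achieves the same thing; the wrapper only makes the cleanup step harder to forget.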
CRS then started normally
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
After the dd command was terminated, the cluster started normally.
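While the stack is coming up, the `crsctl status res -t -init` output shows resources whose TARGET is ONLINE but whose STATE is not yet ONLINE. A rough sketch for listing those resources follows; `pending_resources` is a hypothetical helper, and it assumes the two-line-per-resource layout of the `-t` output shown in this post (a resource name line followed by an instance line).

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl status res -t -init` output from stdin
# and print every resource with TARGET ONLINE whose STATE is not ONLINE.
pending_resources() {
    awk '
        # Resource name line, for example "ora.cssd".
        $1 ~ /^ora\./ { name = $1; next }
        # Instance line: <number> <TARGET> <STATE> [SERVER] [STATE_DETAILS]
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" { print name }
    '
}

# Usage as root or grid on a live node:
#   crsctl status res -t -init | pending_resources
```

An empty result suggests every resource that is meant to be ONLINE has reached that state; resources with TARGET OFFLINE, such as `ora.diskmon` above, are deliberately skipped.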