Aftereffects of a Direct-Connected Private Network: One Node Down Prevents HAIP from Starting on the Other Node


This case involves a two-node 11.2.0.4 RAC whose private network was directly connected (a crossover cable between the two hosts). One node's server failed and could not boot; when the clusterware was started on the surviving node, the HAIP resource would not come up:

# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
     1        ONLINE  ONLINE       xifenfei1                  Started
ora.cluster_interconnect.haip                                                      >>>>  OFFLINE
     1        ONLINE  OFFLINE
ora.crf
     1        ONLINE  ONLINE       xifenfei1
ora.crsd
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.cssd
     1        ONLINE  ONLINE       xifenfei1
ora.cssdmonitor
     1        ONLINE  ONLINE       xifenfei1
ora.ctssd
     1        ONLINE  ONLINE       xifenfei1                  OBSERVER
ora.diskmon
     1        OFFLINE OFFLINE
ora.drivers.acfs
     1        ONLINE  ONLINE       xifenfei1
ora.evmd
     1        ONLINE  INTERMEDIATE xifenfei1
ora.gipcd
     1        ONLINE  ONLINE       xifenfei1
ora.gpnpd
     1        ONLINE  ONLINE       xifenfei1
ora.mdnsd
     1        ONLINE  ONLINE       xifenfei1

Cluster alert log (alert<hostname>.log)

2018-09-02 10:38:56.767:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.

orarootagent_root.log

2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[   CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces

gipcd.log

2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0

ohasd.log

2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [    AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [   CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [   CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

Checking the private network showed the eth1 link was down: with a direct-connected interconnect, the other machine being unable to boot takes the link down with it.

[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no   ====> NIC link is down
[root@xifenfei1 rules.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:36
          inet addr:10.10.17.42  Bcast:172.17.17.255  Mask:255.255.255.0
          inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1     ---------> note: RUNNING flag present
          RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16926236 (16.1 MiB)  TX bytes:24269882 (23.1 MiB)
          Memory:91160000-91180000
eth1      Link encap:Ethernet  HWaddr 6C:92:BF:2B:7B:37
          inet addr:11.1.1.2  Bcast:11.1.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1      ---------> note: the RUNNING flag is missing
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:91140000-91160000
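Two quick ways to confirm the link state of the private interface (a minimal sketch; both commands are read-only, interface name as in the output above):

ethtool eth1 | grep 'Link detected'    # expect 'Link detected: yes' on a healthy link
cat /sys/class/net/eth1/carrier        # 1 = carrier present, 0 = no link (readable only while the interface is UP)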

For the MOS write-up of HAIP failing to start because of an abnormal NIC link, see: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1). This case is a direct aftereffect of using a direct-connected private network on an 11.2 cluster, a practice that is strongly discouraged (an interconnect switch keeps the link up even when the peer host is down).
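Once the peer host (and with it the link) is back, HAIP normally starts on its own; if it stays OFFLINE, it can be prodded manually. A sketch, run as root and assuming $GRID_HOME points at the grid home (resource name as in the output above):

ethtool eth1 | grep 'Link detected'                                   # link must be up first
$GRID_HOME/bin/oifcfg getif                                           # confirm the cluster_interconnect interface
$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init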

Oracle RAC 12.2: root.sh Reports CLSRSC-400


While installing Oracle RAC 12.2 on Red Hat 7.3, root.sh failed at installation step 14 with the error below, blocking the install:
CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl' execution failed
OS version information

[grid@xifenfei01 ~]$ more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
[grid@xifenfei01 ~]$ uname -a
Linux xifenfei01 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

root.sh error output

[root@xifenfei01 ~]# /u01/app/grid/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/grid/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/grid/oraInventory to oinstall.
The execution of the script is complete.
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_09-52-55AM.log
2017/06/11 09:53:00 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 09:53:00 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 09:53:30 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 09:53:30 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 09:53:31 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 09:53:32 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 09:53:37 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 09:53:51 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 09:54:00 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 09:54:15 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 09:54:44 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 09:54:48 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl'execution failed

Key error message:
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib -I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl' execution failed

A MOS search turned up: ACFS Drivers Install reports CLSRSC-400: A system reboot is required to continue installing (Doc ID 2025056.1). Starting with 12c GI, ACFS is installed by default, and because ACFS does not support the Red Hat 7.3 kernel, the step fails with the error above.

[grid@xifenfei01 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: '3.10.0-514.el7.x86_64'
ACFS-9201: Not Supported

Resolution
Stop CRS (killing any process that refuses to stop cleanly), then rerun root.sh; a brief outline follows, then the full session:
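A minimal outline of the workaround (grid home as above; kill only what crsctl cannot stop):

/u01/app/grid/product/12.2.0/grid/bin/crsctl stop crs -f
ps -ef | grep d.bin | grep -v grep     # any clusterware daemons still running?
# kill -9 <pid>                        # only for leftovers that refused to stop
/u01/app/grid/product/12.2.0/grid/root.sh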

[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        OFFLINE OFFLINE                               STABLE
ora.crf
      1        OFFLINE OFFLINE                               STABLE
ora.crsd
      1        OFFLINE OFFLINE                               STABLE
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.cssdmonitor
      1        OFFLINE OFFLINE                               STABLE
ora.ctssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        OFFLINE OFFLINE                               STABLE
ora.gipcd
      1        OFFLINE OFFLINE                               STABLE
ora.gpnpd
      1        OFFLINE OFFLINE                               STABLE
ora.mdnsd
      1        OFFLINE OFFLINE                               STABLE
ora.storage
      1        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei02' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenfei02 ~]# ps -ef|grep d.bin
root      29155  11754  0 10:46 pts/0    00:00:00 grep --color=auto d.bin
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_10-33-57AM.log
2017/06/11 10:33:59 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 10:33:59 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 10:34:01 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 10:34:01 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 10:34:03 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 10:34:53 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 10:34:54 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 10:35:09 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 10:35:31 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 10:35:33 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:35:57 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2017/06/11 10:36:01 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
Disk groups created successfully. Check /u01/app/grid/grid_bash/cfgtoollogs/asmca/asmca-170611AM103637.log for details.
2017/06/11 10:37:40 CLSRSC-482: Running command: '/u01/app/grid/product/12.2.0/grid/bin/ocrconfig -upgrade grid oinstall'
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk 49af246c7d2e4f5dbf0d9ea09cc047d5.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   49af246c7d2e4f5dbf0d9ea09cc047d5 (/dev/mapper/data1) [DATA]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crsd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crf' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.storage' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.asm' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2677: Stop of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenfei01'
CRS-2677: Stop of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2017/06/11 10:38:40 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.ctssd' on 'xifenfei01'
CRS-2676: Start of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: xifenfei01
CRS-6016: Resource auto-start has completed for server xifenfei01
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:40:23 CLSRSC-343: Successfully started Oracle Clusterware stack
2017/06/11 10:40:23 CLSRSC-594: Executing installation step 18 of 19: 'ConfigNode'.
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'xifenfei01'
CRS-2676: Start of 'ora.DATA.dg' on 'xifenfei01' succeeded
2017/06/11 10:42:19 CLSRSC-594: Executing installation step 19 of 19: 'PostConfig'.
2017/06/11 10:43:16 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

The remaining nodes were handled the same way; in the end the installation completed successfully with ACFS skipped.

[grid@xifenfei01 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.chad
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.ons
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei01               169.254.20.214 192.1
                                                             68.1.20 192.168.2.20
                                                             ,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei01               Started,STABLE
      2        ONLINE  ONLINE       xifenfei02               Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       xifenfei01               Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei01.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei02.vip
      1        ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------

For the latest official fix, see: CLSRSC-400: A system reboot is required to continue installing.

Dealing with an Oversized crfclust.bdb File


The filesystem holding the grid home was full

[root@wldb01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdw2              40G   22G   17G  58% /
tmpfs                  16G  219M   16G   2% /dev/shm
/dev/sdw1              50G   46G  1.5G  97% /u01

Use find to locate the large files

[root@wldb01 grid]# find ./ -type f -size +1024M
./crf/db/wldb01/crfclust.bdb

crfclust.bdb is a Cluster Health Monitor (CHM) file. Its default size cap is 1 GB, but on some platforms and versions bugs let it grow far beyond that:
Oracle Cluster Health Monitor (CHM) using large amount of space (more than default) (Doc ID 1343105.1)
Bug 20186278 - crfclust.bdb Becomes Huge Size Due to Sudden Retention Change (Doc ID 20186278.8)
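Before purging anything, it helps to check where the repository lives and how it is sized; a sketch using 11.2 oclumon syntax (verify the options against your version):

/u01/app/11.2.0/grid/bin/oclumon manage -get reppath   # repository location (shown below)
/u01/app/11.2.0/grid/bin/oclumon manage -get repsize   # retention, in seconds
# shrinking retention also shrinks the repository, e.g. to ~3 days:
# /u01/app/11.2.0/grid/bin/oclumon manage -repos resize 259200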

[grid@wldb01 ~]$ /u01/app/11.2.0/grid/bin/oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/wldb01
 Done
[root@wldb01 ~]# cd  /u01/app/11.2.0/grid/crf/db/wldb01
[root@wldb01 wldb01]# du -sh
24G     .
[root@wldb01 wldb01]# ls -lhtr
total 24G
-rw-r-----. 1 root root  16M Mar 25 21:38 log.0000047847
-rw-r-----. 1 root root 8.0K Mar 25 21:38 repdhosts.bdb
-rw-r-----. 1 root root  24K Mar 25 21:39 __db.001
-rw-r--r--. 1 root root 115M Mar 25 21:39 wldb01.ldb
-rw-r-----. 1 root root 8.0K Mar 25 21:40 crfconn.bdb
-rw-r-----. 1 root root 329M Mar 25 21:52 crfts.bdb
-rw-r-----. 1 root root 508M Mar 25 21:53 crfloclts.bdb
-rw-r-----. 1 root root  22G Mar 25 21:53 crfclust.bdb
-rw-r-----. 1 root root 392K Mar 25 21:53 __db.002
-rw-r-----. 1 root root  16M Mar 25 21:53 log.0000047848
-rw-r-----. 1 root root 504M Mar 25 21:53 crfhosts.bdb
-rw-r-----. 1 root root 650M Mar 25 21:53 crfcpu.bdb
-rw-r-----. 1 root root 534M Mar 25 21:53 crfalert.bdb
-rw-r-----. 1 root root  56K Mar 25 21:53 __db.006
-rw-r-----. 1 root root 1.2M Mar 25 21:53 __db.005
-rw-r-----. 1 root root 2.1M Mar 25 21:53 __db.004
-rw-r-----. 1 root root 2.6M Mar 25 21:53 __db.003

Purge the .bdb files

[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'wldb01'
CRS-2677: Stop of 'ora.crf' on 'wldb01' succeeded
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       wldb01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       wldb01
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        ONLINE  ONLINE       wldb01
ora.cssd
      1        ONLINE  ONLINE       wldb01
ora.cssdmonitor
      1        ONLINE  ONLINE       wldb01
ora.ctssd
      1        ONLINE  ONLINE       wldb01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       wldb01
ora.evmd
      1        ONLINE  ONLINE       wldb01
ora.gipcd
      1        ONLINE  ONLINE       wldb01
ora.gpnpd
      1        ONLINE  ONLINE       wldb01
ora.mdnsd
      1        ONLINE  ONLINE       wldb01
[root@wldb01 wldb01]# rm -rf *.bdb
[root@wldb01 wldb01]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdw2              40G   22G   17G  58% /
tmpfs                  16G  219M   16G   2% /dev/shm
/dev/sdw1              50G   22G   26G  46% /u01
[root@wldb01 wldb01]# du -sh
53M     .
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'wldb01'
CRS-2676: Start of 'ora.crf' on 'wldb01' succeeded
[root@wldb01 wldb01]# /u01/app/11.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       wldb01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       wldb01
ora.crf
      1        ONLINE  ONLINE       wldb01
ora.crsd
      1        ONLINE  ONLINE       wldb01
ora.cssd
      1        ONLINE  ONLINE       wldb01
ora.cssdmonitor
      1        ONLINE  ONLINE       wldb01
ora.ctssd
      1        ONLINE  ONLINE       wldb01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       wldb01
ora.evmd
      1        ONLINE  ONLINE       wldb01
ora.gipcd
      1        ONLINE  ONLINE       wldb01
ora.gpnpd
      1        ONLINE  ONLINE       wldb01
ora.mdnsd
      1        ONLINE  ONLINE       wldb01
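With ora.crf back ONLINE, one can confirm CHM is collecting again; a sketch (11.2 syntax):

/u01/app/11.2.0/grid/bin/oclumon dumpnodeview -n wldb01 -last "00:05:00"   # node metrics for the last 5 minutes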

Changing the SCAN IP of an 11.2 RAC


In some cases, typically C/S applications migrated from an old single-instance server to a new 11.2 RAC, changing the IP address on every client is far too much work to be realistic. A common approach is to give the SCAN the old server's address, so the application keeps connecting to the same IP without reconfiguring each client. The walkthrough below changes the SCAN IP (staying in the same subnet) for the SCAN name xff-scan from 192.168.137.245 to 192.168.137.248.
Check the current SCAN IP

[root-www.xifenfei.com@xff1 ~]# ping xff-scan
PING xff-scan (192.168.137.245) 56(84) bytes of data.
^C
--- xff-scan ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1738ms
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xff1
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        ONLINE  ONLINE       xff1
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
[root-www.xifenfei.com@xff1 ~]# srvctl config scan
SCAN name: xff-scan, Network: 1/192.168.137.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xff-scan/192.168.137.245

Modify the SCAN IP

[root-www.xifenfei.com@xff1 ~]# srvctl stop scan_listener
[root-www.xifenfei.com@xff1 ~]# srvctl stop scan
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        OFFLINE OFFLINE
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
--If DNS is used, update the SCAN entry in DNS; if hosts files are used, update them on every node
[root-www.xifenfei.com@xff1 ~]# ping xff-scan
PING xff-scan (192.168.137.248) 56(84) bytes of data.
^C
--- xff-scan ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1738ms
[root-www.xifenfei.com@xff1 ~]# srvctl modify scan -n xff-scan
[root-www.xifenfei.com@xff1 ~]# srvctl config scan
SCAN name: xff-scan, Network: 1/192.168.137.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /xff-scan/192.168.137.248
[root-www.xifenfei.com@xff1 ~]# srvctl start scan
[root-www.xifenfei.com@xff1 ~]# srvctl start scan_listener
[root-www.xifenfei.com@xff1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.LISTENER.lsnr
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.asm
               ONLINE  ONLINE       xff1                     Started
               ONLINE  ONLINE       xff2                     Started
ora.gsd
               OFFLINE OFFLINE      xff1
               OFFLINE OFFLINE      xff2
ora.net1.network
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
ora.ons
               ONLINE  ONLINE       xff1
               ONLINE  ONLINE       xff2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xff2
ora.cvu
      1        ONLINE  ONLINE       xff1
ora.oc4j
      1        ONLINE  ONLINE       xff1
ora.scan1.vip
      1        ONLINE  ONLINE       xff2
ora.xff1.vip
      1        ONLINE  ONLINE       xff1
ora.xff2.vip
      1        ONLINE  ONLINE       xff2
ora.xffdb.db
      1        ONLINE  ONLINE       xff1                     Open
      2        ONLINE  ONLINE       xff2                     Open
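If SCAN resolution comes from hosts files (as in this test), the entry must be updated on every node before srvctl modify scan is run; a hypothetical entry:

# /etc/hosts on all nodes -- new SCAN address
192.168.137.248   xff-scan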

Check the SCAN listener status after the change

xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:02:32
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 3 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
The listener supports no services
The command completed successfully
[root-www.xifenfei.com@xff2 ~]# su - oracle
xff2:/home/oracle> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Mar 12 17:01:11 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> alter system register;
System altered.
SQL> /
System altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
xff2:/home/oracle> exit
logout
[root-www.xifenfei.com@xff2 ~]# su - grid
xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:01:24
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 2 min. 18 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
The listener supports no services
The command completed successfully

At this point, even though the SCAN IP was changed successfully, the SCAN listener had not received any dynamic service registrations, and manually running alter system register; did not help.
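A check worth making at this stage (a sketch; in 11.2, instances register with the SCAN listener via the remote_listener parameter, which should point at the SCAN):

SQL> show parameter remote_listener    -- expect something like xff-scan:1521
SQL> show parameter local_listener
SQL> alter system register;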

Restarting the database resolved the dynamic registration problem after the SCAN IP change:

[root-www.xifenfei.com@xff2 ~]# su - oracle
xff2:/home/oracle> srvctl stop database -d xffdb
xff2:/home/oracle> srvctl start database -d xffdb
xff2:/home/oracle> exit
logout
[root-www.xifenfei.com@xff2 ~]# su - grid
xff2:/home/grid> lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 12-MAR-2016 17:06:17
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                12-MAR-2016 16:59:05
Uptime                    0 days 0 hr. 7 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/app/grid/network/admin/listener.ora
Listener Log File         /u01/oracle/app/grid/log/diag/tnslsnr/xff2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.137.248)(PORT=1521)))
Services Summary...
Service "xffdb" has 2 instance(s).
  Instance "xffdb1", status READY, has 1 handler(s) for this service...
  Instance "xffdb2", status READY, has 1 handler(s) for this service...
Service "xffdbXDB" has 2 instance(s).
  Instance "xffdb1", status READY, has 1 handler(s) for this service...
  Instance "xffdb2", status READY, has 1 handler(s) for this service...
The command completed successfully

ora.crf Resource Failures: Temporarily Stopping and Disabling It


Checking an 11.2.0.3 RAC running on Windows 2008 revealed the CRS alert log filling with large numbers of CRS-2765 errors like these:

2015-09-04 00:12:10.431
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:16:46.047
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:21:21.479
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.
2015-09-04 00:25:57.365
[ohasd(3844)]CRS-2765: Resource 'ora.crf' has failed on server 'rac2'.

crfmond.log contained entries like these:

2015-09-04 00:07:35.607: [    GPNP][19080] clsgpnp_getCachedProfileEx: [at clsgpnp.c:613] Result: (26)
 CLSGPNP_NO_PROFILE. Can't get offline GPnP service profile: local gpnpd is up and running. Use getProfile instead.
2015-09-04 00:07:35.607: [    GPNP][19080] clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result:
(26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2015-09-04 00:07:35.732: [ CRFMOND][19080]Sysmond coming up...
2015-09-04 00:07:35.732: [ CRFMOND][19080]Failed to load init file ret=1
2015-09-04 00:07:35.732: [ CRFMOND][19080]OSD error: op="scrfosm_loadInitFile" loc="read fail1"
other="crfhome="D:\app\11.2.0\grid" and gipath="D:\app\11.2.0\grid\crf\admin\crf.ora"" dep="2"
2015-09-04 00:07:37.095: [ COMMCRS][19696]clsc_send_msg: (00000000058C98E0) NS err (12571, 12560), transport (533, 57, 0)
[  clsdmc][19676]Fail to connect (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61022)) with status 9
[  clsdmt][19712]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61022))
2015-09-04 00:07:37.201: [  clsdmt][19712]PID for the Process [19672], connkey 5
2015-09-04 00:07:37.201: [  clsdmt][19712]Creating PID [19672] file for home D:\app\11.2.0\grid
host rac2 bin osysmond to D:\app\11.2.0\grid\osysmond\init\
2015-09-04 00:07:37.202: [  clsdmt][19712]Writing PID [19672] to the file [D:\app\11.2.0\grid\osysmond\init\rac2.pid]
2015-09-04 00:07:37.734: [ CRFMOND][19676]mond_init: clsdms init successful
[   CLWAL][19676]clsw_Initialize: OLR initlevel [70000]
2015-09-04 00:12:10.050: [    GPNP][19676] clsgpnp_getCachedProfileEx: [at clsgpnp.c:613] Result: (26)
CLSGPNP_NO_PROFILE. Can't get offline GPnP service profile: local gpnpd is up and running. Use getProfile instead.
2015-09-04 00:12:10.051: [    GPNP][19676] clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result:
(26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2015-09-04 00:12:10.197: [ CRFMOND][19676]Sysmond coming up...
2015-09-04 00:12:10.197: [ CRFMOND][19676]Failed to load init file ret=1
2015-09-04 00:12:10.197: [ CRFMOND][19676]OSD error: op="scrfosm_loadInitFile" loc="read fail1"
other="crfhome="D:\app\11.2.0\grid" and gipath="D:\app\11.2.0\grid\crf\admin\crf.ora"" dep="2"
2015-09-04 00:12:11.557: [ COMMCRS][18376]clsc_send_msg: (00000000059498E0) NS err (12571, 12560), transport (533, 57, 0)

A MOS search matched Windows: CRS-2765: Resource 'ora.crf' has failed on server (Doc ID 1480263.1). Per that note, the failures are caused by unpublished bug 14010695, and the recommendation is to apply the latest PSU, which requires a maintenance window. As a stopgap I decided to disable the ora.crf resource; before doing so, let us review what the resource does and confirm it is safe to disable.

Purpose of ora.crf
The resource implements CHM. Cluster Health Monitor (CHM) is an Oracle-supplied tool that automatically collects OS resource usage (CPU, memory, swap, processes, I/O, network, and so on), sampling once per second. This data is very helpful for diagnosing node reboots, hangs, instance evictions, and performance problems, and it can surface high load or memory anomalies early, before they escalate into something worse. CHM is installed automatically with:
Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)
Since ora.crf only collects diagnostic data (and only exists from 11.2.0.2 onward), it can be stopped and disabled.

Stop the ora.crf resource

C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     Started
ora.crf
      1        ONLINE  ONLINE       rac2
ora.crsd
      1        ONLINE  ONLINE       rac2
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  ONLINE       rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2
C:\Users\Administrator>crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     Started
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        ONLINE  ONLINE       rac2
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  ONLINE       rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2

Disable the ora.crf resource

C:\Users\Administrator>crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE
C:\Users\Administrator>crsctl modify resource "ora.crf" -attr "AUTO_START=0" -init
C:\Users\Administrator>crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=OFFLINE
STATE=OFFLINE
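To revert later (for example after applying a PSU containing the fix), flip the attribute back and start the resource, mirroring the commands above:

crsctl modify resource "ora.crf" -attr "AUTO_START=1" -init
crsctl start res ora.crf -init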

root.sh Fails in a Multi-CPU Environment: ASM Reports ORA-04031


A reader reported that an 11.2.0.3 RAC install on Linux 6.5 was failing: root.sh failed on the very first node, and he asked for help.
[screenshot: root.sh fails while configuring ASM]


Following the clue in that output, check the asmca log:

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread...
[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true
[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

This shows the ASM instance could not be connected to (ORA-01012 and ORA-03113). With those errors in hand, examine the ASM alert log:

Reconfiguration complete
Fri Jul 24 12:49:29 2015
LCK0 started with pid=22, OS id=46913
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
LCK0 (ospid: 46913): terminating the instance due to error 4031
Fri Jul 24 12:49:35 2015
ORA-1092 : opitsk aborting process
Instance terminated by LCK0, pid = 46913

Digging further into the ASM alert log reveals the familiar ASM ORA-4031: when root.sh starts ASM with the default parameter file, the shared pool is too small (Oracle best practice recommends memory_target of 1536M or more for ASM). This matches Bug 14292825 "ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low", which the official description says is fixed in 11.2.0.4.


The ASM alert log also records the default parameter settings that were in effect:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	RAC01
Release:	2.6.32-358.el6.x86_64
Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:	x86_64
Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
System parameters with non-default values:
  large_pool_size          = 16M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.10.10.31
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jul 24 12:49:27 2015

Check the CPU count via /proc/cpuinfo:

processor	: 191
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
stepping	: 7
cpu MHz		: 1200.000
cache size	: 24576 KB
physical id	: 7
siblings	: 24
core id		: 13
cpu cores	: 12
apicid		: 251
initial apicid	: 251
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
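Processor numbering starts at 0, so the entry above (processor 191) means 192 logical CPUs; counting them directly:

grep -c ^processor /proc/cpuinfo    # prints 192 on this machine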

According to How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1), the shared pool is split into at most 7 subpools; with 192 logical CPUs here, the instance uses the full 7. One way to confirm the subpool count on a live instance is sketched below.
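
A diagnostic-only sketch: the subpool count can be read from the hidden parameter _kghdsidx_count (requires SYSDBA; query x$ tables for diagnosis only):

SQL> select a.ksppinm name, b.ksppstvl value
       from x$ksppi a, x$ksppsv b
      where a.indx = b.indx
        and a.ksppinm = '_kghdsidx_count';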

Each subpool needs at least 512MB, so the shared pool needs at least 3.5GB in total, while the default here is only a few hundred MB, nowhere near enough.

The large CPU count drives up the number of shared pool subpools and therefore the total shared pool requirement. The cause of this failure can now be summarized: with this many CPUs a much larger shared pool is needed, but 11.2.0.3 allocates very little memory to ASM by default, so the shared pool is exhausted while ASM starts (a small default combined with a large demand makes the ORA-04031 unsurprising); and because ASM cannot start while root.sh is running, root.sh fails.
Workaround: temporarily disable some of the CPUs, rerun root.sh, fix the ASM memory parameters, then re-enable the CPUs (a sketch of the CPU toggling is shown below).
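
A hedged sketch of that CPU toggle on Linux (the CPU range is illustrative; cpu0 usually cannot be taken offline):

# take most logical CPUs offline before rerunning root.sh
for i in $(seq 32 191); do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
# ... rerun root.sh and raise the ASM memory parameters ...
# then bring the CPUs back online
for i in $(seq 32 191); do echo 1 > /sys/devices/system/cpu/cpu$i/online; done
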
Note: colleagues on the ACS team had run into this before, which is why I could react quickly this time; thanks for their help. Readers with MOS access can also look at SRs 3-10479952701 and 3-7976215751.

Resolving a Listener stuck in "Not All Endpoints Registered" state

The customer reported that the system could not be accessed; a check showed the listener was abnormal:

C:\Users\Administrator>crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  INTERMEDIATE rac2                     Not All Endpoints R
                                                             egistered
ora.asm
               ONLINE  ONLINE       rac2                     Started
ora.gsd
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac2
ora.registry.acfs
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  INTERMEDIATE rac2                     Not All Endpoints R
                                                             egistered
ora.cvu
      1        ONLINE  ONLINE       rac2
ora.oc4j
      1        ONLINE  ONLINE       rac2
ora.rac.db
      1        ONLINE  ONLINE       rac2                     Open
      2        ONLINE  OFFLINE
ora.rac1.vip
      1        ONLINE  OFFLINE
ora.rac2.vip
      1        ONLINE  OFFLINE
ora.scan1.vip
      1        ONLINE  OFFLINE
C:\Users\Administrator>lsnrctl status
LSNRCTL for 64-bit Windows: Version 11.2.0.3.0 - Production on 12-6月 -2015 15:50:43
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
正在连接到 (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER 的 STATUS
------------------------
别名                      LISTENER
版本                      TNSLSNR for 64-bit Windows: Version 11.2.0.3.0 - Production
启动日期                  12-6月 -2015 15:31:30
正常运行时间              0 天 0 小时 19 分 20 秒
跟踪级别                  off
安全性                    ON: Local OS Authentication
SNMP                      OFF
监听程序参数文件          D:\app\11.2.0\grid\network\admin\listener.ora
监听程序日志文件          D:\app\11.2.0\grid\log\diag\tnslsnr\rac2\listener\alert\log.xml
监听端点概要...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\LISTENERipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.70)(PORT=1521)))
监听程序不支持服务
命令执行成功

LISTENER and LISTENER_SCAN1 are in "Not All Endpoints Registered" state, and this RAC has only node rac2 in the cluster; rac1 has not joined. Next, check the IPs and the hosts file:

C:\Users\Administrator>ipconfig -all
Windows IP 配置
   主机名  . . . . . . . . . . . . . : rac2
   主 DNS 后缀 . . . . . . . . . . . :
   节点类型  . . . . . . . . . . . . : 混合
   IP 路由已启用 . . . . . . . . . . : 否
   WINS 代理已启用 . . . . . . . . . : 否
以太网适配器 pub:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection #2
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0F-47
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::c5ef:663f:7333:45f2%12(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.70(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   默认网关. . . . . . . . . . . . . : 10.63.64.126
   DHCPv6 IAID . . . . . . . . . . . : 301999504
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-A1-00-25-90-5A-0F-46
   DNS 服务器  . . . . . . . . . . . : 218.30.19.40
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
以太网适配器 priv:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0F-46
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::c88d:78ff:d2e8:bde1%11(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.10.1.2(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.0
   默认网关. . . . . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 234890640
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-A1-00-25-90-5A-0F-46
   DNS 服务器  . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
--hosts文件
10.63.64.69		rac1
10.63.64.70		rac2
10.63.64.71		rac1-vip
10.63.64.72		rac2-vip
10.63.64.73		scan-cluster
10.10.1.1		rac1-priv
10.10.1.2		rac2-priv

The pub NIC on this host carries only one IP, 10.63.64.70, which does not match what we expect of a RAC node (normally the VIP, and in some cases even the SCAN IP, should be bound there as well). Try pinging the VIP and the SCAN IP:

C:\Users\Administrator>ping 10.63.64.72
正在 Ping 10.63.64.72 具有 32 字节的数据:
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
10.63.64.72 的 Ping 统计信息:
    数据包: 已发送 = 3,已接收 = 3,丢失 = 0 (0% 丢失),
往返行程的估计时间(以毫秒为单位):
    最短 = 0ms,最长 = 0ms,平均 = 0ms
Control-C
^C
C:\Users\Administrator>ping 10.63.64.73
正在 Ping 10.63.64.73 具有 32 字节的数据:
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
10.63.64.73 的 Ping 统计信息:
    数据包: 已发送 = 3,已接收 = 3,丢失 = 0 (0% 丢失),
往返行程的估计时间(以毫秒为单位):
    最短 = 0ms,最长 = 0ms,平均 = 0ms

This is anomalous: crsctl shows only rac2 in the cluster, and the VIP and SCAN IP are missing from this host, yet both IPs answer pings. My first reaction was that the VIP and SCAN IP had drifted to rac1 while rac1 had not properly joined CRS (I had worked on this system before: rac1's HBA card is faulty, the database cannot start, and even with CRS up the node cannot provide service). A quick way to corroborate this from rac2 is sketched below; then check rac1 itself:
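
A hedged aside (standard Windows commands, not a capture from this system): the MAC that ARP returns for the drifted IP can be compared with each node's physical address from ipconfig to identify which machine really holds it.

C:\Users\Administrator>ping -n 1 10.63.64.72
C:\Users\Administrator>arp -a 10.63.64.72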

C:\Users\Administrator>crsctl status res -t
CRS-4535: 无法与集群就绪服务通信
CRS-4000: 命令 Status 失败, 或已完成但出现错误。
C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
以太网适配器 pub:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0E-E7
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::409d:8c2e:446b:af42%11(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.69(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.71(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.72(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.73(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   默认网关. . . . . . . . . . . . . : 10.63.64.126
   DHCPv6 IAID . . . . . . . . . . . : 234890640
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-0A-00-25-90-5A-0E-E7
   DNS 服务器  . . . . . . . . . . . : 8.8.8.8
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
以太网适配器 priv:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection #2
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0E-E6
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::154:dad7:f9e3:bea3%13(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.10.1.1(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.0
   默认网关. . . . . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 301999504
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-0A-00-25-90-5A-0E-E7
   DNS 服务器  . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用

As suspected, rac2's VIP and the SCAN IP had drifted to rac1, whose CRS stack was in an abnormal state. Since rac1 could not work properly anyway, I shut that host down and restarted CRS on rac2 (rac2 could not recover on its own from this state; a sketch of the commands follows); rac2 then returned to normal:
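
A minimal sketch of that recovery sequence (standard 11.2 crsctl commands run as Administrator on each node, not a capture from this system):

REM on rac1: force-stop the whole stack, then power the host off
crsctl stop crs -f
REM on rac2: bounce CRS so it can re-acquire the freed VIP and SCAN IP
crsctl stop crs
crsctl start crs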

C:\Users\Administrator>crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac2
ora.asm
               ONLINE  ONLINE       rac2                     Started
ora.gsd
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac2
ora.registry.acfs
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2
ora.cvu
      1        ONLINE  ONLINE       rac2
ora.oc4j
      1        ONLINE  ONLINE       rac2
ora.rac.db
      1        OFFLINE OFFLINE                               Instance Shutdown
      2        ONLINE  ONLINE       rac2                     Open
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER
ora.rac2.vip
      1        ONLINE  ONLINE       rac2
ora.scan1.vip
      1        ONLINE  ONLINE       rac2
C:\Users\Administrator>lsnrctl status
LSNRCTL for 64-bit Windows: Version 11.2.0.3.0 - Production on 12-6月 -2015 17:02:46
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
正在连接到 (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER 的 STATUS
------------------------
别名                      LISTENER
版本                      TNSLSNR for 64-bit Windows: Version 11.2.0.3.0 - Production
启动日期                  12-6月 -2015 16:44:43
正常运行时间              0 天 0 小时 18 分 3 秒
跟踪级别                  off
安全性                    ON: Local OS Authentication
SNMP                      OFF
监听程序参数文件          D:\app\11.2.0\grid\network\admin\listener.ora
监听程序日志文件          D:\app\11.2.0\grid\log\diag\tnslsnr\rac2\listener\alert\log.xml
监听端点概要...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\LISTENERipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.70)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.72)(PORT=1521)))
服务摘要..
服务 "+ASM" 包含 1 个实例。
  实例 "+asm2", 状态 READY, 包含此服务的 1 个处理程序...
服务 "rac" 包含 1 个实例。
  实例 "rac2", 状态 READY, 包含此服务的 1 个处理程序...
服务 "racXDB" 包含 1 个实例。
  实例 "rac2", 状态 READY, 包含此服务的 1 个处理程序...
命令执行成功

The root cause can now be summarized: with both nodes' clusterware in an abnormal state, rac1 held the VIP and SCAN IP without having properly joined CRS, so rac2 could not acquire them, which left LISTENER and LISTENER_SCAN1 in "Not All Endpoints Registered" state. For a cluster node that cannot work properly, shut down its CRS stack, and consider powering off the host, to limit its impact on the healthy node. Scan Listener In INTERMEDIATE Mode Not All Endpoints Registered (Doc ID 1667873.1) analyzes this class of problem and confirms that an IP address already in use is the cause.

init.cssd startcheck: CRS cannot start because HP Service Guard is not running

Arriving at the customer site in the morning, I was told that CRS on one environment would not start after its OCR and voting disks were replaced, and was asked to take a look. Environment: HP-UX RAC (only one node in use) + Oracle 10.2.0.5.
crsctl start crs reports success, but nothing actually starts:

# /app/oracle/product/10.2.0/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
# ps -ef|grep crs
    root  6461     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 29719 23678  0 10:04:51 pts/tc    0:00 grep crs

And no new logs are being written:

[xifenfei01][orawj][/root/xifenfei]#ls -ltr
total 148
drwxr-x---   2 oracle     dba             96 May 15  2014 admin
drwxr-x---   2 root       dba             96 May 15  2014 crsd
drwxr-x---   2 oracle     dba             96 May 15  2014 evmd
drwxrwxr-t   5 oracle     dba           1024 Jun  4  2014 racg
drwxr-x---   5 oracle     dba           1024 May 17 22:50 cssd
-rw-rw-r--   1 root       dba          61568 May 24 15:26 alertxifenfei01.log
drwxr-x---   2 oracle     dba           3072 May 24 15:43 client
[xifenfei01][orawj][/root/xifenfei]#date
Mon, May 25, 2015 11:30:09 AM

Voting disk and OCR information:

[xifenfei01][orawj][/root/xifenfei]#ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1441492
         Used space (kbytes)      :       5972
         Available space (kbytes) :    1435520
         ID                       : 1714667730
         Device/File Name         : /dev/vgc01/rCMPR_VGC01_OCR1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/vgc02/rCMPR_VGC02_OCR2
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
[xifenfei01][orawj][/root/xifenfei]#crsctl query css votedisk
 0.     0    /dev/vgc01/rCMPR_VGC01_VOTE1
 1.     0    /dev/vgc02/rCMPR_VGC02_VOTE2
 2.     0    /dev/vgc03/rCMPR_VGC03_VOTE3
located 3 votedisk(s).

The ocr.loc file:

# more /var/opt/oracle/ocr.loc
#Device/file /dev/vgc02/rCMPR_VGC02_OCR2 getting replaced by device /dev/vgc02/rCMPR_VGC02_OCR2
ocrconfig_loc=/dev/vgc01/rCMPR_VGC01_OCR1
ocrmirrorconfig_loc=/dev/vgc02/rCMPR_VGC02_OCR2
local_only=false

So the voting disks, OCR, and related configuration all look normal.

The init.cssd startcheck processes are visible:

[xifenfei01][orawj][/root/xifenfei]#ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 26820 26792  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 26791     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 27183 23698  0 10:50:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 26792     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root 23698     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 26816 26791  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
  oracle 20534 11033  0 11:30:35 pts/ta    0:00 grep init

Processes stuck at init.cssd startcheck usually mean either the storage is inaccessible or a third-party clusterware layer cannot be reached.

Check the VG status:

VG Name                     /dev/vgc01
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      9
Open LV                     9
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    736
Free PE                     2463
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000
VG Name                     /dev/vgc02
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      9
Open LV                     9
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    736
Free PE                     2463
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000
VG Name                     /dev/vgc03
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      6
Open LV                     6
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    448
Free PE                     2751
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000

All three VGs holding the voting disks and OCR are available.

Check the votedisk and OCR device permissions:

# ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crw-r-----   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crw-r-----   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crw-r-----   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crw-r-----   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crw-r-----   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3

Change the permissions straight to 777 and retry:

# chmod 777 /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
#  ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crwxrwxrwx   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crwxrwxrwx   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crwxrwxrwx   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crwxrwxrwx   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crwxrwxrwx   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3

Kill the related processes and retry:

# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root  6458     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 20975     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 20976     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root 21006 20976  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 20997 20975  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 21152 23678  0 10:40:18 pts/tc    0:00 grep init
vi /etc/inittab
#h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
#h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
#h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
# /sbin/init q
# ps -ef|grep init.c | grep -v grep | awk '{print $2}' |xargs kill -9
# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 21744 23678  1 10:42:31 pts/tc    0:00 grep init

Re-enable the init entries:

vi /etc/inittab
h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
~
# /sbin/init q
# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 23737 23706  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23731 23698  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23706     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 23698     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 23887 23678  1 10:45:28 pts/tc    0:00 grep init
    root 23746 23700  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23700     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal

This proves the problem persists after changing the LV permissions, so it is not caused by ownership or permissions on the voting disks and OCR; reading the devices with dd and strings also succeeds (see the sketch below).
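
A minimal sketch of that read test (device names as listed earlier; read-only, so safe to repeat):

# dd if=/dev/vgc01/rCMPR_VGC01_VOTE1 of=/dev/null bs=1024k count=10
# strings /dev/vgc01/rCMPR_VGC01_OCR1 | head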

Trace the /sbin/init.d/init.cssd startcheck script:

[xifenfei01][orawj][/root/xifenfei]#sh -x  /sbin/init.d/init.cssd startcheck
+ ORA_CRS_HOME=/app/oracle/product/10.2.0/crs
+ ORACLE_USER=oracle
+ ORACLE_HOME=/app/oracle/product/10.2.0/crs
+ export ORACLE_HOME
+ export ORA_CRS_HOME
+ export ORACLE_USER
+ DISABLE_OPROCD=false
+ OPROCD_DEFAULT_TIMEOUT=1000
+ OPROCD_DEFAULT_MARGIN=500
+ OPROCD_CHECK_TIMEOUT=2000
+ OPROCD_STOP_TIMEOUT=2000
+ OPROCD_DEFAULT_HISTORGRAM=
+ HOSTN=/bin/hostname
+ EXPRN=/usr/bin/expr
+ CUT=/usr/bin/cut
+ AWK=/bin/awk
+ ECHO=echo
+ TR=/bin/tr
+ /bin/uname
+ [ SunOS = HP-UX ]
+ /bin/uname
+ [ Linux = HP-UX ]
+ + /bin/hostname
HOST=xifenfei01
+ + /usr/bin/expr xifenfei01 : .*
len1=8
+ + /usr/bin/expr match xifenfei01 [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*
len2=0
+ [ 8 != 0 ]
+ + echo xifenfei01
+ /usr/bin/cut -d. -f1
HOST=xifenfei01
+ + echo xifenfei01
+ /bin/tr [:upper:] [:lower:]
HOST=xifenfei01
+ PS=/bin/ps
+ PSE=/bin/ps -e
+ PSEF=/bin/ps -ef
+ HEAD=/bin/head
+ GREP=/bin/grep
+ KILL=/bin/kill
+ KILLTERM=/bin/kill -TERM
+ KILLDIE=/bin/kill -9
+ KILLCHECK=/bin/kill -0 5852
+ SLEEP=/bin/sleep
+ NULL=/dev/null
+ UNAME=/bin/uname
+ CAT=/bin/cat
………………
+ eval /bin/true
+ /bin/true
+ [ 0 != 0 ]
+ eval /bin/ps -ef | /bin/grep '/usr/lbin/cm[g]msd' 1>/dev/null 2>/dev/null
+ /bin/grep /usr/lbin/cm[g]msd
+ /bin/ps -ef
+ 1> /dev/null 2> /dev/null
+ RC=1
+ [ 1 -ne 0 ]
+ /bin/logger -puser.err Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
+ /bin/sleep 60

Running the shell script with sh -x shows that CRS is looping, waiting for HP-UX Service Guard to start, which confirms the cause: HP-UX Service Guard is not running.

Check whether HP-UX Service Guard is up:

[xifenfei01][orawj][/root/xifenfei]#cmviewcl
CLUSTER           STATUS
crmdb_b_cluster   down
  NODE           STATUS       STATE
  xifenfei01       down         unknown
  crmdbb02       down         unknown
UNOWNED_PACKAGES
    PACKAGE        STATUS           STATE            AUTO_RUN    NODE
    pkg1           down             halted           enabled     unowned
    pkg2           down             halted           enabled     unowned

Combined with the customer's description (only one node started, the other node's VGs not activated), the conclusion is clear: with only one node in use, the VGs were activated directly without starting Service Guard, and CRS cannot start until Service Guard is running (a sketch of the fix follows).
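
A hedged sketch of the remedy, assuming standard Serviceguard commands and the node name shown above (exact options depend on the cluster configuration); start the cluster on this node only, then verify:

# cmruncl -n xifenfei01
# cmviewcl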

Uncommented invalid content in hosts causes INS-41112 at the RAC OUI Network Interface Usage step

While installing RAC on AIX, the OUI reported INS-41112 at the Network Interface Usage step, and the installation could not continue.

Summary of the error:

Cause - Installer has detected that network interface en6 does not maintain connectivity on all cluster nodes.
Action - Ensure that the chosen interface has been configured across all cluster nodes.  Additional Information:
Summary of the failed nodes xifenfei01  
- PRVF-4190 : Verification of the hosts config file failed

hosts entries:

10.70.89.68     xifenfei01
10.70.89.69     xifenfei01-vip
10.70.89.100    xifenfei01-priv
10.70.89.71     xifenfei02
10.70.89.72     xifenfei02-vip
10.70.89.101    xifenfei02-priv
10.70.89.79     xifenfei-scan

NIC configuration:

xifenfei01:/u01/soft/grid> ifconfig -a
en7: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.100 netmask 0xffffffe0 broadcast 10.70.89.127
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en6: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.68 netmask 0xffffffe0 broadcast 10.70.89.95
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
xifenfei02/#ifconfig -a
en6: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.71 netmask 0xffffffe0 broadcast 10.70.89.95
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en7: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.101 netmask 0xffffffe0 broadcast 10.70.89.127
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
xifenfei01/asmdisks#netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en7 1500 link#2 0.11.25.bd.b8.7a 107 0 132 2 0
en7 1500 10.70.89.96 10.70.89.100 107 0 132 2 0
en6 1500 link#3 0.11.25.bd.a8.93 50015 0 36963 2 0
en6 1500 10.70.89.64 10.70.89.68 50015 0 36963 2 0
lo0 16896 link#1 1589 0 1588 0 0
lo0 16896 127 127.0.0.1 1589 0 1588 0 0
lo0 16896 ::1%1 1589 0 1588 0 0
xifenfei02/asmdisks#netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en6 1500 link#2 0.11.25.bd.a8.a9 5401 0 3660 2 0
en6 1500 10.70.89.64 10.70.89.71 5401 0 3660 2 0
en7 1500 link#3 0.11.25.bd.51.d2 129 0 123 2 0
en7 1500 10.70.89.96 10.70.89.101 129 0 123 2 0
lo0 16896 link#1 1249 0 1249 0 0
lo0 16896 127 127.0.0.1 1249 0 1249 0 0
lo0 16896 ::1%1 1249 0 1249 0 0

The NICs and IPs match on both hosts, although the two hosts list their interfaces in a different order. The EtherChannel configuration also checks out.

Network ping test:

xifenfei01:/u01/soft/grid> ping xifenfei01-priv
PING xifenfei01-priv: (10.70.89.100): 56 data bytes
64 bytes from 10.70.89.100: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.70.89.100: icmp_seq=1 ttl=255 time=0 ms
xifenfei02/#ping xifenfei01-priv
PING xifenfei01-priv: (10.70.89.100): 56 data bytes
64 bytes from 10.70.89.100: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.70.89.100: icmp_seq=1 ttl=255 time=0 ms

Check the network configuration with runcluvfy.sh:

./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
xifenfei01:/u01/soft/grid> ./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
  Node Name     Status                    Comment
  ------------  ------------------------  ------------------------
  xifenfei02-priv  passed                    successful
  xifenfei01-priv  failed                    Invalid Entry
ERROR:
PRVF-4190 : Verification of the hosts config file failed
Interface information for node "xifenfei02-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en6    10.70.89.71     10.70.89.64     10.70.89.71     10.70.89.65     00:11:25:BD:A8:A9 1500
 en7    10.70.89.101    10.70.89.96     10.70.89.101    10.70.89.65     00:11:25:BD:51:D2 1500
Interface information for node "xifenfei01-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en7    10.70.89.100    10.70.89.96     10.70.89.100    10.70.89.65     00:11:25:BD:B8:7A 1500
 en6    10.70.89.68     10.70.89.64     10.70.89.68     10.70.89.65     00:11:25:BD:A8:93 1500
Check: Node connectivity for interface "en7"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei02-priv[10.70.89.101]     xifenfei01-priv[10.70.89.100]     yes
Result: Node connectivity passed for interface "en7"
Check: TCP connectivity of subnet "10.70.89.96"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei01:10.70.89.68            xifenfei02-priv:10.70.89.101      passed
  xifenfei01:10.70.89.68            xifenfei01-priv:10.70.89.100      passed
Result: TCP connectivity check passed for subnet "10.70.89.96"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.70.89.64".
Subnet mask consistency check passed for subnet "10.70.89.96".
Subnet mask consistency check passed.
Result: Node connectivity check failed
Verification of node connectivity was unsuccessful.
Checks did not pass for the following node(s):
        xifenfei01-priv

The check fails on xifenfei01-priv with PRVF-4190; inspecting the hosts file on xifenfei01 reveals one bad record:

xifenfei01/#vi /etc/hosts
"/etc/hosts" 113 lines, 3556 characters
# @(#)47        1.1  src/bos/usr/sbin/netstart/hosts, cmdnet, bos530 7/24/91 10:
00:46
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos530 src/bos/usr/sbin/netstart/hosts 1.1
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1985,1989
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

The "00:46" fragment sits on a line of its own, the tail of a wrapped comment, and is therefore an invalid entry. Remove that line (a quick check for similar fragments is sketched below) and rerun runcluvfy.sh:
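
A hedged helper for spotting such fragments: print every non-comment line that has only a single field, since a valid hosts entry needs an address plus at least one name (pattern is illustrative):

# awk '!/^#/ && NF == 1 {print NR": "$0}' /etc/hosts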

xifenfei01:/u01/soft/grid> ./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
  Node Name                             Status
  ------------------------------------  ------------------------
  xifenfei02-priv                         passed
  xifenfei01-priv                         passed
Verification of the hosts config file successful
Interface information for node "xifenfei02-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en6    10.70.89.71     10.70.89.64     10.70.89.71     10.70.89.65     00:11:25:BD:A8:A9 1500
 en7    10.70.89.101    10.70.89.96     10.70.89.101    10.70.89.65     00:11:25:BD:51:D2 1500
Interface information for node "xifenfei01-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en7    10.70.89.100    10.70.89.96     10.70.89.100    10.70.89.65     00:11:25:BD:B8:7A 1500
 en6    10.70.89.68     10.70.89.64     10.70.89.68     10.70.89.65     00:11:25:BD:A8:93 1500
Check: Node connectivity for interface "en7"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei02-priv[10.70.89.101]     xifenfei01-priv[10.70.89.100]     yes
Result: Node connectivity passed for interface "en7"
Check: TCP connectivity of subnet "10.70.89.96"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei01:10.70.89.68            xifenfei02-priv:10.70.89.101      passed
  xifenfei01:10.70.89.68            xifenfei01-priv:10.70.89.100      passed
Result: TCP connectivity check passed for subnet "10.70.89.96"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.70.89.64".
Subnet mask consistency check passed for subnet "10.70.89.96".
Subnet mask consistency check passed.
Result: Node connectivity check passed
Verification of node connectivity was successful.

With the invalid record removed, the runcluvfy check passes and the OUI installation continues normally.
So an invalid record in /etc/hosts was indeed what failed the RAC prerequisite check; once again, treat the hosts file with care when installing RAC.
References: PRVF-4190 Verification of the Hosts Config File Failed (Doc ID 1056025.1)
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)

Installing TFA separately on 11.2.0.4 GI

When root.sh runs during an 11.2.0.4 RAC installation, root's environment must be able to resolve the unzip command (on platforms other than Linux and Windows, root's default PATH cannot find it). If the PATH is not configured, TFA is skipped without warning; a preventive sketch is shown below, followed by the steps to install TFA afterwards.
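
A hedged sketch of the preventive fix (the unzip location varies by platform; /opt/freeware/bin is an assumption, typical when the AIX rpm packages are installed):

# export PATH=$PATH:/opt/freeware/bin
# which unzip
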
Current state without TFA installed:

--/etc/inittab contains only this single GI-related entry
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null
--only init.ohasd is running
xifenf01/oradata/sys/soft#ps -ef|grep init
    root        1        0   0 13:12:11      -  0:00 /etc/init
    root 30539998        1   0 18:58:17      -  0:00 /bin/sh /etc/init.ohasd run
    root 31391906  5177692   0 19:44:15  pts/1  0:00 grep init

Install with the tfa_setup.sh script:

xifenf01/#export PATH=$PATH:/u01/oracle/app/grid/bin
xifenf01/#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
TFA requires BASH shell. Please install bash and try again.

It complains that the bash shell is missing; download and install the package (the OS is AIX 7.1):

xifenf01/oradata/sys/soft#rpm -ivh bash-4.2-3.aix6.1.ppc.rpm
bash                        ##################################################

Rerun the TFA installation:

xifenf01/oradata/sys/soft#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
Using JAVA_HOME : /u01/oracle/app/grid/jdk/jre
Running Auto Setup for TFA as user root...
The following installation requires temporary use of SSH.
If SSH is not configured already then we will remove SSH
when complete.
Installing TFA now...
TFA Will be Installed on xifenf01...
TFA will scan the following Directories
++++++++++++++++++++++++++++++++++++++++++++
.-----------------------------------------------------.
|                       xifenf01                      |
+------------------------------------------+----------+
| Trace Directory                          | Resource |
+------------------------------------------+----------+
| /u01/oracle/app/grid/OPatch/crs/log      | CRS      |
| /u01/oracle/app/grid/cfgtoollogs         | INSTALL  |
| /u01/oracle/app/grid/crs/log             | CRS      |
| /u01/oracle/app/grid/cv/log              | CRS      |
| /u01/oracle/app/grid/evm/admin/log       | CRS      |
| /u01/oracle/app/grid/evm/admin/logger    | CRS      |
| /u01/oracle/app/grid/evm/log             | CRS      |
| /u01/oracle/app/grid/install             | INSTALL  |
| /u01/oracle/app/grid/log                 | CRS      |
| /u01/oracle/app/grid/log/                | CRS      |
| /u01/oracle/app/grid/network/log         | CRS      |
| /u01/oracle/app/grid/oc4j/j2ee/home/log  | CRSOC4J  |
| /u01/oracle/app/grid/opmn/logs           | CRS      |
| /u01/oracle/app/grid/racg/log            | CRS      |
| /u01/oracle/app/grid/rdbms/log           | ASM      |
| /u01/oracle/app/grid/scheduler/log       | CRS      |
| /u01/oracle/app/grid/srvm/log            | CRS      |
| /u01/oracle/app/oraInventory/ContentsXML | INSTALL  |
| /u01/oracle/app/oraInventory/logs        | INSTALL  |
'------------------------------------------+----------'
Installing TFA on xifenf01
HOST: xifenf01  TFA_HOME: /u01/oracle/app/grid/tfa/xifenf01/tfa_home
.-----------------------------------------------------.
| Host     | Status of TFA | PID     | Port | Version |
+----------+---------------+---------+------+---------+
| xifenf01 | RUNNING       | 7536914 | 5000 | 2.5.1.5 |
'----------+---------------+---------+------+---------'
Summary of TFA Installation:
.------------------------------------------------------------------.
|                             xifenf01                             |
+---------------------+--------------------------------------------+
| Parameter           | Value                                      |
+---------------------+--------------------------------------------+
| Install location    | /u01/oracle/app/grid/tfa/xifenf01/tfa_home |
| Repository location | /u01/oracle/app/oracle/tfa/repository      |
| Repository usage    | 0 MB out of 10240 MB                       |
'---------------------+--------------------------------------------'
TFA is successfully installed..
Usage : /u01/oracle/app/grid/tfa/bin/tfactl <command> [options]
<command> =
         print        Print requested details
         purge        Delete collections from TFA repository
         directory    Add or Remove or Modify directory in TFA
         host         Add or Remove host in TFA
         set          Turn ON/OFF or Modify various TFA features
         diagcollect  Collect logs from across nodes in cluster
For help with a command: /u01/oracle/app/grid/tfa/bin/tfactl <command> -help

After TFA installs successfully:

--/etc/inittab now carries the TFA entry as well
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null
htfa:2:respawn:/etc/init.tfa run >/dev/null 2>&1 </dev/null
--the init.tfa process is running
xifenf01/oradata/sys/soft#ps -ef|grep init
    root        1        0   0 13:12:11      -  0:00 /etc/init
    root 30277638        1   0 19:26:37      -  0:00 /bin/sh /etc/init.tfa run
    root 30539998        1   0 18:58:17      -  0:00 /bin/sh /etc/init.ohasd run
    root 31391906  5177692   0 19:44:15  pts/1  0:00 grep init

Install TFA on the other node:

xifenf02/#export PATH=$PATH:/u01/oracle/app/grid/bin
xifenf02/oradata/sys/soft#rpm -ivh bash-4.2-3.aix6.1.ppc.rpm
bash                        ##################################################
xifenf02/#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
Using JAVA_HOME : /u01/oracle/app/grid/jdk/jre
Running Auto Setup for TFA as user root...
The following installation requires temporary use of SSH.
If SSH is not configured already then we will remove SSH
when complete.
Installing TFA now...
TFA Will be Installed on xifenf02...
TFA will scan the following Directories
++++++++++++++++++++++++++++++++++++++++++++
.-----------------------------------------------------.
|                       xifenf02                      |
+------------------------------------------+----------+
| Trace Directory                          | Resource |
+------------------------------------------+----------+
| /u01/oracle/app/grid/OPatch/crs/log      | CRS      |
| /u01/oracle/app/grid/cfgtoollogs         | INSTALL  |
| /u01/oracle/app/grid/crs/log             | CRS      |
| /u01/oracle/app/grid/cv/log              | CRS      |
| /u01/oracle/app/grid/evm/admin/log       | CRS      |
| /u01/oracle/app/grid/evm/admin/logger    | CRS      |
| /u01/oracle/app/grid/evm/log             | CRS      |
| /u01/oracle/app/grid/install             | INSTALL  |
| /u01/oracle/app/grid/log                 | CRS      |
| /u01/oracle/app/grid/log/                | CRS      |
| /u01/oracle/app/grid/network/log         | CRS      |
| /u01/oracle/app/grid/oc4j/j2ee/home/log  | CRSOC4J  |
| /u01/oracle/app/grid/opmn/logs           | CRS      |
| /u01/oracle/app/grid/racg/log            | CRS      |
| /u01/oracle/app/grid/rdbms/log           | ASM      |
| /u01/oracle/app/grid/scheduler/log       | CRS      |
| /u01/oracle/app/grid/srvm/log            | CRS      |
| /u01/oracle/app/oraInventory/ContentsXML | INSTALL  |
| /u01/oracle/app/oraInventory/logs        | INSTALL  |
'------------------------------------------+----------'
Installing TFA on xifenf02
HOST: xifenf02  TFA_HOME: /u01/oracle/app/grid/tfa/xifenf02/tfa_home
.-----------------------------------------------------.
| Host     | Status of TFA | PID     | Port | Version |
+----------+---------------+---------+------+---------+
| xifenf02 | RUNNING       | 5898636 | 5000 | 2.5.1.5 |
| xifenf01 | RUNNING       | 7536914 | 5000 | 2.5.1.5 |
'----------+---------------+---------+------+---------'
Summary of TFA Installation:
.------------------------------------------------------------------.
|                             xifenf02                             |
+---------------------+--------------------------------------------+
| Parameter           | Value                                      |
+---------------------+--------------------------------------------+
| Install location    | /u01/oracle/app/grid/tfa/xifenf02/tfa_home |
| Repository location | /u01/oracle/app/oracle/tfa/repository      |
| Repository usage    | 0 MB out of 10240 MB                       |
'---------------------+--------------------------------------------'
TFA is successfully installed..
Usage : /u01/oracle/app/grid/tfa/bin/tfactl <command> [options]
<command> =
         print        Print requested details
         purge        Delete collections from TFA repository
         directory    Add or Remove or Modify directory in TFA
         host         Add or Remove host in TFA
         set          Turn ON/OFF or Modify various TFA features
         diagcollect  Collect logs from across nodes in cluster
For help with a command: /u01/oracle/app/grid/tfa/bin/tfactl <command> -help
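
With both nodes installed, the TFA status can be rechecked at any time using the print subcommand listed in the usage text above (a minimal sketch):

# /u01/oracle/app/grid/tfa/bin/tfactl print status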