root.sh fails on a server with many CPUs: ASM reports ORA-04031


A friend reported that while installing 11.2.0.3 RAC on Linux 6.5 they hit a problem: root.sh failed on the very first node, and they asked for help.
[Screenshot: root.sh failure output on the first node]


Based on the output above, check the asmca log:

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread...
[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true
[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

This shows that the ASM instance could not be logged into (ORA-01012 and ORA-03113). Given these errors, examine the ASM alert log:

Reconfiguration complete
Fri Jul 24 12:49:29 2015
LCK0 started with pid=22, OS id=46913
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
LCK0 (ospid: 46913): terminating the instance due to error 4031
Fri Jul 24 12:49:35 2015
ORA-1092 : opitsk aborting process
Instance terminated by LCK0, pid = 46913

Further analysis of the ASM alert log shows the familiar ASM ORA-4031 problem: when root.sh starts ASM with the default parameter file, the shared pool is too small (per Oracle best practice, memory_target for ASM should be 1536M or higher), which triggers this error. This matches Bug 14292825 "ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low"; according to the official description, the issue is fixed in 11.2.0.4.


The ASM alert log also shows the default parameter values in effect:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	RAC01
Release:	2.6.32-358.el6.x86_64
Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:	x86_64
Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
System parameters with non-default values:
  large_pool_size          = 16M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.10.10.31
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jul 24 12:49:27 2015

Check the CPU count via /proc/cpuinfo:

processor	: 191
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
stepping	: 7
cpu MHz		: 1200.000
cache size	: 24576 KB
physical id	: 7
siblings	: 24
core id		: 13
cpu cores	: 12
apicid		: 251
initial apicid	: 251
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes

According to How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1), there can be at most 7 subpools; with 192 CPUs here, the subpool count works out to 7.
Each subpool needs at least 512MB, so the shared pool needs at least 3.5GB, while the default is only a few hundred MB, far from enough.
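
As a cross-check of the subpool arithmetic, the sketch below (a minimal example run as SYSDBA/SYSASM; _kghdsidx_count is the commonly cited hidden parameter for the subpool count, and querying x$ views requires SYS) reports the number of subpools in use and the current shared pool size:

# How many shared pool subpools are allocated, and how big is the shared pool?
export ORACLE_SID=+ASM1            # or the database SID
sqlplus -S / as sysdba <<'EOF'
SELECT a.ksppinm parameter, b.ksppstvl value
  FROM x$ksppi a, x$ksppcv b
 WHERE a.indx = b.indx
   AND a.ksppinm = '_kghdsidx_count';
SELECT ROUND(SUM(bytes)/1024/1024) shared_pool_mb
  FROM v$sgastat
 WHERE pool = 'shared pool';
EOF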
With more CPUs the shared pool is split into more subpools, which in turn inflates the total shared pool requirement. The cause of this failure can now be summarized:
Because the host has many CPUs, a larger shared pool is needed, while 11.2.0.3 gives ASM a very small default memory allocation. ASM therefore runs out of shared pool during startup (a small default combined with a large demand makes ORA-04031 unsurprising); since ASM cannot start while root.sh is running, root.sh fails.
Workaround: temporarily disable some of the CPUs, rerun root.sh, adjust the ASM memory parameters, and then re-enable the CPUs, as sketched below.
Note: colleagues on the ACS team had run into this before, which is why I could react quickly this time; thanks to them. Readers with the necessary access can also look at SRs 3-10479952701 and 3-7976215751.
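
A minimal sketch of that workaround on Linux (the Grid home path, ASM SID and the choice of keeping the first 16 CPUs online are assumptions based on the logs above; verify each step on the actual host and run as root):

# 1. Temporarily take most logical CPUs offline (cpu0 cannot be offlined and is skipped by the glob)
for c in /sys/devices/system/cpu/cpu[1-9]*; do
    n=${c##*cpu}
    [ "$n" -ge 16 ] && echo 0 > "$c/online" 2>/dev/null
done

# 2. Rerun root.sh on the failed node
/u01/app/11.2.0/grid/root.sh

# 3. Raise the ASM memory parameters so later restarts do not hit ORA-04031 again
export ORACLE_SID=+ASM1
sqlplus -S / as sysasm <<'EOF'
ALTER SYSTEM SET memory_max_target=1536M SCOPE=spfile SID='*';
ALTER SYSTEM SET memory_target=1536M SCOPE=spfile SID='*';
EOF

# 4. Bring all CPUs back online
for c in /sys/devices/system/cpu/cpu[1-9]*; do
    echo 1 > "$c/online" 2>/dev/null
done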

Troubleshooting a listener stuck in the Not All Endpoints Registered state


A customer reported that the system was unreachable; inspection showed the listener was in an abnormal state.

C:\Users\Administrator>crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  INTERMEDIATE rac2                     Not All Endpoints R
                                                             egistered
ora.asm
               ONLINE  ONLINE       rac2                     Started
ora.gsd
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac2
ora.registry.acfs
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  INTERMEDIATE rac2                     Not All Endpoints R
                                                             egistered
ora.cvu
      1        ONLINE  ONLINE       rac2
ora.oc4j
      1        ONLINE  ONLINE       rac2
ora.rac.db
      1        ONLINE  ONLINE       rac2                     Open
      2        ONLINE  OFFLINE
ora.rac1.vip
      1        ONLINE  OFFLINE
ora.rac2.vip
      1        ONLINE  OFFLINE
ora.scan1.vip
      1        ONLINE  OFFLINE
C:\Users\Administrator>lsnrctl status
LSNRCTL for 64-bit Windows: Version 11.2.0.3.0 - Production on 12-6月 -2015 15:50:43
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
正在连接到 (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER 的 STATUS
------------------------
别名                      LISTENER
版本                      TNSLSNR for 64-bit Windows: Version 11.2.0.3.0 - Production
启动日期                  12-6月 -2015 15:31:30
正常运行时间              0 天 0 小时 19 分 20 秒
跟踪级别                  off
安全性                    ON: Local OS Authentication
SNMP                      OFF
监听程序参数文件          D:\app\11.2.0\grid\network\admin\listener.ora
监听程序日志文件          D:\app\11.2.0\grid\log\diag\tnslsnr\rac2\listener\alert\log.xml
监听端点概要...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\LISTENERipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.70)(PORT=1521)))
监听程序不支持服务
命令执行成功

LISTENER and LISTENER_SCAN1 are both in the Not All Endpoints Registered state, and this RAC currently has only node rac2 in the cluster; node rac1 has not joined. Next, check the IP configuration and the hosts file:

C:\Users\Administrator>ipconfig -all
Windows IP 配置
   主机名  . . . . . . . . . . . . . : rac2
   主 DNS 后缀 . . . . . . . . . . . :
   节点类型  . . . . . . . . . . . . : 混合
   IP 路由已启用 . . . . . . . . . . : 否
   WINS 代理已启用 . . . . . . . . . : 否
以太网适配器 pub:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection #2
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0F-47
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::c5ef:663f:7333:45f2%12(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.70(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   默认网关. . . . . . . . . . . . . : 10.63.64.126
   DHCPv6 IAID . . . . . . . . . . . : 301999504
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-A1-00-25-90-5A-0F-46
   DNS 服务器  . . . . . . . . . . . : 218.30.19.40
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
以太网适配器 priv:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0F-46
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::c88d:78ff:d2e8:bde1%11(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.10.1.2(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.0
   默认网关. . . . . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 234890640
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-A1-00-25-90-5A-0F-46
   DNS 服务器  . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
--hosts file
10.63.64.69		rac1
10.63.64.70		rac2
10.63.64.71		rac1-vip
10.63.64.72		rac2-vip
10.63.64.73		scan-cluster
10.10.1.1		rac1-priv
10.10.1.2		rac2-priv

The pub interface on this host carries only one IP, 10.63.64.70, which does not match what we expect of a RAC node (normally the VIP, and in some cases the SCAN IP, should also be bound there). Try pinging the VIP and SCAN IP:

C:\Users\Administrator>ping 10.63.64.72
正在 Ping 10.63.64.72 具有 32 字节的数据:
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.72 的回复: 字节=32 时间<1ms TTL=128
10.63.64.72 的 Ping 统计信息:
    数据包: 已发送 = 3,已接收 = 3,丢失 = 0 (0% 丢失),
往返行程的估计时间(以毫秒为单位):
    最短 = 0ms,最长 = 0ms,平均 = 0ms
Control-C
^C
C:\Users\Administrator>ping 10.63.64.73
正在 Ping 10.63.64.73 具有 32 字节的数据:
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
来自 10.63.64.73 的回复: 字节=32 时间<1ms TTL=128
10.63.64.73 的 Ping 统计信息:
    数据包: 已发送 = 3,已接收 = 3,丢失 = 0 (0% 丢失),
往返行程的估计时间(以毫秒为单位):
    最短 = 0ms,最长 = 0ms,平均 = 0ms

This is odd: CRS shows only rac2 in the cluster, and the host does not hold the VIP or SCAN IP, yet both IPs respond to ping. My first thought was that the VIP and SCAN IP had failed over to rac1 while rac1 itself had not properly joined CRS (I had worked on this system before: rac1 has a faulty HBA, so the database cannot start there and CRS cannot provide service even if it comes up). Check rac1:

C:\Users\Administrator>crsctl status res -t
CRS-4535: 无法与集群就绪服务通信
CRS-4000: 命令 Status 失败, 或已完成但出现错误。
C:\Users\Administrator>crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crf
      1        ONLINE  ONLINE       rac1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
以太网适配器 pub:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0E-E7
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::409d:8c2e:446b:af42%11(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.69(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.71(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.72(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   IPv4 地址 . . . . . . . . . . . . : 10.63.64.73(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.192
   默认网关. . . . . . . . . . . . . : 10.63.64.126
   DHCPv6 IAID . . . . . . . . . . . : 234890640
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-0A-00-25-90-5A-0E-E7
   DNS 服务器  . . . . . . . . . . . : 8.8.8.8
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用
以太网适配器 priv:
   连接特定的 DNS 后缀 . . . . . . . :
   描述. . . . . . . . . . . . . . . : Intel(R) 82576 Gigabit Dual Port Network Connection #2
   物理地址. . . . . . . . . . . . . : 00-25-90-5A-0E-E6
   DHCP 已启用 . . . . . . . . . . . : 否
   自动配置已启用. . . . . . . . . . : 是
   本地链接 IPv6 地址. . . . . . . . : fe80::154:dad7:f9e3:bea3%13(首选)
   IPv4 地址 . . . . . . . . . . . . : 10.10.1.1(首选)
   子网掩码  . . . . . . . . . . . . : 255.255.255.0
   默认网关. . . . . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 301999504
   DHCPv6 客户端 DUID  . . . . . . . : 00-01-00-01-1A-5C-19-0A-00-25-90-5A-0E-E7
   DNS 服务器  . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   TCPIP 上的 NetBIOS  . . . . . . . : 已启用

Sure enough, rac2's VIP and the SCAN IP had floated over to rac1, whose CRS stack is in an abnormal state. Since rac1 is unusable, shut that host down and restart rac2 (which was also not working normally); afterwards rac2 recovered:

C:\Users\Administrator>crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac2
ora.asm
               ONLINE  ONLINE       rac2                     Started
ora.gsd
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac2
ora.registry.acfs
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2
ora.cvu
      1        ONLINE  ONLINE       rac2
ora.oc4j
      1        ONLINE  ONLINE       rac2
ora.rac.db
      1        OFFLINE OFFLINE                               Instance Shutdown
      2        ONLINE  ONLINE       rac2                     Open
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER
ora.rac2.vip
      1        ONLINE  ONLINE       rac2
ora.scan1.vip
      1        ONLINE  ONLINE       rac2
C:\Users\Administrator>lsnrctl status
LSNRCTL for 64-bit Windows: Version 11.2.0.3.0 - Production on 12-6月 -2015 17:02:46
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
正在连接到 (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER 的 STATUS
------------------------
别名                      LISTENER
版本                      TNSLSNR for 64-bit Windows: Version 11.2.0.3.0 - Production
启动日期                  12-6月 -2015 16:44:43
正常运行时间              0 天 0 小时 18 分 3 秒
跟踪级别                  off
安全性                    ON: Local OS Authentication
SNMP                      OFF
监听程序参数文件          D:\app\11.2.0\grid\network\admin\listener.ora
监听程序日志文件          D:\app\11.2.0\grid\log\diag\tnslsnr\rac2\listener\alert\log.xml
监听端点概要...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\LISTENERipc)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.70)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.63.64.72)(PORT=1521)))
服务摘要..
服务 "+ASM" 包含 1 个实例。
  实例 "+asm2", 状态 READY, 包含此服务的 1 个处理程序...
服务 "rac" 包含 1 个实例。
  实例 "rac2", 状态 READY, 包含此服务的 1 个处理程序...
服务 "racXDB" 包含 1 个实例。
  实例 "rac2", 状态 READY, 包含此服务的 1 个处理程序...
命令执行成功

The root cause can now be summarized: with the cluster in an abnormal state on both nodes, rac1 held the VIP and SCAN IP without having properly joined CRS, so rac2 could not acquire them, leaving LISTENER and LISTENER_SCAN1 in the Not All Endpoints Registered state. For a cluster node that cannot work properly, shut down its CRS stack, or even power off the host, to limit its impact on the healthy node. Scan Listener In INTERMEDIATE Mode Not All Endpoints Registered (Doc ID 1667873.1) supports this analysis: the state is caused by the IP being in use elsewhere.
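
A sketch of the corrective sequence (the commands assume an 11.2 Grid Infrastructure installation and are run from the Grid home on each node; node and resource names follow the output above, and shell-style # comments are used for readability even though this cluster runs on Windows):

# On the abnormal node (rac1): stop its stack so it releases the VIP and SCAN addresses
crsctl stop crs -f

# On the healthy node (rac2): bring the address and listener resources back, then verify
srvctl start vip -n rac2
srvctl start scan
srvctl start scan_listener
srvctl start listener -n rac2
crsctl status res -t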

init.cssd startcheck: CRS cannot start because HP Service Guard is not running


At the customer site in the morning, I was told that on one environment CRS would not start after the OCR and voting disks had been replaced, and was asked to take a look. Environment: RAC on HP-UX (only one node in use) + Oracle Database 10.2.0.5.
crsctl start crs reports success, but the stack never actually starts:

# /app/oracle/product/10.2.0/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
# ps -ef|grep crs
    root  6461     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 29719 23678  0 10:04:51 pts/tc    0:00 grep crs

No new logs are being written either:

[xifenfei01][orawj][/root/xifenfei]#ls -ltr
total 148
drwxr-x---   2 oracle     dba             96 May 15  2014 admin
drwxr-x---   2 root       dba             96 May 15  2014 crsd
drwxr-x---   2 oracle     dba             96 May 15  2014 evmd
drwxrwxr-t   5 oracle     dba           1024 Jun  4  2014 racg
drwxr-x---   5 oracle     dba           1024 May 17 22:50 cssd
-rw-rw-r--   1 root       dba          61568 May 24 15:26 alertxifenfei01.log
drwxr-x---   2 oracle     dba           3072 May 24 15:43 client
[xifenfei01][orawj][/root/xifenfei]#date
Mon, May 25, 2015 11:30:09 AM

Voting disk and OCR information:

[xifenfei01][orawj][/root/xifenfei]#ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1441492
         Used space (kbytes)      :       5972
         Available space (kbytes) :    1435520
         ID                       : 1714667730
         Device/File Name         : /dev/vgc01/rCMPR_VGC01_OCR1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/vgc02/rCMPR_VGC02_OCR2
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
[xifenfei01][orawj][/root/xifenfei]#crsctl query css votedisk
 0.     0    /dev/vgc01/rCMPR_VGC01_VOTE1
 1.     0    /dev/vgc02/rCMPR_VGC02_VOTE2
 2.     0    /dev/vgc03/rCMPR_VGC03_VOTE3
located 3 votedisk(s).

The ocr.loc file:

# more /var/opt/oracle/ocr.loc
#Device/file /dev/vgc02/rCMPR_VGC02_OCR2 getting replaced by device /dev/vgc02/rCMPR_VGC02_OCR2
ocrconfig_loc=/dev/vgc01/rCMPR_VGC01_OCR1
ocrmirrorconfig_loc=/dev/vgc02/rCMPR_VGC02_OCR2
local_only=false

So the voting disk and OCR configuration looks normal.

The init.cssd startcheck processes are visible:

[xifenfei01][orawj][/root/xifenfei]#ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 26820 26792  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 26791     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 27183 23698  0 10:50:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 26792     1  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root 23698     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 26816 26791  0 10:49:53 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
  oracle 20534 11033  0 11:30:35 pts/ta    0:00 grep init

Processes stuck at init.cssd startcheck are, in most cases, caused by storage that cannot be accessed or by third-party clusterware that is not available.

Check the VG status:

VG Name                     /dev/vgc01
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      9
Open LV                     9
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    736
Free PE                     2463
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000
VG Name                     /dev/vgc02
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      9
Open LV                     9
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    736
Free PE                     2463
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000
VG Name                     /dev/vgc03
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      6
Open LV                     6
Max PV                      255
Cur PV                      1
Act PV                      1
Max PE per PV               3200
VGDA                        2
PE Size (Mbytes)            32
Total PE                    3199
Alloc PE                    448
Free PE                     2751
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0
VG Version                  1.0
VG Max Size                 25500g
VG Max Extents              816000

All three VGs that hold the voting disks and OCR are available.

Check the permissions on the voting disk and OCR devices:

# ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crw-r-----   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crw-r-----   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crw-r-----   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crw-r-----   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crw-r-----   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3

Change the permissions to 777 as a test:

# chmod 777 /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
#  ls -l /dev/vgc0*/rCMPR*|grep -v .dbf|grep -v .log|grep -v .ctl
crwxrwxrwx   1 oracle     dba         64 0x020008 May 24 14:40 /dev/vgc01/rCMPR_VGC01_OCR1
crwxrwxrwx   1 oracle     dba         64 0x020009 May 24 14:41 /dev/vgc01/rCMPR_VGC01_VOTE1
crwxrwxrwx   1 oracle     dba         64 0x030008 May 24 14:41 /dev/vgc02/rCMPR_VGC02_OCR2
crwxrwxrwx   1 oracle     dba         64 0x030009 May 24 14:41 /dev/vgc02/rCMPR_VGC02_VOTE2
crwxrwxrwx   1 oracle     dba         64 0x040006 May 24 14:41 /dev/vgc03/rCMPR_VGC03_VOTE3

Kill the related processes and retry:

# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root  6458     1  0  May 19  ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 20975     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 20976     1  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root 21006 20976  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 20997 20975  0 10:40:11 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 21152 23678  0 10:40:18 pts/tc    0:00 grep init
vi /etc/inittab
#h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
#h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
#h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
# /sbin/init q
# ps -ef|grep init.c | grep -v grep | awk '{print $2}' |xargs kill -9
# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 21744 23678  1 10:42:31 pts/tc    0:00 grep init

Re-enable the init entries and restart the processes:

vi /etc/inittab
h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1 </dev/null
~
# /sbin/init q
# ps -ef|grep init
    root     1     0  0  May 19  ?         0:03 init
    root   119     0  0  May 19  ?         0:00 pagetable_init_daemon
    root   115     0  0  May 19  ?         0:00 mdep_initiator_thread
    root 23737 23706  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23731 23698  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23706     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 23698     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root 23887 23678  1 10:45:28 pts/tc    0:00 grep init
    root 23746 23700  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd startcheck
    root 23700     1  0 10:45:23 ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal

Even with the LV permissions changed the problem persists, so it is not caused by the ownership or permissions of the voting disks and OCR; reading the devices with dd and strings also works fine.
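
A sketch of that read test (the read size is arbitrary; run as root and treat it as a read-only verification):

# Read the first 1MB of each OCR/voting device and confirm readable content comes back
for dev in /dev/vgc01/rCMPR_VGC01_OCR1 /dev/vgc01/rCMPR_VGC01_VOTE1 \
           /dev/vgc02/rCMPR_VGC02_OCR2 /dev/vgc02/rCMPR_VGC02_VOTE2 \
           /dev/vgc03/rCMPR_VGC03_VOTE3; do
    echo "== $dev"
    dd if="$dev" bs=8192 count=128 2>/dev/null | strings | head -5
done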

Debug the /sbin/init.d/init.cssd startcheck script:

[xifenfei01][orawj][/root/xifenfei]#sh -x  /sbin/init.d/init.cssd startcheck
+ ORA_CRS_HOME=/app/oracle/product/10.2.0/crs
+ ORACLE_USER=oracle
+ ORACLE_HOME=/app/oracle/product/10.2.0/crs
+ export ORACLE_HOME
+ export ORA_CRS_HOME
+ export ORACLE_USER
+ DISABLE_OPROCD=false
+ OPROCD_DEFAULT_TIMEOUT=1000
+ OPROCD_DEFAULT_MARGIN=500
+ OPROCD_CHECK_TIMEOUT=2000
+ OPROCD_STOP_TIMEOUT=2000
+ OPROCD_DEFAULT_HISTORGRAM=
+ HOSTN=/bin/hostname
+ EXPRN=/usr/bin/expr
+ CUT=/usr/bin/cut
+ AWK=/bin/awk
+ ECHO=echo
+ TR=/bin/tr
+ /bin/uname
+ [ SunOS = HP-UX ]
+ /bin/uname
+ [ Linux = HP-UX ]
+ + /bin/hostname
HOST=xifenfei01
+ + /usr/bin/expr xifenfei01 : .*
len1=8
+ + /usr/bin/expr match xifenfei01 [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*
len2=0
+ [ 8 != 0 ]
+ + echo xifenfei01
+ /usr/bin/cut -d. -f1
HOST=xifenfei01
+ + echo xifenfei01
+ /bin/tr [:upper:] [:lower:]
HOST=xifenfei01
+ PS=/bin/ps
+ PSE=/bin/ps -e
+ PSEF=/bin/ps -ef
+ HEAD=/bin/head
+ GREP=/bin/grep
+ KILL=/bin/kill
+ KILLTERM=/bin/kill -TERM
+ KILLDIE=/bin/kill -9
+ KILLCHECK=/bin/kill -0 5852
+ SLEEP=/bin/sleep
+ NULL=/dev/null
+ UNAME=/bin/uname
+ CAT=/bin/cat
………………
+ eval /bin/true
+ /bin/true
+ [ 0 != 0 ]
+ eval /bin/ps -ef | /bin/grep '/usr/lbin/cm[g]msd' 1>/dev/null 2>/dev/null
+ /bin/grep /usr/lbin/cm[g]msd
+ /bin/ps -ef
+ 1> /dev/null 2> /dev/null
+ RC=1
+ [ 1 -ne 0 ]
+ /bin/logger -puser.err Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
+ /bin/sleep 60

Running the shell script with -x shows that CRS is waiting for HP-UX Service Guard to start, so the cause is that HP-UX Service Guard is not running.
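
The same test the script performs can be reproduced by hand, as in this sketch based on the trace above:

# init.cssd keeps sleeping 60s until the Service Guard daemon cmgmsd shows up in the process list
ps -ef | grep '/usr/lbin/cm[g]msd' >/dev/null 2>&1
echo "cmgmsd running? rc=$?"        # rc=1 here, so CRS startup stays blocked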

Check whether HP-UX Service Guard is running:

[xifenfei01][orawj][/root/xifenfei]#cmviewcl
CLUSTER           STATUS
crmdb_b_cluster   down
  NODE           STATUS       STATE
  xifenfei01       down         unknown
  crmdbb02       down         unknown
UNOWNED_PACKAGES
    PACKAGE        STATUS           STATE            AUTO_RUN    NODE
    pkg1           down             halted           enabled     unowned
    pkg2           down             halted           enabled     unowned

Combined with the customer's description (only one node was started and the other node's VGs were not activated), the conclusion is that the VGs were activated directly on a single node without starting Service Guard, and because Service Guard was not running, CRS could not start.
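
The remedy, sketched below under the assumption of a standard Serviceguard setup (the node name comes from the cmviewcl output above; coordinate with the platform team before running this):

# Start the Serviceguard cluster on this node, confirm it is up, then let CRS start
cmruncl -v -n xifenfei01
cmviewcl
/app/oracle/product/10.2.0/crs/bin/crsctl start crs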

An uncommented invalid entry in the hosts file causes INS-41112 on the OUI Network Interface Usage screen during RAC installation


While installing RAC on AIX, OUI reported INS-41112 on the Network Interface Usage screen, blocking the installation.

Summary of the error:

Cause - Installer has detected that network interface en6 does not maintain connectivity on all cluster nodes.
Action - Ensure that the chosen interface has been configured across all cluster nodes.  Additional Information:
Summary of the failed nodes xifenfei01  
- PRVF-4190 : Verification of the hosts config file failed

hosts entries:

10.70.89.68     xifenfei01
10.70.89.69     xifenfei01-vip
10.70.89.100    xifenfei01-priv
10.70.89.71     xifenfei02
10.70.89.72     xifenfei02-vip
10.70.89.101    xifenfei02-priv
10.70.89.79     xifenfei-scan

Network interface configuration:

xifenfei01:/u01/soft/grid> ifconfig -a
en7: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.100 netmask 0xffffffe0 broadcast 10.70.89.127
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en6: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.68 netmask 0xffffffe0 broadcast 10.70.89.95
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
xifenfei02/#ifconfig -a
en6: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.71 netmask 0xffffffe0 broadcast 10.70.89.95
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en7: flags=1e084863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.70.89.101 netmask 0xffffffe0 broadcast 10.70.89.127
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1%1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
xifenfei01/asmdisks#netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en7 1500 link#2 0.11.25.bd.b8.7a 107 0 132 2 0
en7 1500 10.70.89.96 10.70.89.100 107 0 132 2 0
en6 1500 link#3 0.11.25.bd.a8.93 50015 0 36963 2 0
en6 1500 10.70.89.64 10.70.89.68 50015 0 36963 2 0
lo0 16896 link#1 1589 0 1588 0 0
lo0 16896 127 127.0.0.1 1589 0 1588 0 0
lo0 16896 ::1%1 1589 0 1588 0 0
xifenfei02/asmdisks#netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en6 1500 link#2 0.11.25.bd.a8.a9 5401 0 3660 2 0
en6 1500 10.70.89.64 10.70.89.71 5401 0 3660 2 0
en7 1500 link#3 0.11.25.bd.51.d2 129 0 123 2 0
en7 1500 10.70.89.96 10.70.89.101 129 0 123 2 0
lo0 16896 link#1 1249 0 1249 0 0
lo0 16896 127 127.0.0.1 1249 0 1249 0 0
lo0 16896 ::1%1 1249 0 1249 0 0

The interfaces and IP addresses match on both hosts, although the interfaces are listed in a different order; the EtherChannel configuration is also normal.

Ping test:

xifenfei01:/u01/soft/grid> ping xifenfei01-priv
PING xifenfei01-priv: (10.70.89.100): 56 data bytes
64 bytes from 10.70.89.100: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.70.89.100: icmp_seq=1 ttl=255 time=0 ms
xifenfei02/#ping xifenfei01-priv
PING xifenfei01-priv: (10.70.89.100): 56 data bytes
64 bytes from 10.70.89.100: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.70.89.100: icmp_seq=1 ttl=255 time=0 ms

Check the network configuration with runcluvfy.sh:

./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
xifenfei01:/u01/soft/grid> ./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
  Node Name     Status                    Comment
  ------------  ------------------------  ------------------------
  xifenfei02-priv  passed                    successful
  xifenfei01-priv  failed                    Invalid Entry
ERROR:
PRVF-4190 : Verification of the hosts config file failed
Interface information for node "xifenfei02-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en6    10.70.89.71     10.70.89.64     10.70.89.71     10.70.89.65     00:11:25:BD:A8:A9 1500
 en7    10.70.89.101    10.70.89.96     10.70.89.101    10.70.89.65     00:11:25:BD:51:D2 1500
Interface information for node "xifenfei01-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en7    10.70.89.100    10.70.89.96     10.70.89.100    10.70.89.65     00:11:25:BD:B8:7A 1500
 en6    10.70.89.68     10.70.89.64     10.70.89.68     10.70.89.65     00:11:25:BD:A8:93 1500
Check: Node connectivity for interface "en7"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei02-priv[10.70.89.101]     xifenfei01-priv[10.70.89.100]     yes
Result: Node connectivity passed for interface "en7"
Check: TCP connectivity of subnet "10.70.89.96"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei01:10.70.89.68            xifenfei02-priv:10.70.89.101      passed
  xifenfei01:10.70.89.68            xifenfei01-priv:10.70.89.100      passed
Result: TCP connectivity check passed for subnet "10.70.89.96"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.70.89.64".
Subnet mask consistency check passed for subnet "10.70.89.96".
Subnet mask consistency check passed.
Result: Node connectivity check failed
Verification of node connectivity was unsuccessful.
Checks did not pass for the following node(s):
        xifenfei01-priv

The check fails for xifenfei01-priv with PRVF-4190. Inspecting the hosts file on xifenfei01 reveals one bad entry:

xifenfei01/#vi /etc/hosts
"/etc/hosts" 113 lines, 3556 characters
# @(#)47        1.1  src/bos/usr/sbin/netstart/hosts, cmdnet, bos530 7/24/91 10:
00:46
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos530 src/bos/usr/sbin/netstart/hosts 1.1
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1985,1989
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Here "00:46" sits on a line of its own and is an invalid entry (the timestamp in the header comment wrapped onto a new line). Remove that line and rerun runcluvfy.sh:

xifenfei01:/u01/soft/grid> ./runcluvfy.sh comp nodecon -i en7 -n xifenfei01-priv,xifenfei02-priv -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
  Node Name                             Status
  ------------------------------------  ------------------------
  xifenfei02-priv                         passed
  xifenfei01-priv                         passed
Verification of the hosts config file successful
Interface information for node "xifenfei02-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en6    10.70.89.71     10.70.89.64     10.70.89.71     10.70.89.65     00:11:25:BD:A8:A9 1500
 en7    10.70.89.101    10.70.89.96     10.70.89.101    10.70.89.65     00:11:25:BD:51:D2 1500
Interface information for node "xifenfei01-priv"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 en7    10.70.89.100    10.70.89.96     10.70.89.100    10.70.89.65     00:11:25:BD:B8:7A 1500
 en6    10.70.89.68     10.70.89.64     10.70.89.68     10.70.89.65     00:11:25:BD:A8:93 1500
Check: Node connectivity for interface "en7"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei02-priv[10.70.89.101]     xifenfei01-priv[10.70.89.100]     yes
Result: Node connectivity passed for interface "en7"
Check: TCP connectivity of subnet "10.70.89.96"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  xifenfei01:10.70.89.68            xifenfei02-priv:10.70.89.101      passed
  xifenfei01:10.70.89.68            xifenfei01-priv:10.70.89.100      passed
Result: TCP connectivity check passed for subnet "10.70.89.96"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.70.89.64".
Subnet mask consistency check passed for subnet "10.70.89.96".
Subnet mask consistency check passed.
Result: Node connectivity check passed
Verification of node connectivity was successful.

With the invalid entry removed, the runcluvfy check passes and the rest of the OUI installation proceeds normally.
So it really was an invalid entry in /etc/hosts that made the RAC installation check fail; once again, be careful with the hosts file when installing RAC.
References: PRVF-4190 Verification of the Hosts Config File Failed (Doc ID 1056025.1)
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)
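
A quick sanity check of the kind that would have caught this entry (a sketch: it flags non-comment lines that are not "IPv4-address hostname [...]", so legitimate IPv6 entries would also be reported and can be ignored):

# Flag /etc/hosts lines that are neither comments nor "address hostname" pairs
awk '!/^[ \t]*#/ && NF > 0 {
        if (NF < 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
            print "suspicious line " NR ": " $0
     }' /etc/hosts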

Installing TFA separately for 11.2.0.4 GI


When root.sh runs during an 11.2.0.4 RAC installation, root's environment must be able to invoke unzip directly (outside Linux and Windows, root's default environment has no unzip on the PATH). If the corresponding PATH setting is forgotten, TFA does not get installed; the steps below install it afterwards.
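
For reference, a sketch of the PATH setup that avoids the problem in the first place (the unzip locations shown are hypothetical; point PATH at wherever unzip actually lives on the platform):

# As root, before running root.sh, make sure unzip can be resolved
export PATH=$PATH:/usr/local/bin:/opt/freeware/bin
which unzip || echo "unzip still not found - install it or extend PATH first"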
Current state with TFA not installed:

--the only GI-related entry in /etc/inittab
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null
--only init.ohasd is running
xifenf01/oradata/sys/soft#ps -ef|grep init
    root        1        0   0 13:12:11      -  0:00 /etc/init
    root 30539998        1   0 18:58:17      -  0:00 /bin/sh /etc/init.ohasd run
    root 31391906  5177692   0 19:44:15  pts/1  0:00 grep init

Install with the tfa_setup.sh script:

xifenf01/#export PATH=$PATH:/u01/oracle/app/grid/bin
xifenf01/#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
TFA requires BASH shell. Please install bash and try again.

It complains that the bash shell is missing, so download and install the package (the system is AIX 7.1):

xifenf01/oradata/sys/soft#rpm -ivh bash-4.2-3.aix6.1.ppc.rpm
bash                        ##################################################

Rerun the TFA installation:

xifenf01/oradata/sys/soft#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
Using JAVA_HOME : /u01/oracle/app/grid/jdk/jre
Running Auto Setup for TFA as user root...
The following installation requires temporary use of SSH.
If SSH is not configured already then we will remove SSH
when complete.
Installing TFA now...
TFA Will be Installed on xifenf01...
TFA will scan the following Directories
++++++++++++++++++++++++++++++++++++++++++++
.-----------------------------------------------------.
|                       xifenf01                      |
+------------------------------------------+----------+
| Trace Directory                          | Resource |
+------------------------------------------+----------+
| /u01/oracle/app/grid/OPatch/crs/log      | CRS      |
| /u01/oracle/app/grid/cfgtoollogs         | INSTALL  |
| /u01/oracle/app/grid/crs/log             | CRS      |
| /u01/oracle/app/grid/cv/log              | CRS      |
| /u01/oracle/app/grid/evm/admin/log       | CRS      |
| /u01/oracle/app/grid/evm/admin/logger    | CRS      |
| /u01/oracle/app/grid/evm/log             | CRS      |
| /u01/oracle/app/grid/install             | INSTALL  |
| /u01/oracle/app/grid/log                 | CRS      |
| /u01/oracle/app/grid/log/                | CRS      |
| /u01/oracle/app/grid/network/log         | CRS      |
| /u01/oracle/app/grid/oc4j/j2ee/home/log  | CRSOC4J  |
| /u01/oracle/app/grid/opmn/logs           | CRS      |
| /u01/oracle/app/grid/racg/log            | CRS      |
| /u01/oracle/app/grid/rdbms/log           | ASM      |
| /u01/oracle/app/grid/scheduler/log       | CRS      |
| /u01/oracle/app/grid/srvm/log            | CRS      |
| /u01/oracle/app/oraInventory/ContentsXML | INSTALL  |
| /u01/oracle/app/oraInventory/logs        | INSTALL  |
'------------------------------------------+----------'
Installing TFA on xifenf01
HOST: xifenf01  TFA_HOME: /u01/oracle/app/grid/tfa/xifenf01/tfa_home
.-----------------------------------------------------.
| Host     | Status of TFA | PID     | Port | Version |
+----------+---------------+---------+------+---------+
| xifenf01 | RUNNING       | 7536914 | 5000 | 2.5.1.5 |
'----------+---------------+---------+------+---------'
Summary of TFA Installation:
.------------------------------------------------------------------.
|                             xifenf01                             |
+---------------------+--------------------------------------------+
| Parameter           | Value                                      |
+---------------------+--------------------------------------------+
| Install location    | /u01/oracle/app/grid/tfa/xifenf01/tfa_home |
| Repository location | /u01/oracle/app/oracle/tfa/repository      |
| Repository usage    | 0 MB out of 10240 MB                       |
'---------------------+--------------------------------------------'
TFA is successfully installed..
Usage : /u01/oracle/app/grid/tfa/bin/tfactl <command> [options]
<command> =
         print        Print requested details
         purge        Delete collections from TFA repository
         directory    Add or Remove or Modify directory in TFA
         host         Add or Remove host in TFA
         set          Turn ON/OFF or Modify various TFA features
         diagcollect  Collect logs from across nodes in cluster
For help with a command: /u01/oracle/app/grid/tfa/bin/tfactl <command> -help

After TFA is installed successfully:

--/etc/inittab
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null
htfa:2:respawn:/etc/init.tfa run >/dev/null 2>&1 </dev/null
--the init.tfa process now exists
xifenf01/oradata/sys/soft#ps -ef|grep init
    root        1        0   0 13:12:11      -  0:00 /etc/init
    root 30277638        1   0 19:26:37      -  0:00 /bin/sh /etc/init.tfa run
    root 30539998        1   0 18:58:17      -  0:00 /bin/sh /etc/init.ohasd run
    root 31391906  5177692   0 19:44:15  pts/1  0:00 grep init

Install TFA on the other node:

xifenf02/#export PATH=$PATH:/u01/oracle/app/grid/bin
xifenf02/oradata/sys/soft#rpm -ivh bash-4.2-3.aix6.1.ppc.rpm
bash                        ##################################################
xifenf02/#/u01/oracle/app/grid/crs/install/tfa_setup.sh -silent -crshome /u01/oracle/app/grid
Starting TFA installation
Using JAVA_HOME : /u01/oracle/app/grid/jdk/jre
Running Auto Setup for TFA as user root...
The following installation requires temporary use of SSH.
If SSH is not configured already then we will remove SSH
when complete.
Installing TFA now...
TFA Will be Installed on xifenf02...
TFA will scan the following Directories
++++++++++++++++++++++++++++++++++++++++++++
.-----------------------------------------------------.
|                       xifenf02                      |
+------------------------------------------+----------+
| Trace Directory                          | Resource |
+------------------------------------------+----------+
| /u01/oracle/app/grid/OPatch/crs/log      | CRS      |
| /u01/oracle/app/grid/cfgtoollogs         | INSTALL  |
| /u01/oracle/app/grid/crs/log             | CRS      |
| /u01/oracle/app/grid/cv/log              | CRS      |
| /u01/oracle/app/grid/evm/admin/log       | CRS      |
| /u01/oracle/app/grid/evm/admin/logger    | CRS      |
| /u01/oracle/app/grid/evm/log             | CRS      |
| /u01/oracle/app/grid/install             | INSTALL  |
| /u01/oracle/app/grid/log                 | CRS      |
| /u01/oracle/app/grid/log/                | CRS      |
| /u01/oracle/app/grid/network/log         | CRS      |
| /u01/oracle/app/grid/oc4j/j2ee/home/log  | CRSOC4J  |
| /u01/oracle/app/grid/opmn/logs           | CRS      |
| /u01/oracle/app/grid/racg/log            | CRS      |
| /u01/oracle/app/grid/rdbms/log           | ASM      |
| /u01/oracle/app/grid/scheduler/log       | CRS      |
| /u01/oracle/app/grid/srvm/log            | CRS      |
| /u01/oracle/app/oraInventory/ContentsXML | INSTALL  |
| /u01/oracle/app/oraInventory/logs        | INSTALL  |
'------------------------------------------+----------'
Installing TFA on xifenf02
HOST: xifenf02  TFA_HOME: /u01/oracle/app/grid/tfa/xifenf02/tfa_home
.-----------------------------------------------------.
| Host     | Status of TFA | PID     | Port | Version |
+----------+---------------+---------+------+---------+
| xifenf02 | RUNNING       | 5898636 | 5000 | 2.5.1.5 |
| xifenf01 | RUNNING       | 7536914 | 5000 | 2.5.1.5 |
'----------+---------------+---------+------+---------'
Summary of TFA Installation:
.------------------------------------------------------------------.
|                             xifenf02                             |
+---------------------+--------------------------------------------+
| Parameter           | Value                                      |
+---------------------+--------------------------------------------+
| Install location    | /u01/oracle/app/grid/tfa/xifenf02/tfa_home |
| Repository location | /u01/oracle/app/oracle/tfa/repository      |
| Repository usage    | 0 MB out of 10240 MB                       |
'---------------------+--------------------------------------------'
TFA is successfully installed..
Usage : /u01/oracle/app/grid/tfa/bin/tfactl <command> [options]
<command> =
         print        Print requested details
         purge        Delete collections from TFA repository
         directory    Add or Remove or Modify directory in TFA
         host         Add or Remove host in TFA
         set          Turn ON/OFF or Modify various TFA features
         diagcollect  Collect logs from across nodes in cluster
For help with a command: /u01/oracle/app/grid/tfa/bin/tfactl <command> -help

Applying a PSU to 11gR2 GI fails on ORACLE_HOME free space: more than 20GB is required


The opatch auto command for GI:

#/u01/oracle/app/grid/OPatch/opatch auto /u01/soft/18706472 -ocmrf /u01/soft/ocm.rsp
Executing /u01/oracle/app/grid/perl/bin/perl /u01/oracle/app/grid/OPatch/crs/patch11203.pl
-patchdir /u01/soft -patchn 18706472 -ocmrf /u01/soft/ocm.rsp -paramfile /u01/oracle/app/grid/crs/install/crsconfig_params
This is the main log file: /u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-15-28.log
This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-15-28.report.log
2014-10-15 01:15:28: Starting Clusterware Patch Setup
Using configuration parameter file: /u01/oracle/app/grid/crs/install/crsconfig_params
Stopping CRS...
Stopped CRS successfully
patch /u01/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
patch /u01/soft/18706472/18522515  apply failed  for home  /u01/oracle/app/grid
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.
opatch auto succeeded.

Patch 18522509 applied successfully, but the patches after it all failed, not just 18522515.

Analyze the error log:

Composite patch 18522509 successfully applied.
 OPatch Session completed with warnings.
 Log file location: /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-19-00AM_1.log
 OPatch completed with warnings.
2014-10-15 01:21:44: patch /u01/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
2014-10-15 01:21:44: Executing command /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515 -local -silent -ocmrf /
u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc as grid
2014-10-15 01:21:44: Running as user grid: /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515 -local -silent -ocm
rf /u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc
2014-10-15 01:21:44: s_run_as_user2: Running /bin/su grid -c ' /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515
 -local -silent -ocmrf /u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc '
2014-10-15 01:21:49: Removing file /tmp/uaa7jC7eu
2014-10-15 01:21:49: Successfully removed file: /tmp/uaa7jC7eu
2014-10-15 01:21:49: /bin/su exited with rc=73
2014-10-15 01:21:49: status of apply patch is 18688
2014-10-15 01:21:49: The apply patch output is Oracle Interim Patch Installer version 11.2.0.3.6
 Copyright (c) 2013, Oracle Corporation.  All rights reserved.
 Oracle Home       : /u01/oracle/app/grid
 Central Inventory : /u01/oracle/app/oraInventory
    from           : /u01/oracle/app/grid/oraInst.loc
 OPatch version    : 11.2.0.3.6
 OUI version       : 11.2.0.4.0
 Log file location : /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-21-44AM_1.log
 Verifying environment and performing prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(24021.839MB) is not available.
 UtilSession failed:
 Prerequisite check "CheckSystemSpace" failed.
 Log file location: /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-21-44AM_1.log
 OPatch failed with error code 73

Patch 18522509 succeeded, but 18522515 failed its prerequisite check: $ORACLE_HOME (the GI home) needs 24021.839MB of available space, and that much free space is not there.
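
Before retrying, the space prerequisite can be verified on its own; a sketch using the paths from the logs above (opatch prereq CheckSystemSpace is a standard OPatch check, run here as the grid owner):

# How much space is free, and does OPatch consider it sufficient for this patch?
df -g /u01
/u01/oracle/app/grid/OPatch/opatch prereq CheckSystemSpace \
    -ph /u01/soft/18706472/18522515 -oh /u01/oracle/app/grid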

Check the space and free some up:

# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.56    5%    12443     1% /
/dev/hd2          15.00     11.42   24%    60339     3% /usr
/dev/hd9var       10.00      9.91    1%     3310     1% /var
/dev/hd3          10.00      9.65    4%      364     1% /tmp
/dev/hd1          10.00     10.00    1%       32     1% /home
/dev/hd11admin      0.25      0.25    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt      10.00      8.98   11%    12401     1% /opt
/dev/fslv00       50.00     12.94   75%    59137     2% /u01
/dev/odm           0.00      0.00   -1%        6   100% /dev/odm
/dev/vx/dsk/crs_dg/crs_vol      1.96      1.75   12%        5     1% /crsdata
/dev/vx/dsk/ora_dg/data01_vol   2048.00   2031.32    1%        4     1% /oradata/data01
/dev/vx/dsk/ora_dg/sys_vol    400.00    350.29   13%     3702     1% /oradata/sys
# cd /u01
# ls
lost+found  oracle      soft
# cd soft
# du -sg
17.34   .
# mv /u01/soft /oradata/sys/
# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.56    5%    12443     1% /
/dev/hd2          15.00     11.42   24%    60339     3% /usr
/dev/hd9var       10.00      9.91    1%     3310     1% /var
/dev/hd3          10.00      9.65    4%      364     1% /tmp
/dev/hd1          10.00     10.00    1%       32     1% /home
/dev/hd11admin      0.25      0.25    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt      10.00      8.98   11%    12401     1% /opt
/dev/fslv00       50.00     27.56   45%    59137     2% /u01
/dev/odm           0.00      0.00   -1%        6   100% /dev/odm
/dev/vx/dsk/crs_dg/crs_vol      1.96      1.75   12%        5     1% /crsdata
/dev/vx/dsk/ora_dg/data01_vol   2048.00   2031.32    1%        4     1% /oradata/data01
/dev/vx/dsk/ora_dg/sys_vol    400.00    333.54   14%     3702     1% /oradata/sys
# cd soft
# ls -ltr
total 12464202
drwxr-xr-x    5 grid     dba            1024 Jul 07 21:06 18706472
-rw-rw-r--    1 grid     dba            2123 Jul 15 20:06 PatchSearch.xml
-rw-r--r--    1 root     system   1801653734 Oct 14 14:39 p13390677_112040_aix64-5l_1of7.zip
-rw-r--r--    1 root     system   1170882875 Oct 14 14:40 p13390677_112040_aix64-5l_2of7.zip
-rw-r--r--    1 root     system   2127071138 Oct 14 14:41 p13390677_112040_aix64-5l_3of7.zip
-rw-r--r--    1 root     system   1242090126 Oct 14 14:43 p18706472_112040_AIX64-5L.zip
-rw-r--r--    1 root     system     34964928 Oct 14 14:43 p6880880_112000_AIX64-5L.zip
-rwxr-xr-x    1 root     system       127965 Oct 14 14:48 unzip_aix
-rw-r--r--    1 root     system      4871110 Oct 14 14:50 ONEOFF_112043_AIX64.zip
drwxr-xr-x    8 grid     dba            1024 Oct 14 14:55 grid
drwxr-xr-x    8 oracle   dba            1024 Oct 14 14:57 database
-rw-r--r--    1 grid     dba             621 Oct 15 00:11 ocm.rsp

/u01 was indeed short of space; moving the RAC installation media to another directory leaves /u01 with enough free space to apply the patch.

Rollback command:

# /u01/oracle/app/grid/OPatch/opatch auto /u01/soft/18706472 -rollback -ocmrf /u01/soft/ocm.rsp

Apply the patch again:

#/u01/oracle/app/grid/OPatch/opatch auto /u01/soft/18706472 -ocmrf /u01/soft/ocm.rsp                 <
Executing /u01/oracle/app/grid/perl/bin/perl /u01/oracle/app/grid/OPatch/crs/patch11203.pl -patchdir /oradata/sys/soft
-patchn 18706472 -ocmrf /oradata/sys/soft/ocm.rsp -paramfile /u01/oracle/app/grid/crs/install/crsconfig_params
This is the main log file: /u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-57-42.log
This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-57-42.report.log
2014-10-15 01:57:42: Starting Clusterware Patch Setup
Using configuration parameter file: /u01/oracle/app/grid/crs/install/crsconfig_params
Stopping CRS...
Stopped CRS successfully
patch /oradata/sys/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
patch /oradata/sys/soft/18706472/18522515  apply successful for home  /u01/oracle/app/grid
patch /oradata/sys/soft/18706472/18522514  apply successful for home  /u01/oracle/app/grid
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.
opatch auto succeeded.

Verify that the patch applied successfully:

# su - grid
xifenfei1:/home/grid> opatch lspatches
18522514;ACFS PATCH SET UPDATE : 11.2.0.4.3 (18522514)
18522515;OCW Patch Set Update : 11.2.0.4.3 (18522515)
18522509;Database Patch Set Update : 11.2.0.4.3 (18522509)
xifenfei1:/home/grid> crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.asm
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
ora.gsd
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
ora.net1.network
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.ons
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.registry.acfs
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei2
ora.cvu
      1        ONLINE  ONLINE       xifenfei2
ora.xifenfei1.vip
      1        ONLINE  ONLINE       xifenfei1
ora.xifenfei2.vip
      1        ONLINE  ONLINE       xifenfei2
ora.oc4j
      1        ONLINE  ONLINE       xifenfei2
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei2

The main GC wait events in ORACLE 12C RAC

ORACLE 12C RAC has a large number of GC wait events. This article focuses on the most common ones, so that similar problems are easier to analyze when they come up later.

RAC wait events

In this section I discuss the important RAC wait events. This is not a complete list of all wait events, only a list of the most common ones.

GC Current Block 2-Way/3-Way

A current-block wait event means that the version of the block being transferred is the latest version of that block. This wait event can be encountered during both read and write activity. If the block is accessed for a read, the lock on the resource is acquired in KJUSERPR (PR) mode. The example discussed earlier in the "Resources and Locks" section showed a lock held in KJUSERPR mode.
In the following example, I query one row from table t_one, causing a disk read in a session connected to node 2. Looking at the SQL trace file, there are no global cache wait events. The reason is that the block is mastered locally (the local instance is the master), so the FG process can acquire the lock on the resource directly, without incurring any global cache waits. This type of locking is also called affinity locking. Affinity locking is discussed in detail in the section on Dynamic Resource Mastering (DRM).
RS@ORCL2:2> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='db file sequential read' ela= 563 file#=4 block#=180 blocks=1 obj#=75742
Next, I connect to instance 1 and query the same row. Because the block is already cached in instance 2, it is transferred from instance 2 to instance 1. The trace shows that the wait event gc current block 2-way was encountered for the block with file_id=4, block_id=180. This is a two-way block transfer because the owning instance of the resource and the resource master instance (instance 2) are the same.
SYS@ORCL1:1> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='gc current block 2-way' ela= 629 p1=4 p2=180 p3=1 obj#=75742 tim=1350440823837572
Next, I will create the conditions for a 3-way wait event, but first let me flush the buffer cache on all three instances so that we start with clean buffers. In the following example:
1. I connect to instance 1 and query the row (this loads the block into the buffer cache of instance 1).
2. I connect to instance 3 and query the same row. My session's FG process sends the request for this block to the LMS process running on instance 2 (because instance 2 is the master of the resource).
3. The LMS process on instance 2 forwards the request to the LMS process on instance 1.
4. The LMS process on instance 1 sends the block to the FG process running on instance 3.
In essence, three instances take part in the transfer of a single block, so this is a three-way block transfer.
--alter system flush buffer_cache;   -- on all instances
RS@ORCL1:1> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
RS@ORCL3:3> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='gc current block 3-way' ela= 798 p1=4 p2=180 p3=1 obj#=75742
Connecting to instance 2 and looking at the resource locks, we can see that two locks are held on this resource, by instance 1 and instance 3 (owner_node equal to 0 and 2).
RS@ORCL2:2> SELECT resource_name1, grant_level, state, owner_node
FROM v$ges_enqueue
WHERE resource_name1 LIKE '[0xb4][0x4],[BL]%';
RESOURCE_NAME1 GRANT_LEV STATE OWNER_NODE
------------------------------ --------- ------------- ----------
[0xb4][0x4],[BL][ext 0x0,0x0] KJUSERPR GRANTED 2
[0xb4][0x4],[BL][ext 0x0,0x0] KJUSERPR GRANTED 0
Excessive waits on gc current block 2-way or gc current block 3-way usually come from either (a) an inefficient execution plan that causes a large number of block accesses, or (b) application affinity not being implemented. If object access can be localized, consider implementing application affinity. Also apply the techniques discussed earlier in the section "Generic analysis for all wait events".
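To act on this advice it helps to see which objects and statements are actually accumulating these waits. The query below is only a sketch (it assumes ASH is licensed and that the waits are recent enough to still be in gv$active_session_history; current_obj# is populated on a best-effort basis); it groups samples of the two current-block events by object and SQL:
-- Sketch: top objects/SQL for gc current block 2-way/3-way waits (assumes ASH licensing)
SELECT event, current_obj#, sql_id, COUNT(*) AS samples
FROM   gv$active_session_history
WHERE  event IN ('gc current block 2-way', 'gc current block 3-way')
GROUP  BY event, current_obj#, sql_id
ORDER  BY samples DESC;
The current_obj# values can then be mapped through dba_objects to decide whether application affinity or SQL tuning is the more appropriate fix.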

GC CR Block 2-Way/3-Way

A CR-mode block transfer happens for a read-only access request. Consider a scenario in which a block resides in instance 2 in CURRENT mode, and instance 2 holds the BL lock on that resource in exclusive mode. Another session, connected to instance 1, then requests the block. Because in an Oracle database readers do not see uncommitted changes, the SELECT statement asks for a specific version of the block as of the query start time. An SCN is used to identify the block version; in essence, the SELECT statement requests a version of the block consistent with that SCN. The LMS process running on instance 2 services the request: it clones the CURRENT-mode block in the buffer cache, verifies that the cloned version is consistent with the requested SCN, and then ships a CR copy of the block to the FG process.
The key difference between these CR-mode transfers and CURRENT-mode transfers is that for a CR-mode transfer there is no resource or lock in the GRD protecting the CR buffer. In essence, CR-mode blocks do not need global cache resources or locks. The CR copy received can be used only by the requesting session, and only for that particular SQL execution. That is why Oracle does not acquire any lock on the BL resource for CR transfers.
Because no global cache lock protects the buffer, a session connected to instance 1 that executes the same SQL statement again and touches that block will encounter the gc cr block 2-way or gc cr block 3-way wait event once more.
Therefore, every access to that block from instance 1 triggers the construction of a new CR buffer. Even if the block has not been modified in the buffer cache of instance 2, the FG process on instance 1 still suffers the CR wait events. The CR buffers residing in instance 1 cannot be reused, because each SQL execution requests a version consistent with a different query SCN.
The trace below shows a block being transferred from the resource master instance to the requesting instance with a latency of 0.6 ms. The file_id, block_id, and object_id information in the trace file can be used to identify the objects suffering these two wait events; the object can of course also be identified by querying ASH data.
nam='gc cr block 2-way' ela= 627 p1=7 p2=6852 p3=1 obj#=76483 tim=37221074057
After executing tc_one_row.sql five times and then querying the buffer header information, you can see five CR buffers for this block across instances 1 and 2. Note that the CR_SCN_BAS and CR_SCN_WRP columns show different values for each CR buffer copy. By querying gv$ges_resource and gv$ges_enqueue you can also verify that no GC locks protect these buffers.
Listing 10-10. Buffer status

SELECT
  DECODE(state,0,'free',1,'xcur',2,'scur',3,'cr', 4,'read',5,'mrec',
         6,'irec',7,'write',8,'pi', 9,'memory',10,'mwrite',
         11,'donated', 12,'protected', 13,'securefile', 14,'siop',
         15,'recckpt', 16, 'flashfree', 17, 'flashcur', 18, 'flashna') state,
  mode_held, le_addr, dbarfil, dbablk, cr_scn_bas, cr_scn_wrp , class
FROM sys.x$bh
WHERE obj= &&obj
AND dbablk= &&block
AND state!=0 ;

Enter value for obj: 75742
Enter value for block: 180

STATE      MODE_HELD  LE  DBARFIL    DBABLK     CR_SCN_BAS CR_SCN_WRP CLASS
---------- ---------- --- ---------- ---------- ---------- ---------- ----------
cr         0          00  1          75742      649314930  3015       1
cr         0          00  1          75742      648947873  3015       1
cr         0          00  1          75742      648926281  3015       1
cr         0          00  1          75742      648810300  3015       1
cr         0          00  1          75742      1177328436 3013       1
CR buffer creation is a special case: no global cache lock is acquired to protect the CR buffer. If a frequently accessed object has long-running uncommitted transactions, a CR storm can result. It is therefore wise to schedule batch jobs that update large amounts of table data in a less busy period.
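Before rescheduling batch jobs it is worth confirming which segments are actually driving the CR traffic. A minimal sketch using the cumulative counters in v$segment_statistics (values reset at instance startup; FETCH FIRST requires 12c):
-- Sketch: segments receiving the most CR blocks over the interconnect
SELECT owner, object_name, object_type, value AS cr_blocks_received
FROM   v$segment_statistics
WHERE  statistic_name = 'gc cr blocks received'
ORDER  BY value DESC
FETCH FIRST 10 ROWS ONLY;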

GC CR Grant 2-Way/GC Current Grant 2-Way

The gc cr grant 2-way and gc current grant 2-way wait events are encountered when the requested block does not reside in any buffer cache. The FG process asks the LMS process for a block, but the block is not cached anywhere, so the LMS process replies with a message granting the FG process permission to read the block from disk. The FG process then reads the block from disk and continues processing.
The first line below shows the FG process receiving a grant response from the LMS process for the block with file_id=4 and block_id=180. The next line shows the physical read that fetches the block from disk.
nam='gc cr grant 2-way' ela= 402 p1=4 p2=180 p3=1 obj#=75742
nam='db file sequential read' ela= 553 file#=4 block#=180 blocks=1 obj#=75742
Excessive waits of this kind mean that either the buffer cache is too small, or SQL statements are flushing the buffer cache too aggressively. Identify the SQL statements and objects suffering these wait events and tune those statements.
The DRM feature is designed precisely to reduce these grant-related wait events.
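A quick way to judge whether grant messages (as opposed to actual block transfers) dominate is to compare the cumulative wait counts and times. This is only a sketch against gv$system_event; the figures are cumulative since instance startup:
-- Sketch: grant waits vs. block-transfer waits per instance
SELECT inst_id, event, total_waits,
       ROUND(time_waited_micro / 1000) AS time_waited_ms
FROM   gv$system_event
WHERE  event IN ('gc cr grant 2-way', 'gc current grant 2-way',
                 'gc cr block 2-way', 'gc current block 2-way')
ORDER  BY inst_id, time_waited_ms DESC;
If the grant events dominate, that points toward an undersized buffer cache or cache-flushing SQL rather than interconnect problems.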

GC CR Block Busy/GC Current Block Busy

Busy events indicate that the LMS process had to do extra work to deal with concurrency. For example, to build a CR block the LMS process may have to apply undo records in order to reconstruct a block version consistent with the query SCN. When it ships the block back to the FG process, LMS marks the transfer so that the receiver records either gc cr block busy or gc current block busy, depending on the type of block transfer.

GC CR Block Congested/GC Current Block Congested

If the LMS process does not handle a request within 1 ms of receiving it, it marks the response so that the requester records a congestion-related wait event for that block. Congestion-related waits have many causes: the LMS process may be flooded with global cache requests, it may be suffering CPU scheduling delays, or it may have run short of some other resource (such as memory).
Normally the LMS processes run at real-time CPU scheduling priority, so CPU scheduling delays should be minimal. A large number of these wait events indicates a sudden spike in global cache requests that the LMS processes cannot service quickly enough. Memory starvation on the server can also cause the LMS processes to be paged out, hurting global cache performance.
You should investigate why the LMS processes are unable to handle requests efficiently.
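One place to start is the per-instance transfer statistics, which break global cache transfers down by busy and congested flags. A sketch (the counters in gv$instance_cache_transfer are cumulative since startup, and the exact column set can vary by version):
-- Sketch: transfers flagged busy or congested, by source instance and block class
SELECT inst_id, instance AS from_instance, class,
       cr_busy, cr_congested, current_busy, current_congested
FROM   gv$instance_cache_transfer
WHERE  cr_busy + cr_congested + current_busy + current_congested > 0
ORDER  BY inst_id, instance, class;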

11gR2 RAC fails to start because of an improper ASM sga_target setting

The first troubleshooting case of 2014: a colleague reported that a two-node 11.2 RAC on Solaris would not start and asked me to take a look. Analysis showed that an unreasonable sga_target setting prevented ASM from starting.

GI fails to start

grid@zwq-rpt1:~$crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
grid@zwq-rpt1:~$crsctl status resource -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       zwq-rpt1
ora.crf
      1        ONLINE  ONLINE       zwq-rpt1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       zwq-rpt1
ora.cssdmonitor
      1        ONLINE  ONLINE       zwq-rpt1
ora.ctssd
      1        ONLINE  ONLINE       zwq-rpt1                 ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE zwq-rpt1
ora.gipcd
      1        ONLINE  ONLINE       zwq-rpt1
ora.gpnpd
      1        ONLINE  ONLINE       zwq-rpt1
ora.mdnsd
      1        ONLINE  ONLINE       zwq-rpt1

ASM did not start

Errors in the GI alert log

2014-01-01 00:40:47.708
[cssd(1418)]CRS-1605:CSSD voting file is online: /dev/rdsk/emcpower0a; details in /export/home/app/grid/log/zwq-rpt1/cssd/ocssd.log.
2014-01-01 00:40:53.234
[cssd(1418)]CRS-1601:CSSD Reconfiguration complete. Active nodes are zwq-rpt1 zwq-rpt2 .
2014-01-01 00:40:56.659
[ctssd(1483)]CRS-2407:The new Cluster Time Synchronization Service reference node is host zwq-rpt2.
2014-01-01 00:40:56.661
[ctssd(1483)]CRS-2401:The Cluster Time Synchronization Service started on host zwq-rpt1.
2014-01-01 00:41:02.016
[ctssd(1483)]CRS-2408:The clock on host zwq-rpt1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2014-01-01 00:43:23.874
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:45:42.837
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:02.087
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:18.836
[ohasd(1083)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2014-01-01 00:48:18.837
[ohasd(1083)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2014-01-01 01:05:15.396
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:05:45.101
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:06:15.104
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".

Here it is fairly clear that the ASM disk group problem makes the OCR inaccessible, which in turn prevents CRS from starting normally.

ORAAGENT log

2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection:connect:excp OCIException OCI error 604
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] DgpAgent::queryDgStatus excp ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")

A fairly clear ORA-04031 error is reported; check the ASM alert log.

Errors in the ASM alert log

Wed Jan 01 00:47:33 2014
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /export/home/app/oracle
Wed Jan 01 00:47:39 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1728.trc  (incident=291447):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291447/+ASM1_ora_1728_i291447.trc
Wed Jan 01 00:47:48 2014
Dumping diagnostic data in directory=[cdmp_20140101004748], requested by (instance=1, osid=1728), summary=[incident=291447].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:47:53 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1730.trc  (incident=291448):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291448/+ASM1_ora_1730_i291448.trc
Wed Jan 01 00:48:01 2014
Dumping diagnostic data in directory=[cdmp_20140101004801], requested by (instance=1, osid=1730), summary=[incident=291448].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:07 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1732.trc  (incident=291449):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291449/+ASM1_ora_1732_i291449.trc
Wed Jan 01 00:48:16 2014
Dumping diagnostic data in directory=[cdmp_20140101004816], requested by (instance=1, osid=1732), summary=[incident=291449].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:16 2014
License high water mark = 1
USER (ospid: 1736): terminating the instance
Instance terminated by USER, pid = 1736

Here it can be clearly seen that the shared pool is too small, causing ASM to report ORA-04031 and therefore fail to start.
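Once an ASM instance is up (for example on a healthy node, or after the workaround below), the shared pool pressure can be checked directly. A minimal sketch:
-- Sketch: how much shared pool memory is currently free in the ASM instance
SELECT pool, name, ROUND(bytes/1024/1024, 1) AS mb
FROM   v$sgastat
WHERE  pool = 'shared pool'
  AND  name = 'free memory';
A free-memory figure that keeps shrinking toward zero is consistent with the ORA-04031 errors seen above.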

Root cause analysis

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /export/home/app/grid
System name:	SunOS
Node name:	zwq-rpt1
Release:	5.11
Version:	11.1
Machine:	sun4v
Using parameter settings in server-side spfile +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.823992831
System parameters with non-default values:
  sga_max_size             = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

Here we can see that sga_target was set to 0 and no shared pool size was configured. Because the shared pool was too small, ORA-04031 occurred, so CRS failed while starting ASM; as a result the OCR could not be accessed, and CRS could not start normally.

Fix

1. Edit a pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$vi /tmp/asm.pfile
  memory_target = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

2. Start ASM with the pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed Jan 1 01:04:10 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/asm.pfile'
ASM instance started
Total System Global Area 2138521600 bytes
Fixed Size                  2161024 bytes
Variable Size            2102806144 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

3. Create the spfile

SQL> create spfile='+CRSDG' FROM PFILE='/tmp/asm.pfile';
File created.
--ASM alert log
Wed Jan 01 01:08:59 2014
NOTE: updated gpnp profile ASM SPFILE to
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM SPFILE to +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.835664939

4. Shut down ASM

SQL> shutdown immediate
ORA-15097: cannot SHUTDOWN ASM instance with connected client (process 1971)
SQL> shutdown abort
ASM instance shutdown

5. Restart CRS

root@zwq-rpt1:~# crsctl stop crs -f
root@zwq-rpt1:~# crsctl start crs

6. Restart CRS on the other node

root@zwq-rpt2:~# crsctl stop crs -f
root@zwq-rpt2:~# crsctl start crs

7. Check the result

root@zwq-rpt1:~# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.DATADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.FRADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.LISTENER.lsnr
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.asm
               ONLINE  ONLINE       zwq-rpt1                 Started
               ONLINE  ONLINE       zwq-rpt2                 Started
ora.gsd
               OFFLINE OFFLINE      zwq-rpt1
               OFFLINE OFFLINE      zwq-rpt2
ora.net1.network
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.ons
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       zwq-rpt1
ora.cvu
      1        ONLINE  ONLINE       zwq-rpt1
ora.oc4j
      1        ONLINE  ONLINE       zwq-rpt1
ora.rptdb.db
      1        ONLINE  ONLINE       zwq-rpt1                 Open
      2        ONLINE  ONLINE       zwq-rpt2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt2.vip
      1        ONLINE  ONLINE       zwq-rpt2

Everything is back to normal; the first failure of 2014 has been resolved.

Converting between hub and leaf nodes in ORACLE 12C RAC

Thanks to Lunar for the guidance on converting between the hub and leaf roles in ORACLE 12C RAC; see the Oracle Flex Clusters section of the official documentation for reference.

Current cluster status

--cluster status
[root@rac1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--the cluster is running in flex mode
[root@rac1 ~]#  crsctl get cluster mode status
Cluster is running in "flex" mode
--ASM is running in flex mode
[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'
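As an optional extra check under Flex ASM, each ASM instance can be asked which clients it is currently serving. This is only a sketch, run from an ASM instance as sysasm:
-- Sketch: clients served by this ASM instance under Flex ASM
SELECT group_number, instance_name, db_name, status
FROM   v$asm_client;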

Converting hub to leaf

--change the node role from hub to leaf
[root@rac1 ~]# crsctl set node role leaf
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--stop the clusterware stack
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDB_NEW.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDG.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.ora12c.db' on 'rac1'
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.ora12c.db' on 'rac1' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac2'
CRS-2676: Start of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.SYSDB_NEW.dg' on 'rac1' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.SYSDG.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--start the clusterware stack
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--status after the hub-to-leaf conversion
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER,STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'leaf'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

Converting leaf back to hub

--change the node role from leaf back to hub
[root@rac1 ~]# crsctl set node role hub
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--stop the clusterware stack
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--start the clusterware stack
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-2672: Attempting to start 'ora.ons' on 'rac1'
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac2'
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac1'
CRS-2676: Start of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac1'
CRS-2676: Start of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.ons' on 'rac1' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.proxy_advm' on 'rac1'
CRS-2676: Start of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ora12c.db' on 'rac1'
CRS-2676: Start of 'ora.ora12c.db' on 'rac1' succeeded
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--cluster status
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

This demonstrates converting a node between the leaf and hub roles in ORACLE 12C RAC. Before converting, confirm that both the cluster and ASM are running in flex mode; the conversion itself can then be completed by following the relevant documentation.

OLR maintenance

Official description of the OLR
OLR is a registry similar to OCR located on each node in a cluster, but contains information specific to each node. It contains manageability information about Oracle Clusterware, including dependencies between various services. Oracle High Availability Services uses this information. OLR is located on local storage on each node in a cluster. Its default location is in the path Grid_home/cdata/host_name.olr, where Grid_home is the Oracle Grid Infrastructure home, and host_name is the host name of the node.
The OLR is similar to the OCR, but it is stored locally on each node of the cluster.

Check the OLR location

[root@rac2 cdata]# cd /etc/oracle
[root@rac2 oracle]# ls -l
total 2868
drwxrwx--- 2 root oinstall    4096 Nov 24 20:00 lastgasp
drwxrwxrwt 2 root oinstall    4096 Dec 21 20:51 maps
-rw-r--r-- 1 root oinstall      96 Nov 25 18:38 ocr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 ocr.loc.orig
-rw-r--r-- 1 root oinstall      80 Nov 24 19:58 olr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 olr.loc.orig
drwxrwxr-x 5 root oinstall    4096 Nov 24 19:57 oprocd
drwxr-xr-x 3 root oinstall    4096 Nov 24 19:57 scls_scr
-rws--x--- 1 root oinstall 2904377 Nov 24 19:57 setasmgid
[root@rac2 oracle]# more olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/rac2.olr
crs_home=/u01/app/12.1.0/grid
--on some platforms the olr.loc file may be under the /var/opt/oracle/ directory
[root@rac2 oracle]#  ocrcheck -config -local
Oracle Local Registry configuration is :
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :        996
         Available space (kbytes) :     408572
         ID                       :  816087519
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr

Check OLR backups

[root@rac2 oracle]# ocrconfig -local -showbackup
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr

Back up the OLR

[root@rac2 oracle]# ocrconfig -local -manualbackup
rac2     2013/12/22 12:09:33     /u01/app/12.1.0/grid/cdata/rac2/backup_20131222_120933.olr
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2/
total 1908
-rw-r--r-- 1 root root  860160 Nov 24 20:02 backup_20131124_200238.olr
-rw-r--r-- 1 root root 1085440 Dec 22 12:09 backup_20131222_120933.olr

Recovering a damaged OLR

--damage the OLR (move the file away to simulate the failure)
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# mv /u01/app/12.1.0/grid/cdata/rac2.olr /u01/app/12.1.0/grid/cdata/rac2.olr_bak
--stop crs
[root@rac2 oracle]# crsctl stop crs
--starting crs fails
[root@rac2 oracle]# crsctl start crs
PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
CRS-4000: Command Start failed, or completed with errors.
--trace the crs startup
[root@rac2 oracle]# strace crsctl start crs
……
uname({sys="Linux", node="rac2", ...})  = 0
open("/etc/oracle/olr.loc", O_RDONLY)   = 14
fstat(14, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd8ac628000
read(14, "olrconfig_loc=/u01/app/12.1.0/gr"..., 4096) = 80
read(14, "", 4096)                      = 0
close(14)                               = 0
munmap(0x7fd8ac628000, 4096)            = 0
stat("/u01/app/12.1.0/grid/cdata/rac2.olr", 0x7fffa215a580) = -1 ENOENT (No such file or directory)
--here we can see that /etc/oracle/olr.loc is read first, and then accessing /u01/app/12.1.0/grid/cdata/rac2.olr fails
……
--make sure ohasd.bin is down
[root@rac2 cdata]# ps -ef|grep ohasd
root     15715 31578  0 14:34 pts/3    00:00:00 grep ohasd
--restore the OLR
[root@rac2 oracle]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
PROTL-35: The configured OLR location is not accessible
[root@rac2 oracle]# cd /u01/app/12.1.0/grid/cdata/
[root@rac2 cdata]# ls
localhost  rac12c-cluster  rac2  rac2.olr_bak
[root@rac2 cdata]# touch rac2.olr
[root@rac2 cdata]# chmod 600 rac2.olr
[root@rac2 cdata]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
--confirm the restore succeeded
[root@rac2 cdata]# ls -l
total 84200
drwxr-xr-x 2 grid oinstall      4096 Nov 24 19:37 localhost
drwxrwxr-x 2 grid oinstall      4096 Dec 22 09:07 rac12c-cluster
drwxr-xr-x 2 grid oinstall      4096 Dec 22 12:09 rac2
-rw------- 1 root root     503484416 Dec 22 14:29 rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:43 rac2.olr_bak
--start crs
[root@rac2 cdata]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Other OLR commands

To export OLR to a file:
# ocrconfig -local -export file_name
To import a specified file to OLR:
# ocrconfig -local -import file_name
To view the contents of the OLR file:
ocrdump -local file_name
To view the contents of the OLR backup file:
ocrdump -local -backupfile olr_backup_file_name
To change the OLR backup location:
ocrconfig -local -backuploc new_olr_backup_path

When the OLR is damaged, the RAC node cannot start normally. Unlike the OCR, the OLR is not automatically backed up on a regular schedule, so it is recommended to back up the OLR manually at regular intervals.