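Title: Aftermath of a directly connected private network: when one node cannot start, HAIP cannot start on the other node
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; otherwise I reserve the right to pursue legal liability.]
This case involves a two-node RAC (11.2.0.4) whose private network is directly connected (a cable between the two servers, no switch). One node's host failed and could not boot; when the clusterware was started on the surviving node, HAIP could not start normally.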
# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE xifenfei1 Started
ora.cluster_interconnect.haip >>>> OFFLINE
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE xifenfei1
ora.crsd
1 ONLINE OFFLINE >>>> OFFLINE
ora.cssd
1 ONLINE ONLINE xifenfei1
ora.cssdmonitor
1 ONLINE ONLINE xifenfei1
ora.ctssd
1 ONLINE ONLINE xifenfei1 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE xifenfei1
ora.evmd
1 ONLINE INTERMEDIATE xifenfei1
ora.gipcd
1 ONLINE ONLINE xifenfei1
ora.gpnpd
1 ONLINE ONLINE xifenfei1
ora.mdnsd
1 ONLINE ONLINE xifenfei1
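With this many resources in the listing, the ones that should be up but are not are easy to miss. A minimal sketch of a filter follows; the function name `crs_not_online` is hypothetical, and it assumes the 11.2 layout shown above, where a resource name line beginning with `ora.` is followed by an instance line of the form `number TARGET STATE ...`.

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints "<resource> <state>" for each instance whose TARGET is ONLINE
# but whose STATE is anything other than ONLINE.
crs_not_online() {
    awk '
        # Resource name line, e.g. "ora.crsd"
        $1 ~ /^ora\./ { name = $1; next }
        # Instance line: number, TARGET, STATE, then optional server and details
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            print name, $3
        }
    '
}
```

Run as `crsctl stat res -t -init | crs_not_online`. Against the listing above it would report `ora.cluster_interconnect.haip OFFLINE`, `ora.crsd OFFLINE` and `ora.evmd INTERMEDIATE`, while skipping `ora.diskmon`, whose target is OFFLINE.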
alert<hostname>.log
2018-09-02 10:38:56.767:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:39:00.771:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
2018-09-02 10:40:00.802:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(7866)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-09-02 10:40:04.806:
[ohasd(7495)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/xifenfei1/ohasd/ohasd.log.
orarootagent_root.log
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} No HAIP info configured in GPNP, using defaults
2018-09-02 10:37:56.805: [ USRTHRD][3650455296]{0:0:2} The final CIDR subnet 169.254/16
2018-09-02 10:37:56.805: [ default][3650455296]clsvactversion:4: Retrieving Active Version from local storage.
2018-09-02 10:37:56.809: [ USRTHRD][3650455296]{0:0:2} HAIP: mbr num is 0.
[ CLWAL][3650455296]clsw_Initialize: OLR initlevel [70000]
2018-09-02 10:37:56.843: [ USRTHRD][3650455296]{0:0:2} HAIP: initializing to 1 interfaces
2018-09-02 10:37:56.844: [ USRTHRD][3650455296]{0:0:2} HAIP: configured to use 1 interfaces
gipcd.log
2018-09-02 10:38:56.787: [ CLSINET][2477147904] Returning NETDATA: 0 interfaces
2018-09-02 10:38:56.988: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface list back to client
2018-09-02 10:38:56.822: [GIPCHDEM][2468742912] gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x1369730 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'gipcd_ha_name', luid '184dd356-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceRequest, endp 00000000000002cb
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: Received type(gipcdmsgtypeInterfaceRequest), endp(00000000000002cb), len(1032), buf(0x7fab858b7a78):[hostname(xifenfei1), retStatus(gipcretSuccess)]
2018-09-02 10:38:56.822: [GIPCDCLT][2477147904] gipcdClientInterfaceQueryToMonitor: enqueue local interface query (2) to worklist
2018-09-02 10:38:56.823: [GIPCDCLT][2477147904] gipcdClientInterfaceRequest: sent local interface query
2018-09-02 10:38:56.823: [GIPCDMON][2472945408] gipcdMonitorCheckXfer: set new infQuery
2018-09-02 10:38:56.831: [ GIPCLIB][2477147904] gipclibSetTraceLevel: to set level to 0
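In this excerpt the line `Returning NETDATA: 0 interfaces`, together with `numInf 0` in the gipchaContext dump, appears to mean that gipcd found no usable private interface to hand over. A small sketch for pulling that count out of a longer gipcd.log; the function name `gipcd_netdata_counts` is hypothetical, and the pattern is taken only from the log line shown above.

```shell
# Hypothetical helper: reads gipcd.log text on stdin and prints the interface
# count from every "Returning NETDATA: N interfaces" line, one count per line.
gipcd_netdata_counts() {
    sed -n 's/.*Returning NETDATA: \([0-9][0-9]*\) interfaces.*/\1/p'
}
```

For example, `gipcd_netdata_counts < gipcd.log | sort | uniq -c` would show how often each count was reported; a log that only ever reports 0 would be consistent with the symptom in this case.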
ohasd.log
2018-09-02 10:38:52.494: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Received the reply to the message: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:502 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2018-09-02 10:38:57.255: [ AGFW][3305629440]{0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:500
2018-09-02 10:38:57.255: [ CRSPE][3295123200]{0:0:2} Received reply to action [Start] message ID: 500
2018-09-02 10:38:57.256: [ CRSPE][3295123200]{0:0:2} Got agent-specific msg: CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/xifenfei1/agent/ohasd/orarootagent_root/orarootagent_root.log".
2018-09-02 10:38:57.500: [GIPCHDEM][1878710016]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x2749eb0 [0000000000000010] { gipchaContext : host 'xifenfei1', name 'CLSFRAME_oracler-cluster', luid '47624c02-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
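Checking the private network showed that the link state of eth1 was down: the private network is directly connected, and the other machine could not boot, so eth1 had no link partner.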
[root@xifenfei1 rules.d]# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: no ====> NIC link state is abnormal
[root@xifenfei1 rules.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 6C:92:BF:2B:7B:36
inet addr:10.10.17.42 Bcast:172.17.17.255 Mask:255.255.255.0
inet6 addr: fe80::6e92:bfff:fe2b:7b36/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 ---------> note RUNNING
RX packets:234424 errors:0 dropped:0 overruns:0 frame:0
TX packets:160916 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:16926236 (16.1 MiB) TX bytes:24269882 (23.1 MiB)
Memory:91160000-91180000
eth1 Link encap:Ethernet HWaddr 6C:92:BF:2B:7B:37
inet addr:11.1.1.2 Bcast:11.1.1.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1 ---------> note: RUNNING is missing
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:91140000-91160000
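The two checks above (`Link detected` from ethtool and the RUNNING flag from ifconfig) can be scripted. The following is only a sketch: the function names are hypothetical, both functions read command output on stdin, and the ifconfig parser assumes the older net-tools layout shown in this article (`Link encap` header line, flags on the `MTU:` line), not the newer `flags=<...>` layout.

```shell
# Hypothetical helpers; both read saved or piped command output on stdin.

# Prints the value ("yes" or "no") from the "Link detected:" line
# of `ethtool <iface>` output.
ethtool_link_detected() {
    awk '/Link detected:/ { print $3; exit }'
}

# Exits 0 if the flags line of interface $1 in old-style `ifconfig` output
# contains RUNNING, and exits 1 otherwise.
ifconfig_is_running() {
    awk -v ifn="$1" '
        # Header line such as "eth1 Link encap:Ethernet ..." starts a new block
        /Link encap/ { cur = $1 }
        # Flags line of the requested interface, e.g. "UP BROADCAST MULTICAST MTU:1500 ..."
        cur == ifn && /MTU:/ {
            if ($0 ~ /(^|[ \t])RUNNING([ \t]|$)/) ok = 1
        }
        END { exit (ok ? 0 : 1) }
    '
}
```

Typical use would be `ethtool eth1 | ethtool_link_detected` and `ifconfig | ifconfig_is_running eth1 || echo "eth1 has no carrier"`. With the output in this case, the first prints `no` and the second reports eth1 as not running.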
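For the MOS description of HAIP failing to start because of an abnormal NIC link, see: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1). This case is a direct aftermath of using a direct connection for the private network of an 11.2 cluster (using a direct connection for the cluster private network is strongly discouraged).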