Configuring a Preferred Node for the Default Service in RAC

In some RAC scenarios the service name corresponding to the database's default db_name needs to be changed so that connections prefer a specific node. By default, db_name drives db_unique_name, which in turn determines the database's service_names. The db_name of an existing database cannot be changed, so the only place to intervene is db_unique_name (changing only service_names is not enough: a default service is still created for the db_unique_name, and that service remains connectable). In a RAC environment, however, db_unique_name is recorded in the CRS resources and cannot be changed directly at the database level.
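
To see the chain on an existing instance, the current values can be checked quickly (a minimal sketch; the parameters and v$database columns shown are standard):

SQL> show parameter db_unique_name
SQL> show parameter service_names
SQL> select name, db_unique_name from v$database;

Attempting to change db_unique_name directly fails with ORA-32017/ORA-65500: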

SQL> alter system set db_unique_name='nxifenfei' sid='*' scope=spfile;
alter system set db_unique_name='nxifenfei' sid='*' scope=spfile
*
ERROR at line 1:
ORA-32017: failure in updating SPFILE
ORA-65500: could not modify DB_UNIQUE_NAME, resource exists

The only option is to first remove the database resource from CRS, then modify db_unique_name (which changes the default service name), and finally add the database resource back:

[oracle@xffdb1 ~]$ srvctl remove database -d xifenfei  -f

SQL> alter system set db_unique_name='nxifenfei' sid='*' scope=spfile;

[oracle@xffdb1 ~]$ srvctl add database -d nxifenfei -o /u01/app/oracle/product/19c/db_1 -p \
  +DATADG/XIFENFEI/PARAMETERFILE/spfile.271.1174153165 -pwfile +DATADG/XIFENFEI/PASSWORD/pwdxifenfei.256.1174152463
[oracle@xffdb1 ~]$ srvctl add instance -d nxifenfei -i xifenfei1 -n xffdb1
[oracle@xffdb1 ~]$ srvctl add instance -d nxifenfei -i xifenfei2 -n xffdb2
[oracle@xffdb1 ~]$ srvctl add instance -d nxifenfei -i xifenfei3 -n xffdb3

Create the new service (same name as db_name, different from the current db_unique_name):

[oracle@xffdb1 ~]$ srvctl add service -db nxifenfei -service xifenfei -r xifenfei2 -a xifenfei1,xifenfei3 \
  -failovertype SESSION -failovermethod BASIC -failoverdelay 10 -failoverretry 3 -failback YES
[oracle@xffdb1 ~]$ srvctl start service -db nxifenfei -service xifenfei

[oracle@xffdb1 ~]$ srvctl config service -d nxifenfei -service xifenfei
Service name: xifenfei
Server pool:
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: SESSION
Failover method: BASIC
Failover retries: 3
Failover delay: 10
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name:
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Failback :  yes
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: xifenfei2
Available instances: xifenfei1,xifenfei3
CSS critical: no
Service uses Java: false
[grid@xffdb1 ~]$

Other service operations:

--change the preferred node of the service
srvctl modify service -db nxifenfei -service xifenfei -modifyconfig -preferred "xifenfei1" -available "xifenfei2,xifenfei3"
srvctl stop service -db nxifenfei -service xifenfei 
srvctl start service -db nxifenfei -service xifenfei 

--relocate the service to another node
srvctl relocate service -db nxifenfei -service xifenfei -oldinst xifenfei2 -newinst xifenfei1

--remove the service
srvctl stop service -db nxifenfei -service xifenfei
srvctl remove service -db nxifenfei -service xifenfei 
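
With the service in place, clients request it explicitly. A sketch of a client-side tnsnames.ora entry (the SCAN host name xffdb-scan and port 1521 are assumptions for illustration):

XIFENFEI =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = xffdb-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = xifenfei)
    )
  )

While the preferred instance xifenfei2 is up the service runs there, so connections land on it; if it fails, the service starts on one of the available instances.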

Replacing the Private Network on Oracle 19c RAC

On a three-node 19c cluster, a private-network NIC needs to be replaced. (If a sufficient downtime window is available there is a simpler approach: rename the new NIC to the old interface name at the OS level, which requires one host reboot; see: renaming network interfaces on Linux 8.)
1. At the host level, first confirm the newly configured network is mutually reachable by ping, add the private-network entries to the hosts file, and verify that ssh works between all nodes (a hosts sketch follows the command below):

ssh xffdb1-priv3 date;ssh xffdb2-priv3 date;ssh xffdb3-priv3 date;
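
For reference, the corresponding /etc/hosts entries would look roughly like this (the 172.18.18.x addresses are assumptions chosen to match the new subnet used later):

172.18.18.11  xffdb1-priv3
172.18.18.12  xffdb2-priv3
172.18.18.13  xffdb3-priv3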

2. Remove the ASM listener on the network being retired, along with that network's resource definition:

[grid@xffdb1 ~]$ srvctl config listener -asmlistener
Name: ASMNET1LSNR_ASM
Type: ASM Listener
Owner: grid
Subnet: 172.16.16.0
Home: <CRS home>
End points: TCP:1525
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:
Name: ASMNET2LSNR_ASM
Type: ASM Listener
Owner: grid
Subnet: 172.17.17.0
Home: <CRS home>
End points: TCP:1526
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:
[grid@xffdb1 ~]$ srvctl config asmnetwork
ASM network 1 exists
Subnet IPv4: 172.16.16.0//
Subnet IPv6:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
ASM network 2 exists
Subnet IPv4: 172.17.17.0//
Subnet IPv6:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
[grid@xffdb1 ~]$

[grid@xffdb3 ~]$ srvctl config asm
ASM home: <CRS home>
Password file: +DATA/orapwASM
Backup of Password file: +DATA/orapwASM_backup
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM,ASMNET2LSNR_ASM
[grid@xffdb3 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.chad
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.net1.network
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.ons
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.proxy_advm
               OFFLINE OFFLINE      xffdb1                   STABLE
               OFFLINE OFFLINE      xffdb2                   STABLE
               OFFLINE OFFLINE      xffdb3                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.OCR.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.DATADG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.FRADG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.cvu
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.xffdb1.vip
      1        ONLINE  ONLINE       xffdb1                   STABLE
ora.xffdb2.vip
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.xffdb3.vip
      1        ONLINE  ONLINE       xffdb3                   STABLE
ora.xifenfei.db
      1        ONLINE  ONLINE       xffdb1                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       xffdb2                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      3        ONLINE  ONLINE       xffdb3                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xffdb2                   STABLE
--------------------------------------------------------------------------------

[grid@xffdb1 peer]$ srvctl update listener -listener ASMNET2LSNR_ASM -asm -remove -force
[grid@xffdb1 peer]$ srvctl remove asmnetwork -netnum 2 -force
PRCR-1028 : Failed to remove resource ora.asmnet2.asmnetwork
PRCR-1072 : Failed to unregister resource ora.asmnet2.asmnetwork
CRS-0245:  User doesn't have enough privilege to perform the operation
[root@xffdb1 ~]# source /home/grid/.bash_profile
[root@xffdb1 ~]# srvctl remove asmnetwork -netnum 2 -force
[root@xffdb1 ~]#
[root@xffdb1 ~]#
[root@xffdb1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.chad
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.net1.network
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.ons
               ONLINE  ONLINE       xffdb1                   STABLE
               ONLINE  ONLINE       xffdb2                   STABLE
               ONLINE  ONLINE       xffdb3                   STABLE
ora.proxy_advm
               OFFLINE OFFLINE      xffdb1                   STABLE
               OFFLINE OFFLINE      xffdb2                   STABLE
               OFFLINE OFFLINE      xffdb3                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.OCR.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.DATADG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.FRADG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       xffdb1                   STABLE
      2        ONLINE  ONLINE       xffdb2                   STABLE
      3        ONLINE  ONLINE       xffdb3                   STABLE
ora.cvu
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.xffdb1.vip
      1        ONLINE  ONLINE       xffdb1                   STABLE
ora.xffdb2.vip
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.xffdb3.vip
      1        ONLINE  ONLINE       xffdb3                   STABLE
ora.xifenfei.db
      1        ONLINE  ONLINE       xffdb1                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       xffdb2                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      3        ONLINE  ONLINE       xffdb3                   Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       xffdb2                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xffdb2                   STABLE
--------------------------------------------------------------------------------
[grid@xffdb2 peer]$ srvctl config listener -asmlistener
Name: ASMNET1LSNR_ASM
Type: ASM Listener
Owner: grid
Subnet: 172.16.16.0
Home: <CRS home>
End points: TCP:1525
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:
[grid@xffdb2 peer]$ srvctl config asmnetwork
ASM network 1 exists
Subnet IPv4: 172.16.16.0//
Subnet IPv6:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

3. Replace the cluster private network:

[grid@xffdb1 ~]$ oifcfg getif
bond0  192.168.20.0  global  public
ens9f0  172.16.16.0  global  cluster_interconnect,asm
ens9f1  172.17.17.0  global  cluster_interconnect,asm
[grid@xffdb1 ~]$ oifcfg setif -global ens6f0np0/172.18.18.0:cluster_interconnect,asm
[grid@xffdb1 ~]$ oifcfg getif
bond0  192.168.20.0  global  public
ens9f0  172.16.16.0  global  cluster_interconnect,asm
ens9f1  172.17.17.0  global  cluster_interconnect,asm
ens6f0np0  172.18.18.0  global  cluster_interconnect,asm
[grid@xffdb1 ~]$ oifcfg delif -global ens9f1/172.17.17.0
[grid@xffdb1 ~]$  oifcfg getif
bond0  192.168.20.0  global  public
ens9f0  172.16.16.0  global  cluster_interconnect,asm
ens6f0np0  172.18.18.0  global  cluster_interconnect,asm

4. Restart the three cluster nodes one at a time (the ASMNET2LSNR_ASM listener process has to be killed manually), and the private network replacement is complete. Because one ASM listener already exists, no ASM listener was added on the other private network here; if you do want to add one, it can be done as follows:

# srvctl add asmnetwork -netnum 2 -subnet 172.18.18.0
$ srvctl add listener -asmlistener -l ASMNET2LSNR_ASM -subnet 172.18.18.0
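
The new definitions can then be verified with the same checks used in step 2:

$ srvctl config asmnetwork
$ srvctl config listener -asmlistener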

How to Modify the Public Network Information (Including VIP) in a Cluster (Doc ID 1674442.1)

In 10g and 11g cluster environments, clients connect to Oracle databases through VIPs (virtual IPs). These VIPs are static IP addresses with corresponding virtual host names, resolved through DNS (unless 11gR2 GNS is used).

During Oracle Clusterware installation, users are asked to enter a virtual IP and virtual host name for each node in the cluster. This information is recorded in the OCR (Oracle Cluster Registry), and various components of the HA framework depend on these VIPs. If for any reason the VIP, its host name, or the public network's subnet or netmask needs to change, follow the procedures described in this document.

If the change involves the cluster private network, refer to Note 283684.1.

Case 1. Changing the host name associated with the public network

The public host name is entered during installation and recorded in the OCR. It cannot be changed after installation. The only ways to change it are to delete the node, change the host name, and add the node back to the cluster, or to reinstall the clusterware and redo the subsequent configuration.

 

Case 2. Changing only the public IP or VIP without changing the interface, subnet, or netmask, or changing only the MAC address with nothing else

If only the public IP address or VIP is changing and the new address stays on the same subnet and the same network interface, or only the MAC address of the public NIC is changing while IP/interface/subnet/netmask stay the same, nothing needs to change at the clusterware level; all that is required is to reflect the IP change at the OS level:

1. Shut down Oracle Clusterware
2. Change the IP address at the network layer, in DNS and the /etc/hosts file, or change the MAC address directly
3. Restart Oracle Clusterware

These changes can be done in a rolling fashion, one node at a time, as sketched below.
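
A minimal per-node sketch as root (the middle step is whatever your platform requires):

# crsctl stop crs
# (update DNS / /etc/hosts / NIC settings at the OS level)
# crsctl start crs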

 

Case 3. Changing the public interface, subnet, or netmask

If the change involves a different subnet (netmask) or interface, the existing interface definition must be deleted from the OCR and the new one added.

As the grid user:

In the following example the subnet changes from 10.2.156.0 to 10.2.166.0. Two steps are required, first 'delif' and then 'setif':

% $CRS_HOME/bin/oifcfg delif -global <if_name>[/<subnet>]
% $CRS_HOME/bin/oifcfg setif -global <if_name>/<subnet>:public

For example:
% $CRS_HOME/bin/oifcfg delif -global eth0/10.X.156.0
% $CRS_HOME/bin/oifcfg setif -global eth0/10.X.166.0:public

Then make the change at the OS level. Unless the OS-level change requires a node reboot, Oracle Clusterware does not need to be restarted. The change can be done in a rolling fashion.

Once the public network information changes, the associated VIPs and SCAN VIPs must be changed as well; see Case 4 and Case 5.

Note: for 11gR2, the commands above require the clusterware to be running on all nodes; otherwise they fail with PRIF-33 and PRIF-32, for example:
[grid@racnode1 bin]$ ./oifcfg delif -global <if_name>/192.168.1.0
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host <nodename>2
CRS-02306: GPnP service on host "<nodename>2" not found.

 

Case 4. Changing the public network information associated with the VIP

Preparing to change the VIP

Generally, a full outage is required only for versions before 10.2.0.3. Starting with 10.2.0.3, the dependency of ASM and database instances on the VIP resource was removed, so changing the VIP does not require stopping ASM or database instances; only client connections made through the VIP during the change are affected. If the change involves only specific nodes, only clients connected to those nodes are affected.

First, follow Case 3 to make sure the public network information has been changed. If a node or clusterware restart occurred after the OS-level network change, the VIP will not start; skip to the step "Changing the VIP and related attributes".

Getting the current VIP configuration

1. Get the current settings.
For 10g and 11gR1, run the following as the Oracle Clusterware owner:

$ srvctl config nodeapps -n <nodename> -a

For example:
$ srvctl config nodeapps -n <nodename>1 -a
VIP exists.: /<nodename>1-vip/101.XX.XX.184/255.255.254.0/<if_name>
For 11gR2, run the following as the Grid Infrastructure owner:

$ srvctl config nodeapps -a

For example:
$ srvctl config nodeapps -a
Network exists: 1/101.17.80.0/255.255.254.0/<if_name>, type static
VIP exists: /racnode1-vip/101.17.XX.184/101.17.80.0/255.255.254.0/<if_name>, hosting node <nodename>1
VIP exists: /racnode2-vip/101.17.XX.186/101.17.80.0/255.255.254.0/<if_name>, hosting node <nodename>2
2. Verify the VIP status.

Versions 10.2 and 11.1:
$ crs_stat -t

Version 11.2:
$ crsctl stat res -t
- the commands above should show the VIPs as ONLINE

$ ifconfig -a
(on HP use netstat -in, on Windows use ipconfig /all)
- the VIP logical interface should be plumbed on the public interface

 

Stopping the resources

3. Stop the nodeapps resource (and, if necessary, the dependent ASM and database resources):

For 10g and 11gR1, run the following as the Oracle Clusterware owner:

$ srvctl stop instance -d <db_name> -i <inst_name>   (can be skipped on 10.2.0.3 and later)
$ srvctl stop asm -n <node_name>                     (can be skipped on 10.2.0.3 and later)
$ srvctl stop nodeapps -n <node_name>

For example:
$ srvctl stop instance -d <DBNAME> -i <INSTANCENAME>1
$ srvctl stop asm -n <nodename>1
$ srvctl stop nodeapps -n <nodename>1
For 11gR2, run the following as the Grid Infrastructure owner:

$ srvctl stop instance -d <db_name> -n <node_name>   (optional)
$ srvctl stop vip -n <node_name> -f

For example:
$ srvctl stop instance -d <DBNAME> -n <nodename>1
$ srvctl stop vip -n <nodename>1 -f

 

Note 1: for 11gR2, the -f option is required when stopping the VIP (it forces the dependent listener resource to be stopped as well); otherwise the following error is raised:
PRCR-1014 : Failed to stop resource ora.<nodename>1.vip
PRCR-1065 : Failed to stop resource ora.<nodename>1.vip
CRS-2529: Unable to act on 'ora.<nodename>1.vip' because that would require stopping or relocating 'ora.LISTENER.lsnr', but the force option was not specified

4. Verify that the VIP is now OFFLINE and no longer plumbed on the public interface:

$ crs_stat -t (for 11gR2, use $ crsctl stat res -t)

$ ifconfig -a
(on HP use netstat -in, on Windows use ipconfig /all)

 

Changing the VIP and related attributes

5. Decide the new VIP address/subnet/netmask or VIP host name, make the network changes at the OS level, and confirm the new VIP address is registered in DNS or updated in the /etc/hosts file (Unix/Linux) or \WINDOWS\System32\drivers\etc\hosts file (Windows). If the interface is changing, make sure the new interface is available on the server before the change.

For example:
New VIP address: 110.XX.XX.11 <nodename>1-nvip
New subnet: 110.11.70.0
New netmask: 255.255.255.0
New interface: <if_name>
6. Modify the VIP resource as root:

If the subnet or the network interface changed, modify the network resource first. Check the exact syntax for your release, since the 'srvctl modify network' options differ between versions:

srvctl modify network [-netnum network_number] [-subnet subnet/netmask[/if1[|if2|...]]]

Get the network_number by issuing:

srvctl config network

Then modify the nodeapps:

# srvctl modify nodeapps -n <node> -A <new_vip_address or new_vip_hostname>/<netmask>/<[if1[if2...]]>

For example:
# srvctl modify nodeapps -n <nodename>1 -A <nodename>1-nvip/255.255.255.0/<if_name>

 

Note 1: starting with 11.2, the VIP depends on the network resource (ora.net1.network), and the OCR records only the VIP host name or IP address for the VIP resource. The public network attributes (subnet/netmask) are recorded through the network resource. When the nodeapps resource is modified, the corresponding attributes of the network resource (ora.net1.network) are modified along with it.

Starting with 11.2.0.2, if only the subnet/netmask is changing, the network resource can be modified directly with srvctl modify network.

As root:
# srvctl modify network -k <network_number> [-S <subnet>/<netmask>[/if1[|if2...]]]

For example:
# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>

If no other attributes changed, the VIP and SCAN VIP do not need to be modified.

Note 2: on 12.1.0.1, due to Bug 16608577 - CANNOT ADD SECOND PUBLIC INTERFACE IN ORACLE 12.1, the srvctl modify network command fails with:

# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>
PRCT-1305 : The specified interface name "<if_name>2" does not match the existing network interface name "<if_name>1"

The following workaround resolves it:

# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0
# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>2

 

* An 11gR2 example of changing the VIP host name without changing the IP address.

For example: change only the VIP host name from <nodename>1-vip to <nodename>1-nvip; the IP address and other attributes stay the same.

If the IP address stays the same, the command above will not change the USR_ORA_VIP value shown in the output of 'crsctl stat res ora.<nodename>1.vip -p'. Use the following command instead:
# crsctl modify res ora.<nodename>1.vip -attr USR_ORA_VIP=<nodename>1-nvip

Verify the change to USR_ORA_VIP:
# crsctl stat res ora.<nodename>1.vip -p |grep USR_ORA_VIP

 

Note: on Windows, if the interface name contains spaces, it must be enclosed in double quotes ("). For example, as the administrator or the software-install user:
> srvctl modify nodeapps -n <nodename>1 -A 110.XX.XX.11/255.255.255.0/"Local Area Connection 1"
7. Verify the change:

$ srvctl config nodeapps -n <node> -a (10g and 11gR1)
$ srvctl config nodeapps -a (11gR2)

For example:
$ srvctl config nodeapps -n <nodename>1 -a
VIP exists.: /<nodename>1-nvip/110.11.70.11/255.255.255.0/<if_name>2

8. Tools such as FPP or RHP use the crsconfig_params file, so it needs to stay current.

Back up the file, then update the network information in it:

cd $GI_HOME/crs/install
cp $GI_HOME/crs/install/crsconfig_params $GI_HOME/crs/install/crsconfig_params.orig

Then edit the file as root:

vi $GI_HOME/crs/install/crsconfig_params

Change only the addresses and subnets for CRS_NODEVIP, NETWORKS, and NEW_NODEVIPS, then save the file. A sketch of the relevant entries follows.
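
The relevant entries look roughly like this (a sketch only; the exact variable spellings and format in crsconfig_params vary by version, and all names and addresses here are hypothetical):

NETWORKS="eth0"/110.11.70.0:public,"eth1"/192.168.0.0:cluster_interconnect
NEW_NODEVIPS='racnode1-nvip/255.255.255.0/eth0,racnode2-nvip/255.255.255.0/eth0'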

 

Restarting the resources

9. Start nodeapps and the other resources.

For 10g and 11gR1, run the following as the Oracle Clusterware owner:

$ srvctl start nodeapps -n <node_name>
$ srvctl start asm -n <node_name>                (can be skipped on 10.2.0.3 and later)
$ srvctl start instance -d <db_name> -i <inst>   (can be skipped on 10.2.0.3 and later)

For example:
$ srvctl start nodeapps -n <nodename>1
$ srvctl start asm -n <nodename>1
$ srvctl start instance -d <DBNAME> -i <INSTANCE_NAME>1

For 11gR2, run the following as the Grid Infrastructure owner:

$ srvctl start vip -n <node_name>
$ srvctl start listener -n <node_name>
$ srvctl start instance -d <db_name> -n <node_name> (optional)

For example:

$ srvctl start vip -n <nodename>1
$ srvctl start listener -n <nodename>1
$ srvctl start instance -d <DBNAME> -n <nodename>1

Note: if network attributes such as the netmask were changed, nodeapps must be restarted.

 
10. Verify that the new VIP is ONLINE and plumbed on the cluster public interface:

$ crs_stat -t (for 11gR2, use $ crsctl stat res -t)

$ ifconfig -a
(on HP use netstat -in, on Windows use ipconfig /all)
11. Repeat the same steps on any other cluster node that needs a similar change.

Miscellaneous

12. If needed, update listener.ora, tnsnames.ora, and the LOCAL_LISTENER/REMOTE_LISTENER parameters to reflect the VIP change.

Note: the LOCAL_LISTENER parameter of ASM and DB instances is set automatically by GI, and a VIP change is normally picked up by LOCAL_LISTENER automatically. However, due to Bug 22824602, in some specific situations LOCAL_LISTENER does not reflect the VIP change; the workaround is to restart the clusterware on the affected node.

 

Case 5. Changing the public network information associated with the SCAN VIP

With 11gR2 Grid Infrastructure, clients can connect to the database through the SCAN and SCAN VIPs. Refer to the following notes for changing the SCAN VIPs:

Note 952903.1 How to update the IP address of the SCAN VIP resources (ora.scan<n>.vip)
Note 972500.1 How to Modify SCAN Setting or SCAN Listener Port after Installation

 

How to Modify the Private Network Information in an Oracle Cluster (Doc ID 2103317.1)

Network information in an Oracle cluster (interfaces, subnets, and the role of each interface) is managed with the 'oifcfg' command. The one thing oifcfg cannot change is the IP addresses themselves. 'oifcfg getif' displays the interface configuration currently recorded in the OCR:

% $CRS_HOME/bin/oifcfg getif
<interfacename>0 10.X.XXX.0 global public
<interfacename>1 192.XXX.X.0 global cluster_interconnect

On Unix/Linux, interface names are assigned by the OS and vary by platform; for Windows, see the Windows notes later in this document. The example above shows interface <interfacename>0 used as the public network on subnet 10.2.156.0, and <interfacename>1 used as the cluster private network on subnet 192.168.0.0.

The 'public' network carries server-client traffic (it is on the same subnet as the VIPs, which are stored as separate OCR entries), while the 'cluster_interconnect' network carries RDBMS/ASM cache fusion traffic between nodes. Starting with 11gR2, the cluster_interconnect also carries the clusterware heartbeat, a significant change from pre-11gR2 versions, where the heartbeat was configured against host names.

If the private interface name or subnet is configured incorrectly, change it as the crs/grid user.

 

Example 1: Changing the private host name

In pre-11.2 Oracle Clusterware, the private host name is recorded in the OCR and cannot be changed. Normally it never needs to change; the IP address behind it can be changed. The only ways to change the private host name are node deletion/re-addition or a clusterware reinstall.

In 11.2 Grid Infrastructure, the private host name is no longer recorded in the OCR and has no dependencies, so it can be changed freely in /etc/hosts.

 

Example 2: Changing only the private IP address, not the interface, subnet, or netmask

For example, the private IP changes from 192.XXX.X.10 to 192.XXX.X.21 while the interface name and subnet stay the same; or only the MAC address changes while the private IP address/interface name/subnet/netmask stay the same.

Simply shut down Oracle Clusterware on the host being changed, change the private IP address or MAC address at the OS level as required (/etc/hosts, OS network config, and so on), then restart Oracle Clusterware.

 

Example 3: Changing only the MTU of the private network

For example, changing the private network MTU from 1500 to 9000 (enabling jumbo frames) with the interface name unchanged.

1. Shut down the clusterware on all nodes.
2. Change the MTU at the OS level, and make sure the private interface with the new MTU is up and can ping all nodes in the cluster.
3. Restart the clusterware on all nodes.

 

Example 4: Changing the private interface name, subnet, or netmask

Note: when the netmask changes but the subnet identifier does not, for example the netmask changes from 255.255.0.0 to 255.255.255.0 with private IPs 192.168.0.x so the subnet identifier stays 192.168.0.0 and the interface name is unchanged: shut down Oracle Clusterware on the affected hosts, change the private network configuration at the OS level, then restart all nodes in the cluster. Note that this change cannot be done in a rolling manner.

When the netmask changes, the subnet identifier usually changes with it. Oracle stores only the interface name and subnet identifier in the OCR, not the netmask. Such a change is made with the oifcfg command, which needs to be run on only one node in the cluster, not all of them. A short worked example of the subnet identifier follows.
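
For instance, IP 192.168.0.10 with netmask 255.255.255.0 lies on subnet 192.168.0.0. On RHEL-style systems the ipcalc utility can confirm this (a sketch; flag support varies by distribution):

$ ipcalc -n 192.168.0.10 255.255.255.0
NETWORK=192.168.0.0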

 

A. For pre-11gR2 clusterware

1. Add the new private network information and delete the old one with oifcfg:

% $ORA_CRS_HOME/bin/oifcfg setif -global <if_name>/<subnet>:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg delif -global <if_name>[/<subnet>]

For example:
% $ORA_CRS_HOME/bin/oifcfg setif -global <interfacename>3/192.168.2.0:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg delif -global <interfacename>1/192.168.1.0

Verify the result:
% $ORA_CRS_HOME/bin/oifcfg getif
eth0 10.X.XXX.0 global public
eth3 192.XXX.2.0 global cluster_interconnect

2. Shut down Oracle Clusterware.

As root: # crsctl stop crs

3. Change the network configuration at the OS level, update the /etc/hosts file on all cluster nodes, and make sure the new network settings are in effect on every node:

% ping <private hostname/IP>
% ifconfig -a  on Unix/Linux

% ipconfig /all on windows

4. Restart Oracle Clusterware.

As root: # crsctl start crs

Note: if OCFS2 is running on Linux, the private IP addresses OCFS2 uses on other nodes may also need to change. For details see Note 604958.1.

 

B. For 11gR2, and 12c without Flex ASM

In 11.2, the private network configuration is stored not only in the OCR but also in the gpnp profile. If the private network is unavailable or wrongly defined, the CRSD process cannot start, and any subsequent OCR change becomes impossible, so the order of the changes matters when modifying the private network configuration. Also note that editing the gpnp profile manually is not supported.

Before making changes on any node, back up the profile.xml on all cluster nodes. As the grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk
1. Make sure all cluster nodes are up and running normally.

2. As the grid user:

Get the current information, for example:

$ oifcfg getif
<interfacename>1 100.17.10.0 global public
<interfacename>0 192.168.0.0 global cluster_interconnect
Add the new cluster interconnect information:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:
a. add a new interface bond0 on the same subnet:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect

b. add a new subnet, either with the same interface name and a different subnet, or with a new interface name:
$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect

 

1. If the interface is not yet available on the servers, the change must be made with the -global option, not the -node option; using -node would cause node eviction.

2. If the interface is available on the server, the subnet address can be identified with:
$ oifcfg iflist

This lists the interfaces and their subnet addresses; the command works even when the clusterware is down. Note that the subnet address is not necessarily of the form x.y.z.0; it can be x.y.z.24, x.y.z.64, x.y.z.128, and so on. For example:
$ oifcfg iflist
lan1 18.1.2.0
lan2 10.2.3.64        << this is the private network subnet address; the associated private IP addresses are 10.2.3.XX

3. If a second private network is being added rather than replacing the existing one, the two interfaces must have the same MTU; otherwise instances fail to start with errors such as:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if MTU failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini2
ORA-27303: additional information: requested interface lan1:801 has a different MTU (1500) than lan3:801 (9000), which is not supported. Check output from ifconfig command

4. For 11gR2 and later, setting the cluster_interconnects parameter in the ASM or database spfile/pfile is not recommended. If it is set for whatever reason, the new private IP address must be set in the spfile/pfile before the clusterware is shut down; otherwise the restart fails because the private network information no longer matches. A sketch follows.
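
If the parameter is set, a minimal sketch of updating it before the shutdown (the instance names and address here are hypothetical):

SQL> alter system set cluster_interconnects='192.168.1.10' scope=spfile sid='+ASM1';
SQL> alter system set cluster_interconnects='192.168.1.10' scope=spfile sid='RAC1';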
Verify the changed values:

$ oifcfg getif
3. As root, shut down the clusterware on all nodes and disable it:

# crsctl stop crs
# crsctl disable crs
4. Make the network changes at the OS level; once done, make sure the new interface is up and usable on all nodes:

$ ifconfig -a
$ ping <private hostname>
5. As root, re-enable the clusterware and restart it on all nodes:

# crsctl enable crs
# crsctl start crs
6. If needed, remove the old interface definition:

$ oifcfg delif -global <interface>[/<subnet>]
For example:
$ oifcfg delif -global eth0/192.168.0.0

 

C. For 12c and 18c with Flex ASM

Review section B above, paying attention to the notes, and take a backup as follows:

Before making changes on any node, back up the profile.xml on all cluster nodes. As the grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk

1. Make sure all cluster nodes are running normally.

2. As the grid user:

Get the current information, for example:

$ oifcfg getif
<interfacename>1 100.17.10.0 global public
<interfacename>0 192.168.0.0 global cluster_interconnect,asm

The example above shows interface <interfacename>0 used as both the cluster interconnect and the ASM network.

Add the new cluster private network information:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect[,asm]

For example:
a. add a new interface bond0 on the same subnet:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect,asm

b. add a new subnet, either with the same interface name and a different subnet, or with a new interface name:
$ oifcfg setif -global eth0/192.168.10.0:cluster_interconnect,asm

$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect,asm

If different networks are used for the private network and the ASM network, adjust the commands accordingly.

3. If the network being changed is also used as the ASM network, the change affects the ASM listener, and a new ASM listener with the new network configuration must be added. Skip this step if the ASM network subnet is not changing.

3.1. Add a new ASM listener (ASMNEWLSNR_ASM in this example) on the new subnet, as the grid user:

$ srvctl add listener -asmlistener -l <new ASM LISTENER NAME> -subnet <new subnet>

For example:
$ srvctl add listener -asmlistener -l ASMNEWLSNR_ASM -subnet 192.168.10.0

3.2. Remove the existing ASM listener (ASMLSNR_ASM in this example) and its dependencies, as the grid user:

$ srvctl update listener -listener ASMLSNR_ASM -asm -remove -force
$ lsnrctl stop ASMLSNR_ASM
This should return: TNS-01101: Could not find listener name or service name ASMLSNR_ASM

 

Note: the -force option is required; otherwise the following errors occur:

$ srvctl update listener -listener ASMLSNR_ASM -asm -remove
PRCR-1025 : Resource ora.ASMLSNR_ASM.lsnr is still running
$ srvctl stop listener -l ASMLSNR_ASM
PRCR-1065 : Failed to stop resource ora.ASMLSNR_ASM.lsnr
CRS-2529: Unable to act on 'ora.ASMLSNR_ASM.lsnr' because that would require stopping or relocating 'ora.asm', but the force option was not specified
3.3. Verify the configuration:

$ srvctl config listener -asmlistener
$ srvctl config asm
4. As root, shut down the clusterware on all nodes and disable it:

# crsctl stop crs
# crsctl disable crs

5. Change the network configuration at the OS level; afterwards, make sure the new interface is effective on all nodes:

$ ifconfig -a
$ ping <private hostname>

6. As root, re-enable the clusterware and restart it on all nodes:

# crsctl enable crs
# crsctl start crs

7. Remove the old interface definition:

$ oifcfg delif -global <interface>[/<subnet>]
For example:
$ oifcfg delif -global <interfacename>0/192.168.0.0

 

 

Notes for 19.x

In 19c there is a new resource, ora.asmnet1.asmnetwork, that also needs to be updated:

srvctl remove asmnetwork -netnum 1
srvctl add asmnetwork -netnum 1 -subnet XXX

When removing the resource you may hit the following errors:

$ srvctl remove asmnetwork -netnum 1

PRCR-1025 : Resource ora.asmnet1.asmnetwork is still running

or:

$ srvctl remove asmnetwork -netnum 1

PRCR-1028 : Failed to remove resource ora.asmnet1.asmnetwork
PRCR-1072 : Failed to unregister resource ora.asmnet1.asmnetwork
CRS-2730: Resource 'ora.ASMNET1LSNR_ASM.lsnr' depends on resource 'ora.asmnet1.asmnetwork'

To avoid these errors, add the -force option when removing:

srvctl remove asmnetwork -netnum 1 -force

 

Notes for 11gR2
1. If the underlying network configuration has changed but the matching oifcfg change has not been made, restarting the clusterware leaves the crsd process unable to start.

crsd.log will show entries like:

2010-01-30 09:22:47.234: [ default][2926461424] CRS Daemon Starting
..
2010-01-30 09:22:47.273: [ GPnP][2926461424]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=7153, tl=3, f=0
2010-01-30 09:22:47.282: [ OCRAPI][2926461424]clsu_get_private_ip_addresses: no ip addresses found.
2010-01-30 09:22:47.282: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2010-01-30 09:22:47.283: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][2926461424]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2010-01-30 09:22:47.285: [ OCRAPI][2926461424]a_init:13!: Clusterware init unsuccessful : [44]
2010-01-30 09:22:47.285: [ CRSOCR][2926461424] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-01-30 09:22:47.285: [ CRSD][2926461424][PANIC] CRSD exiting: Could not init OCR, code: 44
2010-01-30 09:22:47.285: [ CRSD][2926461424] Done.

The errors above indicate that the OS-level settings (oifcfg iflist) do not match the gpnp profile.xml.

Solution: restore the OS network configuration to its original state, start the clusterware, then redo the change following the steps above.

The same problem occurs if the underlying network did not change but oifcfg was given a wrong subnet address or interface name.

2. If any node in the cluster is down, the oifcfg command fails with:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Solution: start the clusterware on the nodes where it is not running, so that all cluster nodes are up; if a node cannot be started for OS reasons, delete it from the cluster before making the private network change.

3. The same error occurs if the command is run by a user other than the GI owner:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Solution: make sure to log in as the GI owner to run the command.

4. Starting with 11.2.0.2, attempting to delete the last private interface (cluster interconnect) without first adding a new one fails with:

PRIF-31: Failed to delete the specified network interface because it is the last private interface

Solution: add the new private interface before deleting the old one.

5. If the clusterware on the local node is down, the following error is reported:

$ oifcfg getif
PRIF-10: failed to initialize the cluster registry

Solution: start the clusterware on that node.

 

Notes for Windows

The syntax for changing interfaces is the same on Windows RAC as on Unix/Linux clusters, but the interface names differ slightly. On Windows, the default names assigned to interfaces are usually:

Local Area Connection
Local Area Connection 1
Local Area Connection 2

If an interface name contains spaces, it must be enclosed in quotes; also note that names are case-sensitive. For example, on Windows, to set the cluster interconnect:

C:\oracle\product\10.2.0\crs\bin\oifcfg setif -global "Local Area Connection 1"/192.168.1.0:cluster_interconnect

On Windows, however, it makes more sense to follow best practice and rename the interfaces to something like "ocwpublic" and "ocwprivate". If interface names need to change after the clusterware is installed, run oifcfg to add the new interface and delete the old one, as described above.

You can run the following command to see the interface names available on each node:

oifcfg iflist -p -n

Run it on every node to verify that the interface names are defined identically.

Impact of changing interface names with oifcfg

For the private interface, the database uses the interface recorded in the OCR as the cluster interconnect for cache fusion traffic between nodes. The interconnect actually in use is shown near the beginning of the alert log, after the parameter listing. For example:

For pre 11.2.0.2:
Cluster communication is configured to use the following interface(s) for this instance
192.XXX.X.1

For 11.2.0.2+: (HAIP address will show in alert log instead of private IP)
Cluster communication is configured to use the following interface(s) for this instance
169.XXX.XX.97

If the information above is incorrect, the instance must be restarted so that the corrected OCR entry is picked up; this applies to ASM instances as well as database instances. On Windows, after the instance is shut down, the OracleService<SID> (or OracleASMService<ASMSID>) service must also be stopped and started before the OCR is re-read; a sketch of the service bounce follows.
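
A minimal sketch of the Windows service bounce (the SID ORCL1 is hypothetical):

C:\> net stop OracleServiceORCL1
C:\> net start OracleServiceORCL1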

 

oifcfg command usage

To see all oifcfg options, just type:

$ <CRS_HOME>/bin/oifcfg

 

Updating crsconfig_params

Tools such as FPP or RHP use this file, so it needs to stay current.

Back up the file, then update the network information in it:

cd $GI_HOME/crs/install
cp $GI_HOME/crs/install/crsconfig_params $GI_HOME/crs/install/crsconfig_params.orig

Then edit the file as root:

vi $GI_HOME/crs/install/crsconfig_params

Change only the addresses and subnets for CRS_NODEVIP, NETWORKS, and NEW_NODEVIPS, then save the file.

Example 5: Adding or removing a cluster interconnect with HAIP on 11gR2 and later

1. To add another private network to an existing cluster using HAIP, run as the grid user:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:

$ oifcfg setif -global <interfacename>/192.XXX.XX.0:cluster_interconnect

Shut down CRS on all nodes and then restart it on all nodes so that HAIP picks up the new interface; a rolling restart cannot be used.

2. To remove a private network from a cluster using HAIP, run as the grid user:

$ oifcfg delif -global <interface>

For example:
$ oifcfg delif -global <interfacename>

HAIP fails over to the other available interfaces, and the cluster/database continues to run this way after the interface is deleted.

To remove the surplus HAIP address, shut down CRS on all nodes and then restart CRS on all nodes; a rolling restart of CRS cannot be used.

This section is from: How to Modify the Private Network Information in an Oracle Cluster (Doc ID 2103317.1).

CRS Fails to Start When RAC Host Clocks Differ by More Than 10 Minutes

A customer reported that after a power outage on a two-node 19c RAC, the database on one node would not start. crsctl showed that the crsd process had not come up:

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

The CRS alert log showed that the clock difference within the cluster exceeded 600 seconds, so the ctssd process could not start:

2024-06-11 17:33:09.953 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr5; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:09.956 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr1; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.024 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr2; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.031 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr4; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.040 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr3; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:11.900 [OCSSD(5020)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xifenf1 xifenf2 .
2024-06-11 17:33:13.344 [OCSSD(5020)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-06-11 17:33:13.809 [OCTSSD(5488)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 5488
2024-06-11 17:33:16.017 [OCTSSD(5488)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xifenf2.
2024-06-11 17:33:16.018 [OCTSSD(5488)]CRS-2401: The Cluster Time Synchronization Service started on host xifenf1.
2024-06-11 17:33:16.105 [OCTSSD(5488)]CRS-2419: The clock on host xifenf1 differs from mean cluster time by 1031504618 microseconds. 
  The Cluster Time Synchronization Service will not perform time synchronization 
  because the time difference is beyond the permissible offset of 600 seconds. 
  Details in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.
2024-06-11 17:33:16.579 [OCTSSD(5488)]CRS-2402: The Cluster Time Synchronization Service aborted on host xifenf1. 
  Details at (:ctsselect_mstm4:) in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.

Check the host clocks:

[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 17:54:09 CST 2024
Tue Jun 11 18:04:34 CST 2024
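
The CTSS state can also be checked directly with crsctl check ctss, which reports whether the service is running in observer or active mode and, in active mode, the current offset:

[grid@xifenf1 ~]$ crsctl check ctss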

Adjust the host clock:

[root@xifenf1 ~]# date -s "20240611 18:06:00"
Tue Jun 11 18:06:00 CST 2024
[root@xifenf1 ~]# su - grid
Last login: Tue Jun 11 17:37:53 CST 2024 on pts/0
[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 18:06:09 CST 2024
Tue Jun 11 18:05:34 CST 2024

Restart CRS:

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenf1'
CRS-2673: Attempting to stop 'ora.storage' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenf1'
CRS-2677: Stop of 'ora.storage' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenf1'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.crf' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenf1'
CRS-2677: Stop of 'ora.cssd' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenf1'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenf1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenf1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenf1                  ACTIVE:35600,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

Troubleshooting ora.storage Failing to Start with ORA-12514

On a 19.11 cluster, CRS failed to start properly after node 2 was manually rebooted:

[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                    OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                    STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.storage
      1        ONLINE OFFLINE                                STABLE
--------------------------------------------------------------------------------

The CRS alert log showed:

2024-03-05 12:46:26.021 [CLSECHO(3653)]ACFS-9327: Verifying ADVM/ACFS devices.
2024-03-05 12:46:26.040 [CLSECHO(3661)]ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
2024-03-05 12:46:26.065 [CLSECHO(3673)]ACFS-9156: Detecting control device '/dev/ofsctl'.
2024-03-05 12:46:26.357 [CLSECHO(3703)]ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
2024-03-05 12:46:26.376 [CLSECHO(3711)]ACFS-9322: completed
2024-03-05 12:46:27.764 [CSSDMONITOR(3855)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 3855
2024-03-05 12:46:27.839 [OSYSMOND(3857)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3857
2024-03-05 12:46:28.129 [CSSDAGENT(3890)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 3890
2024-03-05 12:46:29.125 [OCSSD(3910)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 3910
2024-03-05 12:46:30.187 [OCSSD(3910)]CRS-1713: CSSD daemon is started in hub mode
2024-03-05 12:46:31.428 [OCSSD(3910)]CRS-1707: Lease acquisition for node xff2 number 2 completed
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1621: The IPMI configuration data for this node stored in the Oracle registry is incomplete; details at (:CSSNK00002:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1617: The information required to do node kill for node xff2 is incomplete; details at (:CSSNM00004:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.638 [OCSSD(3910)]CRS-1605: CSSD voting file is online: /dev/sda1; details in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc.
2024-03-05 12:46:33.546 [OCSSD(3910)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xff1 xff2 .
2024-03-05 12:46:35.405 [OCSSD(3910)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-03-05 12:46:35.533 [OCTSSD(4138)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 4138
2024-03-05 12:46:36.339 [OCTSSD(4138)]CRS-2403: The Cluster Time Synchronization Service on host xff2 is in observer mode.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xff1.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2401: The Cluster Time Synchronization Service started on host xff2.
2024-03-05 12:46:54.181 [ORAROOTAGENT(2427)]CRS-5019: All OCR locations are on ASM disk groups [SYSTEMDG], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:47:15.209 [OLOGGERD(4553)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 4553
2024-03-05 12:52:04.581 [CRSCTL(8313)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc.
2024-03-05 12:56:44.519 [ORAROOTAGENT(2427)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc.
2024-03-05 12:56:44.608 [OHASD(2217)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd.trc.
2024-03-05 12:56:44.606 [ORAROOTAGENT(2427)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2024-03-05 12:56:44.606+agent's abort action pending. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:57:58.464 [CRSD(11801)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 11801
2024-03-05 12:58:12.059 [CRSD(11801)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsd.trc.

The ohasd_orarootagent_root trace showed:

2024-03-05 12:52:00.769 :  OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfnConnect2: failed to connect

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 124s elapsed

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:00.771 :  OCRRAW:4255452928: -- trace dump on error exit --

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:2176

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested

2024-03-05 12:52:00.771 :  OCRRAW:4255452928: Category: 7

"/u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc" 208L, 11809C

2024-03-05 12:52:03.543 :  OCRRAW:4255452928: 9379 Error 4 opening dom root in 0xf9afdb79c0

2024-03-05 12:52:03.551 :  OCRRAW:4255452928: kgfnConnect2: kgfnGetBeqData failed

2024-03-05 12:52:03.577 :  OCRRAW:4255452928: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=<node1 private IP>)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))

2024-03-05 12:52:03.578 :  OCRRAW:4255452928: kgfnConnect2Int: ServerAttach

2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnServerAttachConnErrors: Encountered service based error 12514

2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnRecordErr 12514 OCI error:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:04.579 :  OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfnConnect2: failed to connect

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 122s elapsed

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor


2024-03-05 12:52:04.581 :  OCRRAW:4255452928: -- trace dump on error exit --

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: Category: 7

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: DepInfo: 12514

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: ADR is not properly configured

2024-03-05 12:52:04.581 :  OCRRAW:4255452928: -- trace dump end --

  OCRASM:4255452928: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=12514, loc=kgfokge

2024-03-05 12:52:04.581 :  OCRASM:4255452928: ASM Error Stack : ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

2024-03-05 12:52:04.581 :  OCRASM:4255452928: proprasmo: kgfoCheckMount returned [7]
2024-03-05 12:52:04.581 :  OCRASM:4255452928: proprasmo: The ASM instance is down
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprioo: Failed to open [+SYSTEMDG/xff-cluster/OCRFILE/registry.255.1072903025]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprioo: No OCR/OLR devices are usable
  OCRUTL:4255452928: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
 default:4255452928: u_set_gbl_comp_error: comptype '107' : error '0'
2024-03-05 12:52:04.635 :  OCRRAW:4255452928: proprinit: Could not open raw device
2024-03-05 12:52:04.635 : default:4255452928: a_init:7!: Backend init unsuccessful : [26]
2024-03-05 12:52:04.637 : default:4255452928: clsvactversion:4: Retrieving Active Version from local storage.

From this it could be inferred that node 2 was failing to connect to (DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=<node1 private IP>)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM))). Check the state of that listener on node 1:

[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:04:51

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<node1 private IP>)(PORT=1525)))
The listener supports no services
The command completed successfully

The listener had no services registered with it. Check the related listener parameter settings:

[grid@xff1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:26:29 2024
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0

SQL> show parameter listener;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
forward_listener                     string
listener_networks                    string
local_listener                       string      
remote_listener                      string

The preliminary conclusion was that the ASMNET1LSNR_ASM listener on node 1 was in an abnormal state, most likely because the listener parameters of the ASM instance were wrong. The safer fix is to reboot node 1 so that the listener-related parameters are regenerated and dynamic registration resumes. As a temporary workaround:

[grid@xff1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:05:11 2024
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0

SQL> ALTER SYSTEM SET local_listener ='(ADDRESS=(PROTOCOL=TCP)(HOST=<node1 private IP>)(PORT=1525))' sid='+ASM1' SCOPE=MEMORY;

System altered.



[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:05:21

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<node1 private IP>)(PORT=1525)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_FRA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_SYSTEMDG" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@xff1 ~]$ 

After setting the local_listener parameter of the node 1 ASM instance, the cluster started successfully:

[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                    STABLE
ora.crsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                    STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                    OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                    STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                    STABLE
ora.storage
      1        ONLINE  ONLINE       xff2                    STABLE
--------------------------------------------------------------------------------
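
One caveat not covered above: the workaround used SCOPE=MEMORY, so the local_listener value is lost the next time the ASM instance restarts; after a normal restart the Grid Infrastructure agent should set it again automatically. Registration and the ASM listener configuration can be re-checked at any time with standard commands:

lsnrctl services ASMNET1LSNR_ASM
srvctl config asm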

udev_start causes VIP failover (typical case: triggered by adding disks online to a RAC)

While expanding ASM storage, the customer ran the udev_start command, after which all VIPs failed over and all application traffic was interrupted.

To restore service first, relocate all of the VIPs back:

[grid@rac3 ~]$  srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.

The VIPs are back to normal and application traffic has recovered.

The cause is that the udev_start command briefly takes the network interfaces down, which makes the VIPs fail over.

Checking the ifcfg configuration files confirms this (screenshot omitted).

The root cause is udev operating on the NICs. The recommended fix is to add HOTPLUG="no" to the corresponding ifcfg files (public, private, and any other networks of concern), as in the sketch below the reference.
Reference: Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
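
A minimal sketch of such an ifcfg file, assuming a private interface named eth1 and a placeholder address (both hypothetical):

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
# prevent udev_start/hotplug events from bouncing this interface
HOTPLUG=no
IPADDR=172.16.16.11
NETMASK=255.255.255.0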

Removing OFFLINE entries for ora.asmgroup resources

On a cluster that uses Flex ASM (whose default ASM cardinality is 3), checking the cluster status shows the ora.asmgroup-related resources with an OFFLINE instance. Forcing the ASM count to 2 with srvctl modify asm -count 2 removes these OFFLINE resource entries.

[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ srvctl modify asm -count 2
[grid@dbserver1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.chad
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.net1.network
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.ons
               ONLINE  ONLINE       dbserver1                STABLE
               ONLINE  ONLINE       dbserver2                STABLE
ora.proxy_advm
               OFFLINE OFFLINE      dbserver1                STABLE
               OFFLINE OFFLINE      dbserver2                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.ASMNET2LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.FRA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.SYSDG.dg(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                Started,STABLE
      2        ONLINE  ONLINE       dbserver2                Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.asmnet2.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       dbserver1                STABLE
      2        ONLINE  ONLINE       dbserver2                STABLE
ora.cvu
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.dbserver2.vip
      1        ONLINE  ONLINE       dbserver2                STABLE
ora.xff.db
      1        ONLINE  ONLINE       dbserver1                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
      2        ONLINE  ONLINE       dbserver2                Open,HOME=/u01/app/o
                                                             racle/product/19c/db
                                                             _1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       dbserver1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbserver1                STABLE
--------------------------------------------------------------------------------
[grid@dbserver1 ~]$ 
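
If the third ASM instance should be restored later, the cardinality can simply be raised again; a sketch using standard srvctl syntax:

srvctl config asm            # shows the current ASM instance count
srvctl modify asm -count 3   # or -count ALL to run ASM on every node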

Faulty NIC causes database instance startup failure

On one cluster, one node starts normally but the instance on the other node fails to start. Alert log from the failing node:

Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Receiver ospid 6386 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms0_6386.trc:
IPC Send timeout detected. Receiver ospid 6402 [
Tue Mar 07 19:07:29 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms4_6402.trc:
Tue Mar 07 19:07:29 2023
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
System state dump requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6374_20230307190729.trc
LMD0 (ospid: 6384): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20230307190729],
      requested by (instance=2, osid=6384 (LMD0)), summary=[abnormal instance termination].
Instance terminated by LMD0, pid = 6384

Alert log from the healthy node:

Tue Mar 07 19:02:07 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Mar 07 19:02:08 2023
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
 LMS 7: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Mar 07 19:02:08 2023
Tue Mar 07 19:02:08 2023
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 6: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Tue Mar 07 19:02:27 2023
IPC Send timeout detected. Sender: ospid 6936 [oracle@xffnode1.localdomain (PING)]
Receiver: inst 2 binc 441249706 ospid 59731
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6946 [oracle@xffnode1.localdomain (LMS0)]
Receiver: inst 2 binc 429479852 ospid 6386
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6962 [oracle@xffnode1.localdomain (LMS4)]
Receiver: inst 2 binc 429479854 ospid 6402
Tue Mar 07 19:07:29 2023
IPC Send timeout detected. Sender: ospid 6966 [oracle@xffnode1.localdomain (LMS5)]

These logs confirm that the two nodes cannot communicate properly, so the joining node cannot complete cluster reconfiguration and the instance fails to start. In such cases the first thing to check is the private network. Analysis of the oswnetstat data shows a very high number of "packet reassembles failed" errors.
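
The same counter can be checked live with standard netstat output (oswatcher's oswnetstat collects the same statistics):

netstat -s | grep -i reassembl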

This symptom usually means the ipfrag_*_thresh kernel defaults are too small. Set:

net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
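
A sketch of applying these values immediately and making them persistent (standard sysctl usage on RHEL/CentOS):

sysctl -w net.ipv4.ipfrag_high_thresh=16777216
sysctl -w net.ipv4.ipfrag_low_thresh=15728640
# add the same two lines to /etc/sysctl.conf, then reload:
sysctl -p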

With these temporary values the database could be started, but instance reconfiguration still took far too long.

"packet reassembles failed" kept increasing. Checking the NICs shows that one of the two 10GbE cards used by HAIP is faulty.

To limit the impact on database performance, the faulty NIC was disabled temporarily, after which the database restarted normally.

The NIC will be re-enabled once the network-layer problem has been fixed (see the sketch below).
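
A sketch of taking the NIC out of service and re-enabling it later, assuming the faulty interface is named eth3 (hypothetical name):

ip link set eth3 down   # HAIP relocates its 169.254.x address to the healthy NIC
ip link set eth3 up     # once the physical problem has been fixed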

11.2 CRS startup timeout: handling with dd npohasd

A fiber-link failure made the voting disks inaccessible and the host rebooted; after the reboot, the clusterware did not start properly.
OS and CRS versions:

[root@rac1 ~]# cat /etc/redhat-release 
CentOS release 6.9 (Final)
[root@rac1 ~]# sqlplus -v

SQL*Plus: Release 11.2.0.4.0 Production

Manually starting CRS hangs for a while and then fails:

[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Check the startup processes:

[grid@rac1 ~]$ ps -ef|grep d.bin
root       7043      1  0 11:48 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
root       8311      1  0 11:53 ?        00:00:00 /u01/app/grid/product/11.2.0/bin/ohasd.bin reboot
grid      10984  10954  0 12:10 pts/2    00:00:00 grep d.bin

Based on experience, this most likely matches BUG 17229230 - DURING REBOOT, "OHASD.BIN REBOOT" REMAINS SLEEPING. As a temporary workaround, start CRS in one session and run the following in another session:

/bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
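
If the dd exits immediately because the named pipe does not exist yet, a commonly used variant (a sketch, not from the original post) retries until ohasd has created the pipe:

until /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1; do
  sleep 1
done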

After that, CRS starts normally:

[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  ONLINE       rac1                                         
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1                                         
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       rac1                                         
ora.gpnpd
      1        ONLINE  ONLINE       rac1                                         
ora.mdnsd
      1        ONLINE  ONLINE       rac1                                         

After terminating the dd command, the cluster started normally.
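
To confirm the stack is fully healthy afterwards, the usual checks apply:

crsctl check crs
crsctl status res -t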