How to Modify the Cluster's Public Network Information, Including the VIPs (Doc ID 1674442.1)

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

In 10g and 11g cluster environments, clients connect to the Oracle database through VIPs (virtual IPs). These VIPs are static IP addresses that correspond to virtual hostnames and are resolved through DNS (unless you are using 11gR2 GNS).

During Oracle Clusterware installation, you are asked to enter a virtual IP and a virtual hostname for each node in the cluster. This information is recorded in the OCR (Oracle Cluster Registry), and various components of the HA framework depend on these VIPs. If, for whatever reason, you need to change the VIP, the hostname associated with the VIP, or the public network's subnet or netmask, follow the procedures described in this note.

If the change involves the cluster private network (interconnect), refer to Note 283684.1.

Case 1. Changing the hostname associated with the public network

The hostname associated with the public network is entered at installation time and recorded in the OCR. It cannot be changed after installation. The only ways to change it are to delete the node, change the hostname, and add the node back to the cluster, or to reinstall the clusterware and complete the subsequent clone configuration.

 

Case 2. Changing only the public IP or the VIP without changing the interface, subnet, or netmask, or changing only the MAC address with nothing else

If only the public IP address or the VIP needs to change and the new address stays in the same subnet on the same network interface, or if only the MAC address of the public interface changes while the IP/interface/subnet/netmask stay the same, no change is required at the clusterware level. All that is needed is to reflect the IP change at the OS level:

1. Stop Oracle Clusterware
2. Change the IP address in DNS and in the /etc/hosts file, or change the MAC address directly
3. Restart Oracle Clusterware

The change can be made in a rolling fashion, for example one node at a time, as sketched below.
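A minimal sketch of the rolling, per-node procedure for Case 2 (assumptions: a Linux node, $GRID_HOME pointing to the Grid Infrastructure/CRS home, eth0 as the public interface, and a RHEL-style network configuration; adjust names and paths to your environment):

As root, stop the stack on the node being changed:
# $GRID_HOME/bin/crsctl stop crs

Update the OS-level configuration, e.g. DNS, /etc/hosts, and (on RHEL-style systems) the interface file:
# vi /etc/hosts
# vi /etc/sysconfig/network-scripts/ifcfg-eth0

As root, restart the stack on this node only, then repeat the same steps on the next node:
# $GRID_HOME/bin/crsctl start crs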

 

Case 3. Changing the public network interface, subnet, or netmask

If the change involves a different subnet (netmask) or a different network interface, the existing interface information must be deleted from the OCR and the new information added.

As the grid user:

In the following example the subnet is changed from 10.2.156.0 to 10.2.166.0. Two steps are required – first 'delif', then 'setif':

% $CRS_HOME/bin/oifcfg delif -global <if_name>[/<subnet>]
% $CRS_HOME/bin/oifcfg setif -global <if_name>/<subnet>:public

For example:
% $CRS_HOME/bin/oifcfg delif -global eth0/10.X.156.0
% $CRS_HOME/bin/oifcfg setif -global eth0/10.X.166.0:public

Then make the change at the OS level. Unless the OS-level change requires a node reboot, Oracle Clusterware does not need to be restarted. The change can be made in a rolling fashion.

Once the public network information has been changed, the VIPs and SCAN VIPs that depend on it must be changed as well; see Case 4 and Case 5. A quick way to verify the OCR entry is shown below.
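A hedged verification sketch after the delif/setif pair, run as the clusterware/grid owner (the interface names and subnets are illustrative):

% $CRS_HOME/bin/oifcfg getif
eth0  10.X.166.0  global  public
eth1  192.XXX.X.0  global  cluster_interconnect

The public entry should now show the new subnet; the cluster_interconnect entry is untouched by this procedure.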

Note: for 11gR2, the commands above require the clusterware to be running on all nodes; otherwise they fail with PRIF-33 and PRIF-32, for example:
[grid@racnode1 bin]$ ./oifcfg delif -global <if_name>/192.168.1.0
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host <nodename>2
CRS-02306: GPnP service on host "<nodename>2" not found.

 

Case 4. Changing the public network information associated with the VIP

Preparing to change the VIP

In general, a full outage is required only for releases earlier than 10.2.0.3. Starting with 10.2.0.3, the dependency of the ASM and database instances on the VIP resource was removed, so changing the VIP does not require stopping ASM or the database instances; only client connections made while the VIP is being changed are affected. If the change involves only specific nodes, only client connections to those nodes are affected during the change.

First, follow Case 3 to make sure the public network information has been changed. If the node or the clusterware was restarted after the OS-level network change, the VIPs will not start; skip ahead to the step "Modify the VIP and related attributes".

Obtaining the current VIP configuration

1. Get the current settings
For 10g and 11gR1, run the following command as the Oracle Clusterware owner:

$ srvctl config nodeapps -n <nodename> -a

For example:
$ srvctl config nodeapps -n <nodename>1 -a
VIP exists.: /<nodename>1-vip/101.XX.XX.184/255.255.254.0/<if_name>
For 11gR2, run the following command as the Grid Infrastructure owner:

$ srvctl config nodeapps -a

For example:
$ srvctl config nodeapps -a
Network exists: 1/101.17.80.0/255.255.254.0/<if_name>, type static
VIP exists: /racnode1-vip/101.17.XX.184/101.17.80.0/255.255.254.0/<if_name>, hosting node <nodename>1
VIP exists: /racnode2-vip/101.17.XX.186/101.17.80.0/255.255.254.0/<if_name>, hosting node <nodename>2
2. Verify the VIP status

Releases 10.2 and 11.1:
$ crs_stat -t

Release 11.2:
$ crsctl stat res -t
- The commands above should show the VIPs with status ONLINE

$ ifconfig -a
(use netstat -in on HP platforms and ipconfig /all on Windows)
- The VIP logical interface should be plumbed on the public network interface

 

Stopping the resources

3. Stop the nodeapps resource (and, if necessary, the dependent ASM and database resources):

For 10g and 11gR1, run the following as the Oracle Clusterware owner:

$ srvctl stop instance -d <db_name> -i <inst_name>   (can be skipped on 10.2.0.3 and later)
$ srvctl stop asm -n <node_name>                     (can be skipped on 10.2.0.3 and later)
$ srvctl stop nodeapps -n <node_name>

For example:
$ srvctl stop instance -d <DBNAME> -i <INSTANCENAME>1
$ srvctl stop asm -n <nodename>1
$ srvctl stop nodeapps -n <nodename>1
For 11gR2, run the following as the Grid Infrastructure owner:

$ srvctl stop instance -d <db_name> -n <node_name>   (optional)
$ srvctl stop vip -n <node_name> -f

For example:
$ srvctl stop instance -d <DBNAME> -n <nodename>1
$ srvctl stop vip -n <nodename>1 -f

 

Note 1: for 11gR2, the -f option is required because the listener resource depends on the VIP; stopping the VIP without it fails with the following errors:
PRCR-1014 : Failed to stop resource ora.<nodename>1.vip
PRCR-1065 : Failed to stop resource ora.<nodename>1.vip
CRS-2529: Unable to act on 'ora.<nodename>1.vip' because that would require stopping or relocating 'ora.LISTENER.lsnr', but the force option was not specified

4. Verify that the VIP is now OFFLINE and no longer plumbed on the public network interface

$ crs_stat -t (for 11gR2, use $ crsctl stat res -t)

$ ifconfig -a
(use netstat -in on HP platforms and ipconfig /all on Windows)

 

Modifying the VIP and related attributes

5. Determine the new VIP address/subnet/netmask or the new VIP hostname, make the change in the OS network configuration, and confirm that the new VIP address has been registered in DNS or that the /etc/hosts file (Unix/Linux) or the \WINDOWS\System32\drivers\etc\hosts file (Windows) has been updated (an illustrative hosts entry follows the example values below). If the network interface is changing, make sure the new interface is available on the server before making the change.

For example:
New VIP address: 110.XX.XX.11 <nodename>1-nvip
New subnet: 110.11.70.0
New netmask: 255.255.255.0
New interface: <if_name>
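A hedged illustration of the corresponding /etc/hosts entry on Unix/Linux, using the placeholder values above (the domain is an assumption; the real entry must match what DNS resolves):

110.XX.XX.11   <nodename>1-nvip.example.com   <nodename>1-nvip

On Windows the equivalent line goes into \WINDOWS\System32\drivers\etc\hosts.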
6. Modify the VIP resource as the root user:

If the subnet or the network interface is changing, modify the network resource as well. Check the syntax against the documentation for your release, as the 'srvctl modify network' options can differ:

srvctl modify network [-netnum network_number] [-subnet subnet/netmask[/if1[|if2|...]]]

Obtain the network_number by issuing:

srvctl config network
# srvctl modify nodeapps -n <node> -A <new_vip_address or new_vip_hostname>/<netmask>/<[if1[if2...]]>

For example:
# srvctl modify nodeapps -n <nodename>1 -A <nodename>1-nvip/255.255.255.0/<if_name>

 

Note 1: starting with 11.2, the VIP depends on the network resource (ora.net1.network), and the OCR records only the VIP hostname or the IP address associated with the VIP resource. The public network attributes (subnet/netmask) are recorded through the network resource. When the nodeapps resource is modified, the corresponding attributes of the network resource (ora.net1.network) are modified along with it.

Starting with 11.2.0.2, if only the subnet/netmask information changes, the network resource can be modified directly with the following srvctl modify network command.

As the root user:
# srvctl modify network -k <network_number> [-S <subnet>/<netmask>[/if1[|if2|...]]]

For example:
# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>

If no other attributes changed, there is no need to modify the VIP or the SCAN VIP; the result can be checked as shown below.
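A hedged check of the result, run as the Grid Infrastructure owner (the output format follows the 11gR2 example earlier in this note; addresses are placeholders):

$ srvctl config network
Network exists: 1/110.XX.XX.0/255.255.255.0/<if_name>, type static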

Note 2: on release 12.1.0.1, because of Bug 16608577 - CANNOT ADD SECOND PUBLIC INTERFACE IN ORACLE 12.1, the srvctl modify network command fails with:

# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>
PRCT-1305 : The specified interface name "<if_name>2" does not match the existing network interface name "<if_name>1"

The following workaround is needed:

# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0
# srvctl modify network -k 1 -S 110.XX.XX.0/255.255.255.0/<if_name>2

 

* An 11gR2 example of changing the VIP hostname without changing the IP address.

For example: change only the VIP hostname from <nodename>1-vip to <nodename>1-nvip, keeping the IP address and all other attributes unchanged.

If the IP address stays the same, the command above will not change the value of USR_ORA_VIP in the output of 'crsctl stat res ora.<nodename>1.vip -p'. Use the following command instead:
# crsctl modify res ora.<nodename>1.vip -attr USR_ORA_VIP=<nodename>1-nvip

Verify the change to USR_ORA_VIP:
# crsctl stat res ora.<nodename>1.vip -p |grep USR_ORA_VIP

 

Note: on Windows, if the interface name contains spaces, it must be enclosed in double quotes ("). For example:
As an Administrator or the software installation user:
> srvctl modify nodeapps -n <nodename>1 -A 110.XX.XX.11/255.255.255.0/"Local Area Connection 1"
7. Verify the change

$ srvctl config nodeapps -n <node> -a (10g and 11gR1)
$ srvctl config nodeapps -a (11gR2)

For example:
$ srvctl config nodeapps -n <nodename>1 -a
VIP exists.: /<nodename>1-nvip/110.11.70.11/255.255.255.0/<if_name>2

8. Other tools such as FPP or RHP use the crsconfig_params file, so it needs to be kept up to date with the new network information.

Back up the crsconfig_params file:

cd $GI_HOME/crs/install
cp $GI_HOME/crs/install/crsconfig_params $GI_HOME/crs/install/crsconfig_params.orig

Then edit the file as root:

vi $GI_HOME/crs/install/crsconfig_params

Change only the addresses and subnets in CRS_NODEVIP, NETWORKS, and NEW_NODEVIPS,

then save the file. The current values can be reviewed as shown below.
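A hedged way to review the relevant entries before and after editing (grep is standard; the parameter names and the $GI_HOME path are the ones used in the steps above):

# grep -E 'CRS_NODEVIP|NETWORKS|NEW_NODEVIPS' $GI_HOME/crs/install/crsconfig_params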

 

Restarting the resources

9. Start nodeapps and the other resources

For 10g and 11gR1, run the following as the Oracle Clusterware owner:

$ srvctl start nodeapps -n <node_name>
$ srvctl start asm -n <node_name>               (can be skipped on 10.2.0.3 and later)
$ srvctl start instance -d <db_name> -i <inst>  (can be skipped on 10.2.0.3 and later)

For example:
$ srvctl start nodeapps -n <nodename>1
$ srvctl start asm -n <nodename>1
$ srvctl start instance -d <DBNAME> -i <INSTANCE_NAME>1

For 11gR2, run the following as the Grid Infrastructure owner:

$ srvctl start vip -n <node_name>
$ srvctl start listener -n <node_name>
$ srvctl start instance -d <db_name> -n <node_name> (optional)

For example:

$ srvctl start vip -n <nodename>1
$ srvctl start listener -n <nodename>1
$ srvctl start instance -d <DBNAME> -n <nodename>1

Note: if network attributes such as the netmask were changed, nodeapps must be restarted.

 
10. Verify that the new VIP is ONLINE and plumbed on the cluster's public network interface

$ crs_stat -t (for 11gR2, use $ crsctl stat res -t)

$ ifconfig -a
(use netstat -in on HP platforms and ipconfig /all on Windows)
11. If other nodes in the cluster need a similar change, repeat the same steps on each of them.

Other

12. If necessary, update listener.ora, tnsnames.ora, and the LOCAL_LISTENER/REMOTE_LISTENER parameters to reflect the VIP change.

Note: the LOCAL_LISTENER parameter of the ASM and database instances is set automatically by GI, and a VIP change is normally picked up automatically. However, because of Bug 22824602, in some specific situations LOCAL_LISTENER does not reflect the VIP change; the workaround is to restart the clusterware on the affected node. A quick check is shown below.
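A hedged way to confirm that the instances picked up the new VIP, run in SQL*Plus on the affected node:

$ sqlplus / as sysdba
SQL> show parameter local_listener

The value should reference the new VIP endpoint; if it still shows the old address, restart the clusterware on that node as described in the note above.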

 

Case 5. Changing the public network information associated with the SCAN VIP

With 11gR2 Grid Infrastructure, clients can connect to the database through the SCAN and the SCAN VIPs. Refer to the following notes to modify the SCAN VIPs:

Note 952903.1 How to update the IP address of the SCAN VIP resources (ora.scan<n>.vip)
Note 972500.1 How to Modify SCAN Setting or SCAN Listener Port after Installation

 

How to Modify Private Network (Interconnect) Information in an Oracle Cluster Environment (Doc ID 2103317.1)

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

The network information of an Oracle cluster (interfaces, subnets, and the role of each interface) is managed with the 'oifcfg' command, with the exception of the interfaces' IP addresses, which oifcfg cannot change. 'oifcfg getif' displays the current interface configuration stored in the OCR:

% $CRS_HOME/bin/oifcfg getif
<interfacename>0 10.X.XXX.0 global public
<interfacename>1 192.XXX.X.0 global cluster_interconnect

On Unix/Linux systems the interface names are assigned by the operating system and vary by platform; for Windows, see the Windows-specific section later in this note. The example above shows that interface <interfacename>0 is used as the public network with subnet 10.2.156.0, and <interfacename>1 is used as the cluster private network (interconnect) with subnet 192.168.0.0.

The 'public' network is used for communication between the server and the clients (it is on the same subnet as the VIPs, which are stored as separate entries in the OCR), while the 'cluster_interconnect' network is used for Cache Fusion traffic between RDBMS/ASM instances. Starting with 11gR2, the cluster_interconnect is also used for the clusterware heartbeat; this is a significant change from pre-11gR2 releases, where the private hostname was specified when configuring the cluster heartbeat.

If the private interface's subnet or interface name is configured incorrectly, change it as the crs/grid user using the procedures below.

 

Example 1: Changing the private hostname

In releases before 11.2 Oracle Clusterware, the private hostname is recorded in the OCR and cannot be changed. The private hostname normally never needs to change (the IP address behind it can be changed); the only way to change it is to delete and re-add the node, or to reinstall Oracle Clusterware.

In 11.2 Grid Infrastructure, the private hostname is no longer recorded in the OCR and nothing depends on it, so it can be changed freely in /etc/hosts.

 

Example 2: Changing only the private IP address, not the interface, subnet, or netmask

For example, the private IP address changes from 192.XXX.X.10 to 192.XXX.X.21 while the interface name and subnet stay the same; or only the MAC address changes while the private IP address/interface name/subnet/netmask stay the same.

Simply stop Oracle Clusterware on the node being changed, make the private IP or MAC address change at the OS level (e.g. /etc/hosts, OS network configuration), and then restart Oracle Clusterware on that node.

 

Example 3: Changing only the MTU of the private network

For example, the private network MTU is changed from 1500 to 9000 (to enable jumbo frames) while the interface name stays the same.

1. Shut down the clusterware on all nodes.
2. Change the MTU at the OS level, and make sure the private interface with the new MTU is up and can ping every node in the cluster (see the sketch below).
3. Restart the clusterware on all nodes.
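A hedged way to verify the MTU change on Linux before restarting the clusterware (ip and ping are standard tools; the interface name, MTU value, and peer hostname are placeholders; 8972 is 9000 minus 28 bytes of IP/ICMP headers):

# ip link show eth1 | grep mtu
# ping -M do -s 8972 -c 3 <remote-private-hostname>

If the ping fails with "message too long", jumbo frames are not enabled end to end (check the switch ports as well).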

 

Example 4: Changing the private interface name, subnet, or netmask

Note: when the netmask changes but the subnet ID does not, e.g. the netmask goes from 255.255.0.0 to 255.255.255.0 with private IPs of 192.168.0.x, so the subnet ID stays 192.168.0.0 and the interface name is unchanged, just stop Oracle Clusterware on the nodes concerned, change the private network configuration at the OS level, and restart the clusterware on all nodes. Note that this kind of change cannot be done in a rolling manner.

When the netmask changes, the subnet ID usually changes with it. Oracle stores only the interface name and the subnet ID in the OCR, not the netmask, so this change can be made with the oifcfg command. oifcfg only needs to be run on one node of the cluster, not on all of them.

 

A. For Oracle Clusterware releases before 11gR2

1. Add the new private network information and delete the old information with oifcfg:

% $ORA_CRS_HOME/bin/oifcfg setif -global <if_name>/<subnet>:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg delif -global <if_name>[/<subnet>]

For example:
% $ORA_CRS_HOME/bin/oifcfg setif -global <interfacename>3/192.168.2.0:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg delif -global <interfacename>1/192.168.1.0

Verify the result:
% $ORA_CRS_HOME/bin/oifcfg getif
eth0 10.X.XXX.0 global public
eth3 192.XXX.2.0 global cluster_interconnect

2. Shut down Oracle Clusterware

As the root user: # crsctl stop crs

3. Change the network configuration at the OS level, update the /etc/hosts file on every node of the cluster, and make sure the new network settings are in effect on all nodes:

% ping <private hostname/IP>
% ifconfig -a  on Unix/Linux

% ipconfig /all on Windows

4. Restart Oracle Clusterware

As the root user: # crsctl start crs

Note: if OCFS2 is running on Linux, the private IP address used by OCFS2 on the other nodes may also need to be changed. For more details, see Note 604958.1.

 

B. For 11gR2 and for 12c without Flex ASM

In 11.2 Grid Infrastructure, the private network configuration is stored not only in the OCR but also in the gpnp profile. If the private network is unavailable or misconfigured, the CRSD process cannot start and no further OCR change is possible, so it is important to make the private network change in the right order. Also note that editing the gpnp profile by hand is not supported.

Before touching any node, back up the profile.xml on all nodes of the cluster. As the grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk
1. Make sure the clusterware is up and running on all nodes of the cluster

2. As the grid user:

Get the current information, for example:

$ oifcfg getif
<interfacename>1 100.17.10.0 global public
<interfacename>0 192.168.0.0 global cluster_interconnect
Add the new cluster interconnect information:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:
a. add a new interface bond0 on the same subnet
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect

b. add a new subnet with the same interface name, or a different subnet with a new interface name

$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect

 

1. If the interface is not yet available on the servers, the change must be made with the -global option, not the -node option; using -node in that situation can lead to node eviction.

2. If the interface is available on the server, the subnet address can be identified with:
$ oifcfg iflist

This lists the interfaces and their subnet addresses, and it works even when the clusterware is down. Note that the subnet address is not necessarily of the form x.y.z.0; it can be x.y.z.24, x.y.z.64, x.y.z.128, etc. For example:
$ oifcfg iflist
lan1 18.1.2.0
lan2 10.2.3.64        << this is a private network subnet address; the private IPs on it are 10.2.3.xx

3. If a second private network is being added rather than replacing the existing one, both interfaces must have the same MTU; otherwise the instance fails to start with errors such as:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if MTU failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini2
ORA-27303: additional information: requested interface lan1:801 has a different MTU (1500) than lan3:801 (9000), which is not supported. Check output from ifconfig command

4. For 11gR2 and later, setting the cluster_interconnects parameter in the ASM or database spfile/pfile is not recommended. If it has been set for whatever reason, the new private IP address must be put into the spfile/pfile before the clusterware is shut down; otherwise the instance will fail to restart because of the mismatch in private network information.
Verify the new values:

$ oifcfg getif
3. As the root user, shut down the clusterware on all nodes and disable it:

# crsctl stop crs
# crsctl disable crs
4. Make the network configuration change at the OS level, and make sure the new interface is up and usable on every node afterwards:

$ ifconfig -a
$ ping <private hostname>
5. As the root user, re-enable the clusterware and restart it on all nodes:

# crsctl enable crs
# crsctl start crs
6. If needed, remove the old interface information:

$ oifcfg delif -global <interface>[/<subnet>]
For example:
$ oifcfg delif -global eth0/192.168.0.0

 

C. For 12c and 18c with Flex ASM

Review section B above and pay attention to its notes; take the backup with the commands below.

Before touching any node, back up the profile.xml on all nodes of the cluster. As the grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk

1. Make sure the clusterware is running normally on all nodes of the cluster.

2. As the grid user:

Get the current information, for example:

$ oifcfg getif
<interfacename>1 100.17.10.0 global public
<interfacename>0 192.168.0.0 global cluster_interconnect,asm

The example above shows that interface <interfacename>0 is used for both the cluster interconnect and the ASM network.

Add the new cluster interconnect information:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect[,asm]

For example:
a. add a new interface bond0 on the same subnet
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect,asm

b. add a new subnet with the same interface name, or a different subnet with a new interface name
$ oifcfg setif -global eth0/192.168.10.0:cluster_interconnect,asm

$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect,asm

If different networks are used for the private network and the ASM network, adjust the commands accordingly.

3. If the ASM listener is using the private network being changed, the change affects the ASM listener, and a new ASM listener with the new network configuration must be added. Skip this step if the ASM network subnet is not changing.

3.1. Add a new ASM listener (ASMNEWLSNR_ASM in this example) on the new subnet, as the grid user:

$ srvctl add  listener -asmlistener -l <new ASM LISTENER NAME> -subnet <new subnet>

如:
$ srvctl add listener -asmlistener -l ASMNEWLSNR_ASM -subnet 192.168.10.0

3.2. Remove the existing ASM listener (ASMLSNR_ASM in this example) and its dependency, as the grid user:

$ srvctl update listener -listener ASMLSNR_ASM -asm -remove -force
$ lsnrctl stop ASMLSNR_ASM
This should return:  TNS-01101: Could not find listener name or service name ASMLSNR_ASM

 

Note: the -force option is required; otherwise the following errors occur:

$ srvctl update listener -listener ASMLSNR_ASM -asm -remove
PRCR-1025 : Resource ora.ASMLSNR_ASM.lsnr is still running
$ srvctl stop listener -l ASMLSNR_ASM
PRCR-1065 : Failed to stop resource ora.ASMLSNR_ASM.lsnr
CRS-2529: Unable to act on 'ora.ASMLSNR_ASM.lsnr' because that would require stopping or relocating 'ora.asm', but the force option was not specified
3.3. Verify the configuration:

$ srvctl config listener -asmlistener
$ srvctl config asm
4. As the root user, shut down the clusterware on all nodes and disable it:

# crsctl stop crs
# crsctl disable crs

5. Change the network configuration at the OS level; afterwards, make sure the new interface is active on all nodes:

$ ifconfig -a
$ ping <private hostname>

6. As the root user, re-enable the clusterware and restart it on all nodes:

# crsctl enable crs
# crsctl start crs

7. Remove the old interface information:

$ oifcfg delif -global <interface>[/<subnet>]
For example:
$ oifcfg delif -global <interfacename>0/192.168.0.0

 

 

Notes for 19.x

In 19c there is a new resource, ora.asmnet1.asmnetwork, that also needs to be updated:

srvctl remove asmnetwork -netnum 1
srvctl add asmnetwork -netnum 1 -subnet XXX

When removing the resource you may hit one of the following errors:

$ srvctl remove asmnetwork -netnum 1

PRCR-1025 : Resource ora.asmnet1.asmnetwork is still running

or

$ srvctl remove asmnetwork -netnum 1

PRCR-1028 : Failed to remove resource ora.asmnet1.asmnetwork
PRCR-1072 : Failed to unregister ora.asmnet1.asmnetwork
CRS-2730: Resource 'ora.ASMNET1LSNR_ASM.lsnr' depends on resource 'ora.asmnet1.asm

To avoid these errors, add the -force option when removing the resource (a verification of the result is sketched below):

srvctl remove asmnetwork -netnum 1 -force
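A hedged verification after re-adding the resource, as the grid user ('srvctl config asmnetwork' is available in recent releases; if your version does not have it, 'crsctl stat res ora.asmnet1.asmnetwork -p' shows equivalent information):

$ srvctl config asmnetwork
$ crsctl stat res ora.asmnet1.asmnetwork -p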

 

Notes for 11gR2
1. If the underlying network configuration has already been changed but the same change has not yet been made with oifcfg, restarting the clusterware leaves the crsd process unable to start.

crsd.log will show entries like:

2010-01-30 09:22:47.234: [ default][2926461424] CRS Daemon Starting
..
2010-01-30 09:22:47.273: [ GPnP][2926461424]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=7153, tl=3, f=0
2010-01-30 09:22:47.282: [ OCRAPI][2926461424]clsu_get_private_ip_addresses: no ip addresses found.
2010-01-30 09:22:47.282: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2010-01-30 09:22:47.283: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][2926461424]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2010-01-30 09:22:47.285: [ OCRAPI][2926461424]a_init:13!: Clusterware init unsuccessful : [44]
2010-01-30 09:22:47.285: [ CRSOCR][2926461424] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-01-30 09:22:47.285: [ CRSD][2926461424][PANIC] CRSD exiting: Could not init OCR, code: 44
2010-01-30 09:22:47.285: [ CRSD][2926461424] Done.

The errors above mean that the OS-level settings (oifcfg iflist) do not match the gpnp profile.xml.

Solution: revert the OS network configuration to its original state, start the clusterware, and then redo the change following the steps above.

The same problem occurs if the underlying network did not change but oifcfg was given a wrong subnet address or interface name.

2. If any node of the cluster is down, the oifcfg command fails with:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Solution: start the clusterware on the nodes where it is not running and make sure every node is up; if a node cannot be started for OS reasons, remove it from the cluster first and then make the private network change.

3. The same error occurs if the user running the command is not the GI owner:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Solution: make sure you are logged in as the GI owner when running the command.

4. Starting with 11.2.0.2, trying to delete the last private interface (cluster interconnect) without first adding a new one fails with:

PRIF-31: Failed to delete the specified network interface because it is the last private interface

Solution: add the new private interface before deleting the old one.

5. If the clusterware on the local node is down, the following error is returned:

$ oifcfg getif
PRIF-10: failed to initialize the cluster registry

Solution: start the clusterware on that node.

 

Notes for Windows

The syntax for changing interfaces is the same on Windows RAC as on Unix/Linux clusters, but the interface names are slightly different. On Windows the default names assigned to interfaces are typically:

Local Area Connection
Local Area Connection 1
Local Area Connection 2

If an interface name contains spaces, it must be enclosed in quotes; also note that the names are case sensitive. For example, to set the cluster interconnect on Windows:

C:\oracle\product\10.2.0\crs\bin\oifcfg setif -global "Local Area Connection 1"/192.168.1.0:cluster_interconnect

That said, on Windows it is better practice to rename the interfaces to something meaningful, such as "ocwpublic" and "ocwprivate". If an interface needs to be renamed after the clusterware has been installed, run "oifcfg" to add the new interface and delete the old one, as described above.

You can run the following command to see the interface names available on each node:

oifcfg iflist -p -n

The command must be run on every node to verify that the interface names are defined identically.

Effect of changing interface definitions with oifcfg

For the private interface, the database uses the interface that the OCR defines as the cluster interconnect for inter-node Cache Fusion traffic. The interconnect actually in use is shown at the top of the alert log, right after the parameter listing. For example:

For pre 11.2.0.2:
Cluster communication is configured to use the following interface(s) for this instance
192.XXX.X.1

For 11.2.0.2+: (HAIP address will show in alert log instead of private IP)
Cluster communication is configured to use the following interface(s) for this instance
169.XXX.XX.97

If this information is wrong, the instance must be restarted so that the corrected OCR entry is re-read; this applies to both ASM and database instances. On Windows, after the instance is shut down, the OracleService<SID> (or OracleASMService<ASMSID>) service must also be stopped and started before the OCR is re-read.

 

oifcfg command usage

To see all options of the oifcfg command, simply type:

$ <CRS_HOME>/bin/oifcfg

 

Updating crsconfig_params

Tools such as FPP or RHP use this file, so it needs to be kept up to date with the new network information.

Back up the crsconfig_params file:

cd $GI_HOME/crs/install
cp $GI_HOME/crs/install/crsconfig_params $GI_HOME/crs/install/crsconfig_params.orig

Then edit the file as root:

vi $GI_HOME/crs/install/crsconfig_params

Change only the addresses and subnets for CRS_NODEVIP, NETWORKS, and NEW_NODEVIPS,

then save the file.

Example 5: Adding or removing a cluster interconnect with HAIP (11gR2 and later)

1. To add another private network to an existing cluster that uses HAIP, run as the grid user:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:

$ oifcfg setif -global <interfacename>/192.XXX.XX.0:cluster_interconnect

Shut down CRS on all nodes and then restart it on all nodes so that HAIP picks up the new interface; a rolling restart cannot be used.

2. To remove a private network from a cluster that uses HAIP, run as the grid user:

$ oifcfg delif -global <interface>

For example:
$ oifcfg delif -global <interfacename>

HAIP fails over to the remaining available interfaces, and the cluster/database continues to run this way after the interface is deleted.

To remove the now-surplus HAIP address, shut down CRS on all nodes and then restart CRS on all nodes; a rolling restart of CRS cannot be used. The resulting configuration can be checked as shown below.
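A hedged check of the interconnect and HAIP state after the full restart, as the grid user (ora.cluster_interconnect.haip is the HAIP resource used by 11.2 and later; the exact output varies by release):

$ oifcfg getif
$ crsctl stat res -t -init | grep -i haip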

This article is based on: 如何在 oracle 集群环境下修改私网信息 (Doc ID 2103317.1)

Bugs related to ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt]

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

Since the PDB feature was introduced in Oracle 12c, ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt] has been a nightmare in unconventional database recovery. It usually appears when a PDB is opened after the controlfile has been recreated and the database opened with resetlogs. The main bugs on MOS are:


Bug 22709877 open resetlogs fails with ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt]

Bug 16784143 ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt] with PDBs

Bug 31686545 RMAN: Open PDB fails with ORA-00600 [kcvfdb_pdb_set_clean_scn: cleanckpt]

For ORA-600 [kcvfdb_pdb_set_clean_scn: cleanckpt] caused by these bugs, avoid the bugs whenever possible; if that is not feasible, the problem has to be worked around by other means.

Handling an ORA-600 [krhpfh_03-1210] failure

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

All nodes of a RAC database were open and queries worked, but application inserts sometimes failed with errors like ORA-01187: cannot read from file because it failed verification tests:


The original trigger was a failed disk; after the disk was replaced, the database instances on two nodes crashed:

Mon Aug 19 21:16:47 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75779): terminating the instance due to error 1242
Mon Aug 19 21:16:47 2024
System state dump requested by (instance=5, osid=75779 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_diag_75725.trc
Mon Aug 19 21:16:52 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
License high water mark = 131
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:02 2024
Instance termination failed to kill one or more processes
Instance terminated by CKPT, pid = 75779
Mon Aug 19 21:17:03 2024
USER (ospid: 33495): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:13 2024
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 33495

The instances could nevertheless be started manually, and all datafiles were reported as online:


However, some of the loading sessions became very slow, waiting heavily on enq: HW - contention.

The alert logs of all database nodes occasionally reported errors such as ORA-01186: file 1399 failed verification tests:

Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

Based on this, the preliminary assessment was:
1. Because the cluster has many nodes (six), as long as one node remains open the other nodes can be shut down and restarted normally, but no data can be written to the datafile reporting ORA-01207 (it can still be read).
2. If all nodes are shut down, the database cannot be opened and reports ORA-01207: file is more recent than control file.

In this situation, past experience says that ORA-01207: file is more recent than control file can be resolved by recreating the controlfile. All nodes were shut down first, and then one node was started:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
alter database open
Wed Aug 21 14:14:22 2024
SUCCESS: diskgroup REDO was mounted
Wed Aug 21 14:14:22 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Wed Aug 21 14:14:27 2024
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_47884.trc:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
ORA-1122 signalled during: alter database open...

As expected. The controlfile was then recreated, after which recovery failed with ORA-00600 [krhpfh_03-1210]:

SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/tmp/xff/pfile';
ORACLE instance started.

Total System Global Area 1.3255E+11 bytes
Fixed Size		    2244832 bytes
Variable Size		 9.7442E+10 bytes
Database Buffers	 3.4897E+10 bytes
Redo Buffers		  208654336 bytes
SQL> @rectl

Control file created.

SQL> 
SQL> 
SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],
[fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'

The error indicates that the fhcpc and fhccc values are inconsistent; they were inspected with bbed:

BBED> set file 1399
	FILE#          	1399

BBED> p kcvfhccc
ub4 kcvfhccc                                @148      0x00043227 ===>274983(10进制)

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00043218 ===>274968(10进制)

The fix is clear: patch these two values with bbed. The hex strings written at offsets 140 and 148 below are the little-endian encodings of the desired decimal values; a quick way to compute such a string is sketched next.
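A hedged helper for converting a decimal value into the little-endian hex string that bbed writes, using only standard coreutils (274983 here is just the value from the ORA-600 arguments, to illustrate the conversion; the actual target values used in the session below come from the author's own analysis of the current controlfile):

$ printf '%08x' 274983 | fold -w2 | tac | tr -d '\n'
27320400

Read it as: format the value as 8 hex digits, split it into bytes, reverse the byte order, and join the bytes back together.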

BBED> m /x 2a390400 offset 148
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  148 to  659           Dba:0x5dc00001
------------------------------------------------------------------------
 2a390400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0c000000 0f004441 
 5441315f 5442535f 45515f30 31000000 00000000 00000000 00000000 78010000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 cfebdd33 01000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 419333df 81001c0a 6ab13046 06000000 
 c1520400 02000000 10000000 7e000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000 

 <32 bytes per line>

BBED> m /x 2b390400 offset 140
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  140 to  651           Dba:0x5dc00001
------------------------------------------------------------------------
 2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 0c000000 0f004441 5441315f 5442535f 45515f30 31000000 00000000 00000000 
 00000000 78010000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 cfebdd33 01000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 419333df 81001c0a 
 6ab13046 06000000 c1520400 02000000 10000000 7e000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 

 <32 bytes per line>

After these values were patched, recover database and open succeeded, the data dictionary checked out clean, and application reads and writes were normal again, completing the recovery:

SQL> @hcheck
HCheck Version 07MAY18 on 21-AUG-2024 15:13:02
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

				   Catalog	 Fixed
Procedure Name			   Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/21 15:13:05 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/21 15:13:06 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/21 15:13:06 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
---------------------------------------
21-AUG-2024 15:13:07  Elapsed: 5 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_70961_HCHECK.trc

A 19c database fails to open with ORA-600 [kcbzib_kcrsds_1]

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

A 19c database ran into some kind of problem, and the engineer on site at the time used hidden (underscore) parameters to force it open; the database then failed to open with ORA-00704 and ORA-600 [kcbzib_kcrsds_1]:

2024-08-24T06:11:25.494304+08:00
ALTER DATABASE OPEN
2024-08-24T06:11:25.494370+08:00
TMI: adbdrv open database BEGIN 2024-08-24 06:11:25.494324
Smart fusion block transfer is disabled:
  instance mounted in exclusive mode.
2024-08-24T06:11:25.515306+08:00
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
 Thread 1: Recovery starting at checkpoint rba (logseq 2 block 3), scn 286550073
2024-08-24T06:11:25.567011+08:00
Started redo scan
2024-08-24T06:11:25.587170+08:00
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
2024-08-24T06:11:25.595192+08:00
Started redo application at
 Thread 1: logseq 2, block 3, offset 0, scn 0x0000000011146839
2024-08-24T06:11:25.595552+08:00
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /dbf/RLZY/redo02.log
2024-08-24T06:11:25.595712+08:00
Completed redo application of 0.00MB
2024-08-24T06:11:25.596058+08:00
Completed crash recovery at
 Thread 1: RBA 2.3.0, nab 3, scn 0x000000001114683a
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Endian type of dictionary set to little
2024-08-24T06:11:25.648152+08:00
LGWR (PID:1614826): STARTING ARCH PROCESSES
2024-08-24T06:11:25.661738+08:00
TT00 (PID:1614908): Gap Manager starting
Starting background process ARC0
2024-08-24T06:11:25.677246+08:00
ARC0 started with pid=54, OS id=1614910 
2024-08-24T06:11:25.687525+08:00
LGWR (PID:1614826): ARC0: Archival started
LGWR (PID:1614826): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.687733+08:00
ARC0 (PID:1614910): Becoming a 'no FAL' ARCH
ARC0 (PID:1614910): Becoming the 'no SRL' ARCH
2024-08-24T06:11:25.696437+08:00
TMON (PID:1614886): STARTING ARCH PROCESSES
Starting background process ARC1
2024-08-24T06:11:25.711645+08:00
Thread 1 advanced to log sequence 3 (thread open)
Redo log for group 3, sequence 3 is not located on DAX storage
2024-08-24T06:11:25.715270+08:00
ARC1 started with pid=56, OS id=1614914 
Starting background process ARC2
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /dbf/RLZY/redo03.log
Successful open of redo thread 1
2024-08-24T06:11:25.728586+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Stopping change tracking
2024-08-24T06:11:25.734124+08:00
ARC2 started with pid=57, OS id=1614916 
Starting background process ARC3
2024-08-24T06:11:25.752891+08:00
ARC3 started with pid=58, OS id=1614918 
2024-08-24T06:11:25.752979+08:00
TMON (PID:1614886): ARC1: Archival started
TMON (PID:1614886): ARC2: Archival started
TMON (PID:1614886): ARC3: Archival started
TMON (PID:1614886): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.802551+08:00
ARC0 (PID:1614910): Archived Log entry 2828 added for T-1.S-2 ID 0x74f18f91 LAD:1
2024-08-24T06:11:25.806845+08:00
TT03 (PID:1614922): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124865):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124865/xff_ora_1614892_i124865.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-08-24T06:11:25.871925+08:00
2024-08-24T06:11:26.772652+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2024-08-24T06:11:26.872265+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872351+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872412+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872455+08:00
Error 704 happened during db open, shutting down database
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124866):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124866/xff_ora_1614892_i124866.trc
opiodr aborting process unknown ospid (1614892) as a result of ORA-603
2024-08-24T06:11:27.498146+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:27.501122+08:00
ORA-603 : opitsk aborting process
License high water mark = 8
USER(prelim) (ospid: 1614892): terminating the instance due to ORA error 704
2024-08-24T06:11:28.526358+08:00
Instance terminated by USER(prelim), pid = 1614892

The only official reference to kcbzib_kcrsds_1 is: Bug 31887074 - sr21.1bigscn_hipu3 - trc - ksfdopn2 - ORA-600 [kcbzib_kcrsds_1] (Doc ID 31887074.8)


Although Oracle does not publish a solution for ORA-600 [kcbzib_kcrsds_1], plenty of past recovery cases show that this error can be worked around by adjusting the Oracle SCN. Some earlier recovery cases of the same kind:
ORA-600 kcbzib_kcrsds_1报错
12C数据库报ORA-600 kcbzib_kcrsds_1故障处理
ORA-00603 ORA-01092 ORA-600 kcbzib_kcrsds_1
redo异常强制拉库报ORA-600 kcbzib_kcrsds_1修复
Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1
存储故障,强制拉库报ORA-600 kcbzib_kcrsds_1处理

Solving ORA-01031 raised by DBMS_SESSION.set_context

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

I recently migrated a customer's Oracle database from 11.2.0.3 on AIX to 19.23 on Linux using impdp over a network_link, schema by schema. Afterwards one application stopped working:


Tracing the application confirmed that ORA-01031: insufficient privileges was raised by the following call:

 DBMS_SESSION.set_context (
         'back_exec',
         'back_exec_log_no',
         v_back_exec_log_no
……

Checking the executing user's privileges confirmed that EXECUTE ON SYS.DBMS_SESSION had already been granted. A simple test reproduces the problem:

SQL> show user;
USER 为 "SYS"

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');
BEGIN DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1'); END;

*
第 1 行出现错误:
ORA-01031: 权限不足
ORA-06512: 在 "SYS.DBMS_SESSION", line 114
ORA-06512: 在 line 1

SQL> GRANT EXECUTE ON SYS.DBMS_SESSION TO sys;

授权成功。

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');
BEGIN DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1'); END;

*
第 1 行出现错误:
ORA-01031: 权限不足
ORA-06512: 在 "SYS.DBMS_SESSION", line 114
ORA-06512: 在 line 1

So this is clearly not a privilege problem. The official description of DBMS_SESSION.set_context says, about the namespace parameter:


"The namespace of the application context to be set, limited to 128 bytes. Exceeding the maximum permissible length will result in an error during the execution of the procedure." More to the point, SET_CONTEXT only works for a namespace that has been created with CREATE CONTEXT, and only when it is called from the package named in that context's USING clause:

SQL> create context test_ctx using sys.DBMS_SESSION;

上下文已创建。

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');

PL/SQL 过程已成功完成。

The documentation has a related worked example: Example: Creating a Global Application Context that Uses a Client Session ID. A minimal sketch of the fix for the migrated application follows.
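A minimal sketch, assuming a hypothetical schema APPUSER and package BACK_EXEC_PKG (both names are placeholders; only the context name back_exec comes from the application code quoted above):

$ sqlplus / as sysdba <<'EOF'
-- the context must name the package that is trusted to set it
CREATE OR REPLACE CONTEXT back_exec USING appuser.back_exec_pkg;
-- calls to DBMS_SESSION.SET_CONTEXT('back_exec', ...) made from inside
-- appuser.back_exec_pkg will now succeed; calls from anywhere else
-- still raise ORA-01031.
EOF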

ORA-600 [kcrf_resilver_log_1] caused by a lost redo write

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

A customer had a hardware failure; after the hardware was repaired, the database failed to open with ORA-600 [kcrf_resilver_log_1]:

Thu Aug 22 13:37:50 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc  (incident=9767):
ORA-00600: 内部错误代码, 参数: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Incident details in: e:\oracle\zy\diag\rdbms\orcl\orcl\incident\incdir_9767\orcl_ora_1640_i9767.trc
Thu Aug 22 13:37:55 2024
Trace dumping is performing id=[cdmp_20240822133755]
Aborting crash recovery due to error 600
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: 内部错误代码, 参数: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: 内部错误代码, 参数: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []

According to MOS, this error is generally caused by a lost redo log write:


Recovering from this is not difficult; it usually comes down to forcing the database open. Some earlier cases of the same kind:
ORA-600 kcrf_resilver_log_1故障处理
ORA-00600[kcrf_resilver_log_1]异常恢复

Hardware failure leading to ORA-01242 and ORA-01122 errors

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

A customer running a multi-node RAC reported in the morning that the instances on two nodes were down and asked for a root-cause analysis. The alert log on one of the nodes showed that access to datafile 1399 failed with ORA-01242 and ORA-01122, which crashed the instance:

Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process

Further analysis of the log showed that the clusterware tried to restart the instance but hit ORA-01186 and ORA-01122 and could not bring it up:

ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

These errors indicate a problem accessing the datafile; in my experience this is usually caused by a problem at the storage layer. The OS messages log indeed showed disk errors:

Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed

The customer's further analysis was that a disk in the storage array had failed the previous day and a hot spare had taken over, but for some reason file access then became abnormal, possibly related to the rebuild process. Since this is a RAC and some nodes were still running normally, the failed instances could simply be started again:


Writes on the nodes, however, still reported ORA-01187: cannot read from file because it failed verification tests:

After ALTER SYSTEM CHECK DATAFILES was run on every node, all nodes operated normally again; a sketch of the per-node check follows.
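A hedged sketch of the check run on each instance in turn (the statement itself is standard; only the connection method is illustrative):

$ sqlplus / as sysdba <<'EOF'
-- re-validate the datafile headers against the current controlfile on this instance
ALTER SYSTEM CHECK DATAFILES;
EOF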

Recovering a 200 TB NOARCHIVELOG database with no backup

Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue further legal liability.]

A nearly 200 TB, six-node RAC suffered an unstable storage link: disks kept dropping from the servers, the ASM diskgroups repeatedly dismounted and remounted, and the cluster nodes kept restarting. After the link was repaired, the database failed to open with ORA-01113 and ORA-01110:


A check with the Oracle Database Recovery Check script showed that 10 datafiles were abnormal and could not be recovered normally.

The database is large, close to 200 TB, so the recovery had to be done with great care (no on-site backup was possible, and the customer required the recovery to be completed within two days).

Because the database is in NOARCHIVELOG mode, these files could not be recovered by applying archived logs. Instead, dbms_diskgroup was used to copy the datafile headers out of ASM to the filesystem, along these lines:

SQL> @dbms_diskgroup_get_block.sql  +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header

Parameter 1:
ASM_file_name (required)


Parameter 2:
block_to_extract (required)


Parameter 3
number_of_blocks_to_extract (required)


Parameter 4:
FileSystem_File_Name (required)

old  14:  v_AsmFilename := '&ASM_File_Name';
new  14:  v_AsmFilename := '+DATA/xifenfei.dbf';
old  15:  v_offstart := '&block_to_extract';
new  15:  v_offstart := '1';
old  16:  v_numblks := '&number_of_blocks_to_extract';
new  16:  v_numblks := '1';
old  17:  v_FsFilename := '&FileSystem_File_Name';
new  17:  v_FsFilename := '/tmp/xff/xifenfei.dbf.header';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
Physical Block Size: 512

PL/SQL procedure successfully completed.

The relevant SCN in the header was then patched with bbed:

BBED> set filename 'xifenfei.dbf.header'
	FILENAME       	xifenfei.dbf.header

BBED> set blocksize 16384
	BLOCKSIZE      	16384

BBED> map
 File: xifenfei.dbf.header (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 860 bytes                    @0       

 ub4 tailchk                                @16380   


BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8061324
   ub2 kscnwrp                              @488      0x0081

BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8133e2b
   ub2 kscnwrp                              @488      0x0081

The patched datafile header was then written back into ASM:

SQL> @dbms_diskgroup_cp_block_to_asm.sql  /tmp/xff/xifenfei.dbf.header  +DATA/xifenfei.dbf 1 1 

Parameter 1:
v_FsFileName (required)


Parameter 2:
v_AsmFileName (required)


Parameter 3
v_offstart (required)


Parameter 4
v_numblks (required)

old  16: v_FsFileName := '&v_FsFileName';
new  16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';
old  17: v_AsmFileName := '&v_AsmFileName';
new  17: v_AsmFileName := '+DATA/xifenfei.dbf';
old  18: v_offstart := '&v_offstart';
new  18: v_offstart := '1';
old  19:  v_numblks := '&v_numblks';
new  19:  v_numblks := '1';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384

PL/SQL procedure successfully completed.

Check whether the file header change took effect:

[oracle@xff1 xff]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);

CHECKPOINT_CHANGE#
------------------
      556870614571
      556870614571

SQL> recover datafile 295;
Media recovery complete.

This confirmed that the bbed patch of the file header worked. The same method was applied to the other nine files, after which the database was recovered and opened:

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

The alert log showed:

Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER  datafile 295  
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  datafile 295  
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  database  
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
  Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id=34119 
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id=34121 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id=34134 
Starting background process SMCO
Completed: alter database open

The database instances on the other cluster nodes are all running normally.


Check data dictionary consistency:

SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/10 18:24:54 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18  Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc

Fortunately the data dictionary itself was not damaged, and the application went straight back into production without any problems (mainly because the customer had already stopped writing to the database while the fibre link was unstable).

Using flashback to quickly restore a failed-over standby database


The customer's database architecture is a single instance plus Data Guard: the production database runs on a physical server and the standby runs in a virtualized environment (on mechanical disks at the time, for cost reasons). Today the physical server suddenly failed outright, and the customer requested an emergency failover to the standby (the commands used are sketched below, followed by the standby alert log).
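
The failover on the standby used the standard 11g physical-standby failover sequence; a minimal sketch of the commands whose execution is recorded in the alert log below:

SQL> -- on the standby: finish terminal recovery, then activate and open it as the new primary
SQL> alter database recover managed standby database finish force;
SQL> alter database activate physical standby database;
SQL> alter database open;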

Thu Aug 08 09:52:13 2024
Media Recovery Waiting for thread 1 sequence 189448 (in transit)
Recovery of Online Redo Log: Thread 1 Group 12 Seq 189448 Reading mem 0
  Mem# 0: /oradata/xff/std_redo12.log
Thu Aug 08 09:52:13 2024
Archived Log entry 187514 added for thread 1 sequence 189447 ID 0x2e6bc37f dest 1:
Thu Aug 08 10:54:40 2024
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH force
Terminal Recovery: Stopping real time apply
Thu Aug 08 10:54:40 2024
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/xffdg/xff/trace/xff_pr00_17876.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 34188310512
Thu Aug 08 10:54:43 2024
MRP0: Background Media Recovery process shutdown (xff)
Terminal Recovery: Stopped real time apply
Thu Aug 08 10:55:14 2024
Stopping background process MMNL
Stopping background process MMON
Thu Aug 08 10:55:46 2024
Background process MMON not dead after 30 seconds
Killing background process MMON
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 15077 user 'oracle' program 'oracle@xffDG'
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 17691 user 'oracle' program 'oracle@xffDG (MMON)'
Active process 15077 user 'oracle' program 'oracle@xffDG'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
Active process 11536 user 'oracle' program 'oracle@xffDG (M000)'
CLOSE: all sessions shutdown successfully.
Thu Aug 08 10:56:11 2024
SMON: disabling cache recovery
Attempt to do a Terminal Recovery (xff)
Media Recovery Start: Managed Standby Recovery (xff)
 started logmerger process
Thu Aug 08 10:56:13 2024
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 4 slaves
Media Recovery Waiting for thread 1 sequence 189448 (in transit)
Killing 4 processes with pids 17733,17729,17731,32533 (all RFS, wait for I/O) 
in order to disallow current and future RFS connections. Requested by OS process 15184
Thu Aug 08 10:56:16 2024
idle dispatcher 'D000' terminated, pid = (16, 1)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Terminal Recovery timestamp is '08/08/2024 10:56:17'
Terminal Recovery: applying standby redo logs.
Terminal Recovery: thread 1 seq# 189448 redo required
Terminal Recovery:
Recovery of Online Redo Log: Thread 1 Group 12 Seq 189448 Reading mem 0
  Mem# 0: /oradata/xff/std_redo12.log
Identified End-Of-Redo (failover) for thread 1 sequence 189448 at SCN 0xffff.ffffffff
Incomplete Recovery applied until change 34188310513 time 08/08/2024 11:32:41
Thu Aug 08 10:56:18 2024
Media Recovery Complete (xff)
Terminal Recovery: successful completion
Thu Aug 08 10:56:18 2024
ARCH: Archival stopped, error occurred. Will continue retrying
Forcing ARSCN to IRSCN for TR 7:4123539441
ORACLE Instance xff - Archival Error
Attempt to set limbo arscn 7:4123539441 irscn 7:4123539441
Resetting standby activation ID 778814335 (0x2e6bc37f)
ORA-16014: log 12 sequence# 189448 not archived, no available destinations
ORA-00312: online log 12 thread 1: '/oradata/xff/std_redo12.log'
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH force
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
ORA-16136 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL...
Thu Aug 08 10:56:28 2024
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE
ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE (xff)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Thu Aug 08 10:56:28 2024
Archiver process freed from errors. No longer stopped
Standby terminal recovery start SCN: 34188310512
RESETLOGS after incomplete recovery UNTIL CHANGE 34188310513
Online log /oradata/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /oradata/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /oradata/xff/redo03.log: Thread 1 Group 3 was previously cleared
Online log /oradata/xff/redo04.log: Thread 1 Group 4 was previously cleared
Standby became primary SCN: 34188310511
Thu Aug 08 10:56:29 2024
Setting recovery target incarnation to 3
ACTIVATE STANDBY: Complete - Database mounted as primary
Completed: ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE
ARC1: Becoming the 'no SRL' ARCH
alter database open
Thu Aug 08 10:56:34 2024
Assigning activation ID 832379854 (0x319d1bce)
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /oradata/xff/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug 08 10:56:34 2024
SMON: enabling cache recovery
Thu Aug 08 10:56:34 2024
ARC0: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
Thu Aug 08 10:56:34 2024
NSA2 started with pid=14, OS id=15198
[15133] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:1087824580 end:1087828220 diff:3640 (36 seconds)
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Thu Aug 08 10:56:38 2024
Database Characterset is ZHS16GBK
Starting background process SMCO
Thu Aug 08 10:56:39 2024
SMCO started with pid=15, OS id=15200
Thread 1 advanced to log sequence 3 (LGWR switch)
  Current log# 3 seq# 3 mem# 0: /oradata/xff/redo03.log
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Thu Aug 08 10:56:40 2024
Archived Log entry 187515 added for thread 1 sequence 2 ID 0x319d1bce dest 1:
Starting background process QMNC
Thu Aug 08 10:56:43 2024
QMNC started with pid=17, OS id=15204
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open

Unfortunately, the virtual machine's I/O was too poor to take over the workload. The hardware engineer urgently repaired the physical server, its database started normally, and the customer switched the application straight back to the physical machine. The Data Guard environment now needs to be rebuilt (the customer has also migrated the virtual machine to SSD storage). Restart the virtual-machine database to the mount state:

[oracle@xffDG ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Aug 8 20:06:30 2024

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 2.5655E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3892318072 bytes
Database Buffers         2.1743E+10 bytes
Redo Buffers               16896000 bytes
Database mounted.
SQL> select open_mode,database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PRIMARY
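
One convenient way to pick the flashback target is to read the SCN at which this database became primary (consistent with the 34188310511 recorded in the alert log above) and flash back to a value just below it; a minimal sketch using standard V$DATABASE columns, added here for illustration:

SQL> -- confirm flashback logging is on and fetch the SCN at which the standby became primary
SQL> select flashback_on, standby_became_primary_scn from v$database;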

Flash back the database to an SCN prior to the standby failover:

SQL> flashback database to scn 34188310500;       

Flashback complete.
Thu Aug 08 20:09:40 2024
flashback database to scn 34188310500
Flashback Restore Start
Thu Aug 08 20:10:34 2024
Flashback Restore Complete
Flashback Media Recovery Start
Thu Aug 08 20:10:34 2024
Setting recovery target incarnation to 2
 started logmerger process
Parallel Media Recovery started with 4 slaves
Flashback Media Recovery Log /oradata/fast_recovery_area/XFF/archivelog/2024_08_08/o1_mf_1_189448_mc8dzjxn_.arc
Thu Aug 08 20:10:35 2024
Identified End-Of-Redo (failover) for thread 1 sequence 189448 at SCN 0x7.f5c837f1
Incomplete Recovery applied until change 34188310501 time 08/08/2024 11:32:40
Flashback Media Recovery Complete
Setting recovery target incarnation to 3
Completed: flashback database to scn 34188310500

Convert the virtual-machine database back to a physical standby:

SQL> alter database convert to physical standby;

Database altered.

SQL> select database_role from v$database;
select database_role from v$database
                          *
ERROR at line 1:
ORA-01507: database not mounted


SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00750: database has been previously mounted and dismounted

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2.5655E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3892318072 bytes
Database Buffers         2.1743E+10 bytes
Redo Buffers               16896000 bytes
Database mounted.
SQL>  select open_mode,database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PHYSICAL STANDBY
Thu Aug 08 20:10:46 2024
alter database convert to physical standby
ALTER DATABASE CONVERT TO PHYSICAL STANDBY (xff)
Flush standby redo logfile failed:1649
Clearing standby activation ID 832379854 (0x319d1bce)
The primary database controlfile was created using the
'MAXLOGFILES 16' clause.
There is space for up to 12 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 209715200;
ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 209715200;
Shutting down archive processes
Archiving is disabled
Completed: alter database convert to physical standby
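
The conversion output above also prints template commands for adding standby redo logs; they are only needed if standby redo logs are missing on this standby. A minimal sketch (the file name below is a placeholder, not taken from this environment):

SQL> -- optional: add a standby redo log matching the 200MB size suggested by the controlfile
SQL> alter database add standby logfile '/oradata/xff/srl01.log' size 209715200;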

Start the MRP (managed recovery) process:

SQL> alter database open read only;

Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION;

Database altered.
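
Once the MRP is running, redo shipping and apply can be verified from the standby; a minimal sketch using standard views (nothing here is specific to this case):

SQL> -- check that MRP0 is applying and see the last applied archived log sequence
SQL> select process, status, thread#, sequence# from v$managed_standby;
SQL> select max(sequence#) as last_applied from v$archived_log where applied = 'YES';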