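Unreachable gateway causes VIP/listener resource failures

Contact: phone/WeChat (+86 17813235971) QQ (107644445)

Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal action.]

crs_stat shows the listener and VIP resources of node 1 flapping (ONLINE one moment, OFFLINE the next).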

rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.devdb.db   application    ONLINE    ONLINE    rac1
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    OFFLINE
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac2
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    OFFLINE
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac1
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.devdb.db   application    ONLINE    ONLINE    rac1
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    OFFLINE
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac2
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

Check the crsd.log log

0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.vip` on member `rac1`
0Start of `ora.rac1.vip` on member `rac1` succeeded.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
u_freem: mem passed is null
0CheckResource error for ora.rac1.vip error code = 1
0In stateChanged, ora.rac1.vip target is ONLINE
0ora.rac1.vip on rac1 went OFFLINE unexpectedly
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.vip` on member `rac1`
0Stop of `ora.rac1.vip` on member `rac1` succeeded.
0ora.rac1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
0ora.rac1.vip failed on rac1 relocating.
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` succeeded.
0CRS-1002: Resource 'ora.rac1.vip' is already running on member 'rac2'

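This shows that the failure of the VIP resource takes the listener resource down with it; CRS then starts the VIP again, followed by the listener. That is why crs_stat -t shows these two resources flapping.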
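
To make the start/stop cycle easier to see, the crsd.log lines can be reduced to a list of events. The following is only a rough sketch: the regular expression is my own assumption, based purely on the message format in the excerpt above, and `parse_events` is a hypothetical helper, not an Oracle tool.

```python
import re

# Pattern derived from the crsd.log excerpt above (an assumption, not an official format).
EVENT_RE = re.compile(
    r"(Start|Stop) of `(?P<res>[^`]+)` on member `(?P<node>[^`]+)` (?P<result>succeeded|failed)"
)

def parse_events(lines):
    """Return (action, resource, node, result) tuples in log order."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events.append((m.group(1).lower(), m.group("res"), m.group("node"), m.group("result")))
    return events

# A few lines copied from the excerpt above.
sample = [
    "0Start of `ora.rac1.vip` on member `rac1` succeeded.",
    "0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.",
    "0ora.rac1.vip on rac1 went OFFLINE unexpectedly",
    "0Stop of `ora.rac1.vip` on member `rac1` succeeded.",
    "0Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.",
    "0Start of `ora.rac1.vip` on member `rac2` failed.",
]

for event in parse_events(sample):
    print(event)
```

Read top to bottom, the events show the VIP and the listener always moving together, which matches the flapping seen in crs_stat -t.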

Analyze the ora.rac1.vip.log log

[ora.rac1.vip]: clsrcexecut:env ORACLE_CONFIG_HOME=/u01/app/oracle/product/10.2.0/crs_1
[ora.rac1.vip]: clsrcexecut:cmd=/u01/app/oracle/product/10.2.0/crs_1/bin/racgeut -e
_USR_ORA_DEBUG=0 54 /u01/app/oracle/product/10.2.0/crs_1/bin/racgvip check rac1
[ora.rac1.vip]: clsrcexecut: rc = 1, time = 6.430s
[ora.rac1.vip]: end for resource = ora.rac1.vip, action=check,status=1,time=6.450s
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
[ora.rac1.vip]: clsrcstartorp: Error with malloc
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
Interface eth0 checked failed (host=rac1)
Invalid parameters, or failed to bring up VIP (host=rac1)

This shows that pinging 192.168.1.1 (the gateway) through the eth0 interface fails, which prevents the VIP resource from working properly.
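
The failed gateway checks can be pulled out of the VIP log with a few lines of Python. This is a minimal sketch that assumes the message format shown in the excerpt above; `failed_pings` is a hypothetical helper name.

```python
import re

# Pattern derived from the "ping to ... failed" messages in the excerpt above.
PING_FAIL_RE = re.compile(
    r"ping to (?P<target>\S+) via (?P<iface>\S+) failed, rc = (?P<rc>\d+) \(host=(?P<host>[^)]+)\)"
)

def failed_pings(lines):
    """Collect the distinct (host, interface, target) triples whose ping check failed."""
    seen = []
    for line in lines:
        m = PING_FAIL_RE.search(line)
        if m:
            key = (m.group("host"), m.group("iface"), m.group("target"))
            if key not in seen:
                seen.append(key)
    return seen

# Lines copied from the excerpt above.
sample = [
    "[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)",
    "ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)",
    "Interface eth0 checked failed (host=rac1)",
]

print(failed_pings(sample))
```

Each triple names the node, the interface and the address that could not be reached, which is exactly what needs to be verified by hand on the node.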

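Verifying the cause / fix
We pinged the gateway (192.168.1.1) manually from node 1 and it indeed failed. Further checking showed that a firewall had been enabled by accident on the gateway server and was filtering some incoming packets. Node 1 happened to be among the blocked hosts, so its pings to the gateway failed and the error above followed. After the firewall was disabled or its rules were corrected, RAC worked normally and the VIP and listener resources stopped flapping.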

OCR/Voting disk maintenance operations

Database version

SQL>  select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

OCR test (can be done online)

rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw11
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded
rac2-> more /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw11
local_only=false
--Add an OCR mirror
[root@rac2 bin]# ./ocrconfig -replace ocrmirror /dev/raw/raw12
rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw11
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw12
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
rac2-> more /etc/oracle/ocr.loc
#Device/file  getting replaced by device /dev/raw/raw12
ocrconfig_loc=/dev/raw/raw11
ocrmirrorconfig_loc=/dev/raw/raw12
local_only=false
--Remove the OCR location
[root@rac2 bin]# ./ocrconfig -replace ocr
rac2-> more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw11 being deleted
ocrconfig_loc=/dev/raw/raw12
local_only=false
rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw12
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded
--For completeness: remove the OCR mirror
[root@rac2 bin]# ./ocrconfig -replace ocrmirror
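
After each ocrconfig -replace it is worth confirming which devices ocrcheck actually reports. The sketch below parses output shaped like the transcripts above; `ocr_devices` is a hypothetical helper and the parsing relies only on the "Device/File Name" label seen in this post.

```python
def ocr_devices(ocrcheck_output):
    """Return the devices listed on 'Device/File Name' lines, in order."""
    devices = []
    for line in ocrcheck_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Device/File Name":
            devices.append(value.strip())
    return devices

# Output copied from the ocrcheck run taken after adding the mirror.
sample = """\
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Device/File Name         : /dev/raw/raw11
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw12
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
"""

print(ocr_devices(sample))
```

With the mirror in place two devices are returned; after removing the primary location only /dev/raw/raw12 would remain.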

Voting disk test (offline in 10g / online in 11g)

--Stop CRS
[root@rac2 bin]# ./crsctl stop crs
[root@rac1 bin]# ./crsctl stop crs
--Query the voting disks
rac2-> crsctl query css votedisk
 0.     0    /dev/raw/raw31
--Add voting disks
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw23 -force
Now formatting voting disk: /dev/raw/raw23
successful addition of votedisk /dev/raw/raw23.
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw33 -force
Now formatting voting disk: /dev/raw/raw33
successful addition of votedisk /dev/raw/raw33.
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw32 -force
Now formatting voting disk: /dev/raw/raw32
successful addition of votedisk /dev/raw/raw32.
rac2-> crsctl query css votedisk
 0.     0    /dev/raw/raw31
 1.     0    /dev/raw/raw23
 2.     0    /dev/raw/raw33
 3.     0    /dev/raw/raw32
located 4 votedisk(s).
--Delete a voting disk
[root@rac2 bin]# ./crsctl delete css votedisk /dev/raw/raw33 -force
successful deletion of votedisk /dev/raw/raw33.
--Start CRS
[root@rac2 bin]# ./crsctl start crs
[root@rac1 bin]# ./crsctl start crs
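
The test above goes from one voting disk to four and then back to three. An odd count is the usual target, since a node must be able to access more than half of the voting disks to stay in the cluster. A small sketch for checking this from the query output (the helper name `voting_disks` is hypothetical, and the parsing assumes the three-column rows shown above):

```python
def voting_disks(query_output):
    """Return the device paths from 'crsctl query css votedisk' style output."""
    disks = []
    for line in query_output.splitlines():
        parts = line.split()
        # Data rows look like: " 0.     0    /dev/raw/raw31"
        if len(parts) == 3 and parts[0].rstrip(".").isdigit():
            disks.append(parts[2])
    return disks

# Output copied from the query taken after the three additions.
sample = """\
 0.     0    /dev/raw/raw31
 1.     0    /dev/raw/raw23
 2.     0    /dev/raw/raw33
 3.     0    /dev/raw/raw32
located 4 votedisk(s).
"""

disks = voting_disks(sample)
print(len(disks), "disks, odd count:", len(disks) % 2 == 1)

# After deleting /dev/raw/raw33, as in the transcript, three disks remain.
remaining = [d for d in disks if d != "/dev/raw/raw33"]
print(len(remaining), "disks, odd count:", len(remaining) % 2 == 1)
```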

Supplementary official instructions [ID 428681.1]
http://www.xifenfei.com/wp-content/uploads/2012/04/OCR_Vote_disk_Maintenance_Operations.pdf

Upgrading RAC 10g to 10.2.0.5

1. Back up the database
Normally an RMAN backup

2. Back up the OCR and voting disk

[root@rac2 bin]# ./ocrconfig -export /tmp/ocr_export.bak
[root@rac2 bin]# more /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw11
local_only=FALSE
[root@rac2 bin]# dd if=/dev/raw/raw11 of=/tmp/ocr_dd.bak
[root@rac2 bin]# dd if=/dev/raw/raw31 of=/tmp/vote_dd.bak

3.Update Oracle Time Zone Definitions
Actions for the DSTv4 update in the 10.2.0.5 patchset [ID 1086400.1]

4.Stopping All Processes
For a rolling upgrade, stop all processes on one node; for a non-rolling upgrade, stop all processes on every node

$ isqlplusctl stop
$ emctl stop dbconsole
$ srvctl stop service -d db_name [-s service_name_list [-i inst_name]]
$ srvctl stop instance -d db_name -i inst_name
$ srvctl stop asm -n node
$ srvctl stop listener -n node [-l listenername]
$ srvctl stop nodeapps -n node
# CRS_home/bin/crsctl stop crs (run as root; not needed for a rolling upgrade)

5.Back Up the System
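The files under $ORACLE_BASE, mainly the db and crs installation files and the oraInventory files

6. Upgrade the CRS software
Run ./runInstaller and select the CRS home

Then run the following commands
# CRS_home/bin/crsctl stop crs
# CRS_home/install/root102.sh

7. Upgrade the database software
Stop all CRS and database processes (same as step 4)
Run ./runInstaller and select the database home

Then run the following command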
# ORACLE_HOME/root.sh

8. Upgrade the database
8.1) Check the prerequisites for the database upgrade and correct anything that does not meet them
How to Download and Run Oracle’s Database Pre-Upgrade Utility [ID 884522.1]

SQL> STARTUP UPGRADE
SQL> SPOOL upgrade_info.log
SQL> @?/rdbms/admin/utlu102i.sql
SQL> SPOOL OFF
SQL> ALTER SYSTEM SET CLUSTER_DATABASE=FALSE SCOPE=spfile;
--Make any other changes suggested in upgrade_info.log
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP UPGRADE

8.2) Start the listener
srvctl start listener -n node

8.3) Upgrade the database

SQL> SPOOL patch.log
SQL> @?/rdbms/admin/catupgrd.sql
--Check patch.log; if it contains errors, find the cause and rerun the catupgrd.sql script
SQL> SPOOL OFF
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP
SQL> @?/rdbms/admin/utlrp.sql
SQL> ALTER SYSTEM SET CLUSTER_DATABASE=TRUE SCOPE=spfile;
--Also adjust any other parameters that need changing
SQL> SHUTDOWN IMMEDIATE
--Use the RAC management commands to start the resources that need to be running
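
Checking patch.log by eye is tedious, so a quick scan for error codes helps. This is only a sketch: the ORA-/PLS-/SP2- prefixes are the common Oracle error families, the sample log text is made up for illustration, and some errors in a real upgrade log may be expected and harmless, so every hit still has to be judged by hand.

```python
import re

# Common Oracle error prefixes followed by a 4- or 5-digit code.
ERROR_RE = re.compile(r"\b(ORA|PLS|SP2)-\d{4,5}\b")

def find_errors(log_text):
    """Return (line_number, line) pairs for lines that contain an error code."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits

# Hypothetical log content, not taken from a real upgrade.
sample_log = """\
SQL> @?/rdbms/admin/catupgrd.sql
Package created.
ORA-00942: table or view does not exist
PL/SQL procedure successfully completed.
"""

for lineno, line in find_errors(sample_log):
    print(lineno, line)
```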

9. Fix the permissions of the relevant directories
# ORACLE_HOME/install/changePerm.sh

For the detailed procedure, read README.html

Differences between lsnrctl and srvctl for managing the listener in RAC

A friend asked me today why managing the listener with srvctl and with lsnrctl gives different results in RAC. The experiment below explains it.
0. Contents of listener.ora

LISTENER_RAC1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.11)(PORT = 1521)(IP = FIRST))
    )
  )
SID_LIST_LISTENER_RAC1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
  )

1. Start the listener with srvctl

rac1-> srvctl start listener -n rac1
rac1-> lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 11-MAR-2012 22:09:34
Copyright (c) 1991, 2005, Oracle.  All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_RAC1
Version                   TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date                11-MAR-2012 22:07:21
Uptime                    0 days 0 hr. 2 min. 13 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/10.2.0/db_1/network/log/listener_rac1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.21)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.....XFF.cs application    ONLINE    ONLINE    rac1
ora....db1.srv application    ONLINE    ONLINE    rac2
ora.devdb.db   application    ONLINE    ONLINE    rac2
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

Listener operations done through srvctl are reflected in CRS automatically

2. Stop the listener with srvctl

rac1-> srvctl stop listener -n rac1
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.....XFF.cs application    ONLINE    ONLINE    rac1
ora....db1.srv application    ONLINE    ONLINE    rac2
ora.devdb.db   application    ONLINE    ONLINE    rac2
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    OFFLINE   OFFLINE
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

3. Check the listener status with lsnrctl

rac1-> lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 11-MAR-2012 22:15:54
Copyright (c) 1991, 2005, Oracle.  All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)) <-- host is empty
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
rac1-> lsnrctl
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 11-MAR-2012 22:16:55
Copyright (c) 1991, 2005, Oracle.  All rights reserved.
Welcome to LSNRCTL, type "help" for information.
LSNRCTL> status listener_rac1
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521)(IP=FIRST)))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.11)(PORT=1521)(IP=FIRST)))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused

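Observations:
1) When no listener is running, lsnrctl status checks the default listener, which is named LISTENER. If that listener is not defined, the HOST field is left empty instead of being filled with the hostname (compare the default listener startup in step 5)
2) When lsnrctl is asked for the named listener listener_rac1, it uses the same addresses as configured in listener.ora

4. Stop the listener with lsnrctl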

rac1-> srvctl start listener -n rac1
rac1-> lsnrctl stop
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 11-MAR-2012 22:43:14
Copyright (c) 1991, 2005, Oracle.  All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)) <-- host is empty
The command completed successfully
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.....XFF.cs application    ONLINE    ONLINE    rac1
ora....db1.srv application    ONLINE    ONLINE    rac2
ora.devdb.db   application    ONLINE    ONLINE    rac2
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    OFFLINE   OFFLINE
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

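This shows:
1) Although lsnrctl stop is aimed at the default listener, it also stops the non-default listener, apparently because it simply connects to port 1521 on the local host, where LISTENER_RAC1 is listening
2) If the default listener is not defined, lsnrctl stop also connects with an empty host

5. Start the default listener with lsnrctl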

rac1-> lsnrctl start
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 11-MAR-2012 22:17:37
Copyright (c) 1991, 2005, Oracle.  All rights reserved.
Starting /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 10.2.0.1.0 - Production
System parameter file is /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1)(PORT=1521)))
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date                11-MAR-2012 22:17:37
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac1)(PORT=1521))) <-- hostname
The listener supports no services
The command completed successfully
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.....XFF.cs application    ONLINE    ONLINE    rac1
ora....db1.srv application    ONLINE    ONLINE    rac2
ora.devdb.db   application    ONLINE    ONLINE    rac2
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    OFFLINE   OFFLINE
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

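Observations:
1) The listener listens only on the address of the hostname, unlike the listener started by srvctl
2) Although a listener is running, CRS still shows the resource as OFFLINE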
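
To compare what CRS believes with what lsnrctl reports, the crs_stat -t output can be turned into records. The sketch below is a hypothetical helper written against the column layout shown in this post. Note that OFFLINE rows have no Host column.

```python
def parse_crs_stat(output):
    """Parse 'crs_stat -t' style output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, the separator line and anything that is not a resource row.
        if len(parts) < 4 or not parts[0].startswith("ora."):
            continue
        name, rtype, target, state = parts[:4]
        host = parts[4] if len(parts) > 4 else None  # absent when the resource is OFFLINE
        rows.append({"name": name, "type": rtype, "target": target, "state": state, "host": host})
    return rows

# Rows copied from the crs_stat -t output above.
sample = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....C1.lsnr application    OFFLINE   OFFLINE
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....C2.lsnr application    ONLINE    ONLINE    rac2
"""

offline_listeners = [
    r["name"] for r in parse_crs_stat(sample)
    if r["name"].endswith(".lsnr") and r["state"] != "ONLINE"
]
print(offline_listeners)
```

Here the listener resource of node 1 is reported as OFFLINE by CRS even though the default listener started by lsnrctl is running on that node.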

6. Start the listener_rac1 listener with lsnrctl

LSNRCTL> start listener_rac1
Starting /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 10.2.0.1.0 - Production
System parameter file is /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/product/10.2.0/db_1/network/log/listener_rac1.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.21)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias                     listener_rac1
Version                   TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date                11-MAR-2012 22:19:04
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/10.2.0/db_1/network/log/listener_rac1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.21)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.....XFF.cs application    ONLINE    ONLINE    rac1
ora....db1.srv application    ONLINE    ONLINE    rac2
ora.devdb.db   application    ONLINE    ONLINE    rac2
ora....b1.inst application    ONLINE    ONLINE    rac1
ora....b2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

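This shows two things:
1) Starting the listener by name with lsnrctl gives the same result as starting it with srvctl
2) After listener_rac1 is started, the listener resource in CRS becomes ONLINE

7. Root cause analysis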

rac1-> srvctl config listener -n rac1
rac1 LISTENER_RAC1

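This shows that the listener srvctl operates on is LISTENER_RAC1. So when I operate on LISTENER_RAC1 with lsnrctl, the CRS resource goes OFFLINE or ONLINE accordingly, whereas operating on the default listener with lsnrctl never brings the CRS resource ONLINE.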
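
In practice this means lsnrctl should always be given the listener name that CRS manages. A small sketch of that idea: both functions are hypothetical helpers, and the parsing assumes the two-column "node listener" output shown above.

```python
def crs_listener_names(srvctl_output, node):
    """Extract listener names for a node from 'srvctl config listener -n <node>' output."""
    names = []
    for line in srvctl_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == node:
            names.append(parts[1])
    return names

def lsnrctl_command(action, listener_name):
    """Build an lsnrctl command line that names the listener explicitly."""
    return ["lsnrctl", action, listener_name]

# Output copied from the srvctl config listener call above.
config_output = "rac1 LISTENER_RAC1\n"

for name in crs_listener_names(config_output, "rac1"):
    print(" ".join(lsnrctl_command("status", name)))
```

The resulting list could be handed to subprocess.run on a cluster node; without the explicit name, lsnrctl falls back to the default listener LISTENER, as the experiment shows.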

expdp modifies service_names in RAC

While checking the database log file, I found service_names being modified before and after every expdp run
1. Database version information

SQL>  select instance_name from v$instance;
INSTANCE_NAME
----------------
ora9i2
SQL>  select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux IA64: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
The spfile contains no service_names setting, which shows that all of these changes are made in MEMORY only.

2. Alert log contents

Thu Jan  5 01:10:06 2012
The value (30) of MAXTRANS parameter ignored.
Thu Jan  5 01:10:09 2012
ALTER SYSTEM SET service_names='ora9i','SYS$SYS.KUPC$C_2_20120105011007.ORA9I' SCOPE=MEMORY SID='ora9i2';
Thu Jan  5 01:10:09 2012
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$C_2_20120105011007.ORA9I','ora9i','SYS$SYS.KUPC$S_2_20120105011007.ORA9I' SCOPE=MEMORY SID='ora9i2';
kupprdp: master process DM00 started with pid=305, OS id=9526
         to execute - SYS.KUPM$MCP.MAIN('SYS_EXPORT_TABLE_05', 'VAS', 'KUPC$C_2_20120105011007', 'KUPC$S_2_20120105011007', 0);
kupprdp: worker process DW01 started with worker id=1, pid=307, OS id=9641
         to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_TABLE_05', 'VAS');
kupprdp: worker process DW02 started with worker id=2, pid=308, OS id=9964
         to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_TABLE_05', 'VAS');
kupprdp: worker process DW03 started with worker id=3, pid=309, OS id=9966
         to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_TABLE_05', 'VAS');
kupprdp: worker process DW04 started with worker id=4, pid=310, OS id=9968
         to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_TABLE_05', 'VAS');
Thu Jan  5 01:13:15 2012
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_2_20120105011007.ORA9I','ora9i' SCOPE=MEMORY SID='ora9i2';
Thu Jan  5 01:13:16 2012
ALTER SYSTEM SET service_names='ora9i' SCOPE=MEMORY SID='ora9i2';
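
The pattern in the alert log is easier to follow once the service_names values are split out. The sketch below assumes the ALTER SYSTEM message format shown above and treats any name starting with SYS$SYS.KUPC$ as a temporary Data Pump service, which is my reading of this log rather than a documented rule. The helper names are hypothetical.

```python
import re

# Pattern derived from the ALTER SYSTEM lines in the alert log excerpt above.
SET_RE = re.compile(
    r"ALTER SYSTEM SET service_names=(?P<values>.*?) SCOPE=MEMORY SID='(?P<sid>[^']+)'"
)
VALUE_RE = re.compile(r"'([^']*)'")

def service_name_changes(lines):
    """Return (sid, [service names]) for every service_names change found."""
    changes = []
    for line in lines:
        m = SET_RE.search(line)
        if m:
            changes.append((m.group("sid"), VALUE_RE.findall(m.group("values"))))
    return changes

def datapump_services(values):
    """Keep only the names that look like temporary Data Pump services."""
    return [v for v in values if v.startswith("SYS$SYS.KUPC$")]

# Two lines copied from the excerpt above.
sample = [
    "ALTER SYSTEM SET service_names='ora9i','SYS$SYS.KUPC$C_2_20120105011007.ORA9I' SCOPE=MEMORY SID='ora9i2';",
    "ALTER SYSTEM SET service_names='ora9i' SCOPE=MEMORY SID='ora9i2';",
]

for sid, values in service_name_changes(sample):
    print(sid, values, datapump_services(values))
```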

3. Solution information from MOS [ID 1269319.1]

Depending on the version of your database, Patch:8513146 may exist.
As of Nov. 25th 2010, this patch exists for:
- 10.2.0.4 / IBM AIX on POWER Systems (64-bit)
- 10.2.0.4.3 / Linux x86-64
- 10.2.0.5 / Linux x86 and Linux x86-64

Notes on the "Immediate Kill Session#" bug in RAC

Today I found many Immediate Kill Session# errors on one node of a RAC. The analysis is recorded below
1. Alert log contents

Sun Jan  1 02:12:28 2012
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='ora9i1';
Sun Jan  1 02:12:28 2012
Immediate Kill Session#: 496, Serial#: 51199
Immediate Kill Session: sess: 0x406bfa26b78  OS pid: 12900
Immediate Kill Session#: 497, Serial#: 38504
Immediate Kill Session: sess: 0x406bfa280e0  OS pid: 12496
Immediate Kill Session#: 499, Serial#: 45296
Immediate Kill Session: sess: 0x406bfa2abb0  OS pid: 12467
Immediate Kill Session#: 502, Serial#: 18910
Immediate Kill Session: sess: 0x406bfa2ebe8  OS pid: 28887
Immediate Kill Session#: 503, Serial#: 26631
Immediate Kill Session: sess: 0x406bfa30150  OS pid: 20749
Immediate Kill Session#: 508, Serial#: 63586
Immediate Kill Session: sess: 0x406bfa36c58  OS pid: 27614
Immediate Kill Session#: 512, Serial#: 43388
Immediate Kill Session: sess: 0x406bfa3c1f8  OS pid: 4021
Immediate Kill Session#: 516, Serial#: 33975
Immediate Kill Session: sess: 0x406bfa41798  OS pid: 18481
Immediate Kill Session#: 517, Serial#: 24240
Immediate Kill Session: sess: 0x406bfa42d00  OS pid: 823
Immediate Kill Session#: 526, Serial#: 59767
Immediate Kill Session: sess: 0x406bfa4eda8  OS pid: 12529
Immediate Kill Session#: 527, Serial#: 45765
Immediate Kill Session: sess: 0x406bfa50310  OS pid: 6059
……………………
Sun Jan  1 02:22:29 2012
ALTER SYSTEM SET service_names='ora9i' SCOPE=MEMORY SID='ora9i1';
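
To get a feel for how many sessions were affected, the session and serial numbers can be extracted from the alert log. A minimal sketch, assuming the message format shown above (`killed_sessions` is a hypothetical helper):

```python
import re

# Pattern derived from the "Immediate Kill Session#" lines in the excerpt above.
KILL_RE = re.compile(r"Immediate Kill Session#: (\d+), Serial#: (\d+)")

def killed_sessions(lines):
    """Return (sid, serial#) pairs for every killed session in the log lines."""
    sessions = []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            sessions.append((int(m.group(1)), int(m.group(2))))
    return sessions

# Lines copied from the excerpt above.
sample = [
    "Immediate Kill Session#: 496, Serial#: 51199",
    "Immediate Kill Session: sess: 0x406bfa26b78  OS pid: 12900",
    "Immediate Kill Session#: 497, Serial#: 38504",
]

sessions = killed_sessions(sample)
print(len(sessions), sessions)
```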

2. Database configuration
2.1) Configuration on node A

SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
ora9i1
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux IA64: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> show parameter name;
NAME                                 TYPE       VALUE
------------------------------------ ---------- --------------------
db_file_name_convert                 string
db_name                              string     ora9i
db_unique_name                       string     ora9i
global_names                         boolean    FALSE
instance_name                        string     ora9i1
lock_name_space                      string
log_file_name_convert                string
service_names                        string     ora9i

2.2) Configuration on node B

SQL>  select instance_name from v$instance;
INSTANCE_NAME
----------------
ora9i2
SQL>  select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux IA64: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> show parameter name;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_name_convert                 string
db_name                              string      ora9i
db_unique_name                       string      ora9i
global_names                         boolean     FALSE
instance_name                        string      ora9i2
lock_name_space                      string
log_file_name_convert                string
service_names                        string      SYS$SYS.KUPC$C_2_2012010601100
                                                 6.ORA9I, ora9i, SYS$SYS.KUPC$S
                                                 _2_20120106011006.ORA9I

3. Searching MOS for a solution
3.1) Cause of the problem

This is caused by unpublished Bug 6955040 ALL THE SESSIONS LOST CONNECTION AFTER KILLING CRSD.BIN.
The problem is when CRSD is killed or crashed and restarted,
CRSD will run resource check action but CRS resource status will not be available at that time.
Then in instance check action,
it fails to get the preferred node VIP resource status and considered the preferred node VIP resource is not running.
Therefore, instance check action will remove the default database service name
and disconnect sessions connected using default database service name.
This causes messages "ALTER SYSTEM" and "Immediate Kill Session" printed in alert log.

3.2) Solution

1) The fix is included in 10.2.0.5 patchset and 11.1.0.7 patchset.
    Apply the patchset once they are available.
OR
2) Configure a service name other than the default one (same as db_name),
and get user to use the non-default service name for connection.

A look at the spfile in RAC

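Today a friend's RAC ran into trouble: during a database upgrade, a colleague had created an spfile locally on each of the two nodes and then started the database with those spfiles. Since my friend does not know Oracle very well, he asked me for help. My advice was:
1. Use the backup of the original pfile to create an spfile in ASM, following the convention +ASM/SID/spfileSID
2. Create a local initSID.ora in the dbs directory containing the line spfile='<path of the spfile in ASM>' (do this on both nodes, and note that the SID differs)
3. Restart the databases one at a time
Analysis of the cause:
The colleague who did the upgrade did not know the spfile rules for RAC either. Most likely a restart reported that the spfile did not exist, so he manually created an spfile from the pfile under dbs. When my friend checked the database after the National Day holiday, he found that the spfiles were all stored locally.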
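1. Usual order in which the parameter file is read
If no parameter file is specified, Oracle looks for the following files in the default directory ($ORACLE_HOME/dbs on Linux), in this order, to start the database:
spfileSID.ora
spfile.ora
initSID.ora
If none of these files is found, startup fails.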
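
The lookup can be imitated in a few lines of Python. This is only an illustration of the search order, not how Oracle implements it: `find_parameter_file` is a hypothetical helper, and the candidate names follow Oracle's documented default order (spfile<SID>.ora, spfile.ora, init<SID>.ora).

```python
import os
import tempfile

def find_parameter_file(dbs_dir, sid):
    """Return the first parameter file found in the default search order, or None."""
    candidates = [f"spfile{sid}.ora", "spfile.ora", f"init{sid}.ora"]
    for name in candidates:
        path = os.path.join(dbs_dir, name)
        if os.path.exists(path):
            return path
    return None

# Demonstrate the precedence with empty files in a temporary directory.
with tempfile.TemporaryDirectory() as dbs:
    open(os.path.join(dbs, "initRACDB1.ora"), "w").close()
    print(os.path.basename(find_parameter_file(dbs, "RACDB1")))

    # Once a local spfile exists it wins, even if it is empty or invalid.
    open(os.path.join(dbs, "spfileRACDB1.ora"), "w").close()
    print(os.path.basename(find_parameter_file(dbs, "RACDB1")))
```

The second lookup returns spfileRACDB1.ora, which is why a stray local spfile silently takes precedence over an initSID.ora that points to the shared spfile in ASM.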

2. How the spfile is used at startup in RAC

[rac@cent1 dbs]$ echo $ORACLE_SID
RACDB1
[rac@cent1 dbs]$ touch spfileRACDB1.ora  <== manually create an empty spfile
[rac@cent1 dbs]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Thu Apr 29 13:45:50 2010
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORA-27091: unable to queue I/O  <== starting the database with sqlplus fails
ORA-27069: attempt to do I/O beyond the range of the file
Additional information: 1
Additional information: 1
SQL>
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
[rac@cent1 dbs]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    OFFLINE   OFFLINE
ora....B2.inst application    ONLINE    ONLINE    cent2
ora.RACDB.db   application    ONLINE    ONLINE    cent1
ora....SM1.asm application    ONLINE    ONLINE    cent1
ora....T1.lsnr application    ONLINE    ONLINE    cent1
ora.cent1.gsd  application    ONLINE    ONLINE    cent1
ora.cent1.ons  application    ONLINE    ONLINE    cent1
ora.cent1.vip  application    ONLINE    ONLINE    cent1
ora....SM2.asm application    ONLINE    ONLINE    cent2
ora....T2.lsnr application    ONLINE    ONLINE    cent2
ora.cent2.gsd  application    ONLINE    ONLINE    cent2
ora.cent2.ons  application    ONLINE    ONLINE    cent2
ora.cent2.vip  application    ONLINE    ONLINE    cent2
[rac@cent1 dbs]$ srvctl start instance -i racdb1 -d racdb  <== starting with srvctl succeeds
[rac@cent1 dbs]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Thu Apr 29 13:47:25 2010
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select instance_name, status from v$instance;
INSTANCE_NAME    STATUS
---------------- ------------
RACDB1           OPEN
--This shows that srvctl does not use that search order to find the parameter file

3. Check where srvctl reads the spfile from

[rac@cent1 dbs]$ srvctl config database -d racdb -a
cent1 RACDB1 /rac/product/10.2.0/db
cent2 RACDB2 /rac/product/10.2.0/db
DB_NAME: RACDB
ORACLE_HOME: /rac/product/10.2.0/db
SPFILE: +DATA/RACDB/spfileRACDB.ora
DOMAIN: WORLD
DB_ROLE: null
START_OPTIONS: null
POLICY:  AUTOMATIC
ENABLE FLAG: DB ENABLED

4. Change the spfile location registered in CRS

[rac@cent1 dbs]$ srvctl modify database -d racdb -p '+DATA/RACDB/spfileRACDB1.ora'

RAC load balancing configuration

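1. Client-side load balancing (Client-Side LB)
How it works: when a client initiates a connection, one address is picked at random from the address list, so connection requests are spread across the instances by a random algorithm.
Drawbacks:
1.1) Connections are assigned without regard to the actual load on each node, so the result is not necessarily balanced.
1.2) The random algorithm only evens out over a long time span; if many connections are opened within a short period, they may all land on the same node.
1.3) In some cases a connection may be directed to a failed node.
Configuration: add a LOAD_BALANCE = YES entry to the tnsnames.ora entry.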
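
The random selection described above can be sketched in a few lines of Python. This is only an illustrative model, not Oracle Net's actual implementation: the address list, the `pick_address` and `simulate` names, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical address list, standing in for a tnsnames.ora entry
# that has LOAD_BALANCE = YES.
ADDRESSES = ["rac1-vip:1521", "rac2-vip:1521"]


def pick_address(addresses, rng):
    """Pick one address at random; the real load on each node is ignored."""
    return rng.choice(addresses)


def simulate(n_connections, addresses=ADDRESSES, seed=0):
    """Count how many of n_connections land on each address."""
    rng = random.Random(seed)
    counts = Counter({addr: 0 for addr in addresses})
    for _ in range(n_connections):
        counts[pick_address(addresses, rng)] += 1
    return dict(counts)


if __name__ == "__main__":
    # A short burst may be lopsided; a long run tends toward an even split.
    print("burst of 4:   ", simulate(4))
    print("run of 10000: ", simulate(10000))
```

Running it with a small and a large connection count illustrates drawback 1.2: the split is only reliably even over many connections.
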
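2. Server-side load balancing (Server-Side LB)
How it works:
2.1) This method relies on the load information collected by the Listener. While the database is running, the PMON background process gathers the system's load information and registers it with the Listener.
2.2) PMON registers not only with the local Listener but also with the Listeners on other nodes. Where it registers is determined by the REMOTE_LISTENER and LOCAL_LISTENER parameters. LOCAL_LISTENER can be left unset, but REMOTE_LISTENER must be set, and its value is a tnsnames.ora entry.
2.3) When a Listener receives a client connection request, it hands the connection to the least-loaded node, which may be its own node or another one. In other words, the Listener redirects the client's connection request.
Configuration: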

SQL> show parameter listener;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
remote_listener                      string      LISTENERS_DEVDB
tnsnames.ora
LISTENERS_DEVDB =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521))
  )
listener.ora (remove the SID_LIST_LISTENER_NAME entry)
LISTENER_RAC1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.11)(PORT = 1521)(IP = FIRST))
    )
  )
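
As a rough model of the server-side behaviour in 2.1) to 2.3), the sketch below keeps a table of the load each instance has reported and redirects a new connection to the least-loaded one. The function names and the single numeric load value are assumptions for illustration; the metric a real Listener uses is more involved.

```python
def register_load(load_table, instance, load):
    """Record the load an instance reported to the listener (PMON's role)."""
    load_table[instance] = load


def choose_instance(load_table):
    """Return the registered instance with the lowest reported load."""
    if not load_table:
        raise LookupError("no instance has registered with the listener")
    return min(load_table, key=load_table.get)


if __name__ == "__main__":
    table = {}
    register_load(table, "devdb1", 0.8)  # hypothetical load figures
    register_load(table, "devdb2", 0.3)
    print("redirect new connection to:", choose_instance(table))
```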

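3. Using both together
Server-Side LB and Client-Side LB are not mutually exclusive and can work together. In that case the client first picks an address at random from the address list and sends its request to the Listener at that address. On receiving the request, the Listener chooses the most suitable node according to the load on each node and redirects the connection request to it.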

Three Types of RAC Failover


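1. Client-Side Connect Time Failover
1.1) The client's tnsnames.ora entry lists several addresses. When the user initiates a connection, the first address in the list is tried first; if that attempt fails, the second address is tried, and so on until a connection succeeds or every address has been tried.
1.2) This kind of failover only takes effect at the moment the connection is being established. Once the connection exists, a node failure is not handled at all: the client simply sees its session disconnected, and the application must reconnect.
Enabling it: add a FAILOVER=ON entry to the client's tnsnames.ora. Because ON is already the default value of this parameter, the client has this failover capability even without the entry.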

XFF_F =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.22)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = devdb)
    )
  )
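
The address traversal described in 1.1) might be modelled as below. It is a minimal sketch that assumes strict in-order traversal (with LOAD_BALANCE = yes, as in the entry above, the starting address is picked at random instead); `try_connect` is a hypothetical callable standing in for the real network connect.

```python
def connect_with_failover(addresses, try_connect):
    """Try each address in order and return the first that connects.

    try_connect(addr) is a hypothetical callable that returns a connection
    object or raises ConnectionError.
    """
    failed = []
    for addr in addresses:
        try:
            return addr, try_connect(addr)
        except ConnectionError:
            failed.append(addr)  # remember the failure, move to the next address
    raise ConnectionError(f"all addresses failed: {failed}")


if __name__ == "__main__":
    down = {"192.168.1.21"}  # pretend the first node is unreachable

    def fake_connect(addr):
        if addr in down:
            raise ConnectionError(addr)
        return f"session on {addr}"

    print(connect_with_failover(["192.168.1.21", "192.168.1.22"], fake_connect))
```

As in 1.2), nothing here helps once a session exists: the loop runs only while the connection is being established.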

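2. TAF (Transparent Application Failover)
2.1) After a connection has been established and while the application is running, if an instance fails, the users connected to that instance are automatically migrated to another healthy instance. The migration is transparent to the application and needs no user intervention; uncommitted transactions, however, are rolled back during the migration.
2.2) Compared with Client-Side Connect Time Failover, the only addition is the FAILOVER_MODE setting, which has four sub-parameters:
2.2.1) METHOD: either BASIC or PRECONNECT
BASIC creates the connection to another instance only once the node failure is detected.
PRECONNECT also establishes the connections to the other instances when the initial connection is made, so the session can switch over immediately when a failure occurs.
2.2.2) TYPE: either SESSION or SELECT
The two differ in how they treat SELECT statements. With SELECT, a query that was in progress at failover time continues returning the remaining rows on the new node; with SESSION, only the session is re-established and the query must be re-executed to return the full result set.
2.2.3) DELAY: the interval between retries, in seconds
2.2.4) RETRIES: the number of retry attempts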

XFF_T =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.22)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = devdb)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
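
To make the interplay of RETRIES and DELAY concrete, here is a small sketch of a reconnect loop using the values from the XFF_T entry as defaults. It models only the retry timing, not cursor replay or anything else TAF does; `try_connect` and the injectable `sleep` parameter are assumptions for the example.

```python
import time


def reconnect(try_connect, retries=180, delay=5, sleep=time.sleep):
    """Attempt to reconnect up to `retries` times, `delay` seconds apart.

    try_connect() is a hypothetical callable that returns a connection
    or raises ConnectionError. Returns (connection, attempts_used).
    """
    for attempt in range(1, retries + 1):
        try:
            return try_connect(), attempt
        except ConnectionError:
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"failover gave up after {retries} attempts")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect():
        # Fails twice, then succeeds, as if an instance came back.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("instance not ready")
        return "new session"

    # sleep is replaced with a no-op so the demo finishes immediately.
    print(reconnect(flaky_connect, sleep=lambda seconds: None))
```

With RETRIES = 180 and DELAY = 5, a session in this model would keep trying for roughly 15 minutes before giving up.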

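3. Server-Side TAF
3.1) Server-Side TAF has all the characteristics of TAF.
3.2) This form of TAF is configured on the server and needs no related configuration on the client. To change a parameter there is no need to edit every client's tnsnames.ora; only the service definition on the server has to be modified.
Each instance can be assigned one of two roles for a service:
PREFERRED: the preferred instance; instances with this role are chosen first to provide the service.
AVAILABLE: the standby instance; the service moves to an AVAILABLE instance only when the PREFERRED instance is unavailable.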

XFF_RAC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.22)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XFF)
    )
  )
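
The PREFERRED and AVAILABLE roles can be pictured with the simplified placement rule below: the service runs on every PREFERRED instance that is up, and each PREFERRED instance that is down is replaced by one AVAILABLE instance that is up. This is an illustrative model under that assumption, not the clusterware's actual logic; `place_service` is a made-up name.

```python
def place_service(preferred, available, up):
    """Return the instances that would offer the service.

    preferred / available: lists of instance names with that role.
    up: set of instance names currently running.
    """
    placed = [inst for inst in preferred if inst in up]
    spares = [inst for inst in available if inst in up]
    missing = len(preferred) - len(placed)  # preferred instances that are down
    return placed + spares[:missing]


if __name__ == "__main__":
    preferred, available = ["devdb1"], ["devdb2"]
    print(place_service(preferred, available, {"devdb1", "devdb2"}))
    print(place_service(preferred, available, {"devdb2"}))
```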

Starting and Stopping Cluster Services (10g)


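I. Starting and stopping CRS
Stop CRS
/etc/init.d/init.crs stop
Start CRS
/etc/init.d/init.crs start
II. Starting and stopping all cluster resources
Stop
./crs_stop -all
Start
./crs_start -all
III. Operating on CRS resources step by step
1. Stop the cluster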
srvctl stop service -d <db_name> -s <service_name>
srvctl stop database -d <db_name>
srvctl stop asm -n <node1>
srvctl stop asm -n <node2>
srvctl stop nodeapps -n <node1>
srvctl stop nodeapps -n <node2>
2. Start the cluster
srvctl start nodeapps -n <node1>
srvctl start nodeapps -n <node2>
srvctl start asm -n <node1>
srvctl start asm -n <node2>
srvctl start database -d <db_name>
srvctl start service -d <db_name> -s <service_name>
3. Test
3.1) Stop
srvctl stop service -d devdb -s XFF
srvctl stop instance -d devdb -i devdb1,devdb2 -o immediate
(srvctl stop database -d devdb -o immediate)
srvctl stop asm -n rac1
srvctl stop asm -n rac2
srvctl stop nodeapps -n rac1
srvctl stop nodeapps -n rac2
3.2) Start
srvctl start nodeapps -n rac1
srvctl start nodeapps -n rac2
srvctl start asm -n rac1
srvctl start asm -n rac2
srvctl start database -d devdb
(srvctl start instance -d devdb -i devdb1,devdb2)
srvctl start service -d devdb -s XFF
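
The ordering in the test above (services, then database, then ASM, then nodeapps when stopping, and the reverse when starting) can be captured in a small helper that only builds the command lines and never runs them. The helper names are assumptions; the srvctl invocations are the ones shown above.

```python
def stop_commands(db, services, nodes):
    """Build the srvctl stop sequence: service, database, ASM, nodeapps."""
    cmds = [f"srvctl stop service -d {db} -s {svc}" for svc in services]
    cmds.append(f"srvctl stop database -d {db} -o immediate")
    cmds += [f"srvctl stop asm -n {node}" for node in nodes]
    cmds += [f"srvctl stop nodeapps -n {node}" for node in nodes]
    return cmds


def start_commands(db, services, nodes):
    """Build the srvctl start sequence: nodeapps, ASM, database, service."""
    cmds = [f"srvctl start nodeapps -n {node}" for node in nodes]
    cmds += [f"srvctl start asm -n {node}" for node in nodes]
    cmds.append(f"srvctl start database -d {db}")
    cmds += [f"srvctl start service -d {db} -s {svc}" for svc in services]
    return cmds


if __name__ == "__main__":
    # Print the commands for review; nothing is executed.
    for line in stop_commands("devdb", ["XFF"], ["rac1", "rac2"]):
        print(line)
    for line in start_commands("devdb", ["XFF"], ["rac1", "rac2"]):
        print(line)
```

Printing rather than executing keeps the sketch safe to run anywhere and lets the sequence be reviewed before it is pasted into a shell on a cluster node.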