A case of insufficient ORACLE_HOME space when applying a PSU to 11gR2 GI: more than 20 GB free required

Author: 惜分飞 (xifenfei). © All rights reserved; do not reproduce in any form without the author's permission.

Patching GI with opatch auto

#/u01/oracle/app/grid/OPatch/opatch auto /u01/soft/18706472 -ocmrf /u01/soft/ocm.rsp
Executing /u01/oracle/app/grid/perl/bin/perl /u01/oracle/app/grid/OPatch/crs/patch11203.pl
-patchdir /u01/soft -patchn 18706472 -ocmrf /u01/soft/ocm.rsp -paramfile /u01/oracle/app/grid/crs/install/crsconfig_params
This is the main log file: /u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-15-28.log
This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-15-28.report.log
2014-10-15 01:15:28: Starting Clusterware Patch Setup
Using configuration parameter file: /u01/oracle/app/grid/crs/install/crsconfig_params
Stopping CRS...
Stopped CRS successfully
patch /u01/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
patch /u01/soft/18706472/18522515  apply failed  for home  /u01/oracle/app/grid
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.
opatch auto succeeded.

From this output we can see that patch 18522509 applied successfully and that every subsequent patch failed, not only 18522515; despite this, the summary line still claims "opatch auto succeeded".

Analyzing the error log

Composite patch 18522509 successfully applied.
 OPatch Session completed with warnings.
 Log file location: /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-19-00AM_1.log
 OPatch completed with warnings.
2014-10-15 01:21:44: patch /u01/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
2014-10-15 01:21:44: Executing command /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515 -local -silent -ocmrf /
u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc as grid
2014-10-15 01:21:44: Running as user grid: /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515 -local -silent -ocm
rf /u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc
2014-10-15 01:21:44: s_run_as_user2: Running /bin/su grid -c ' /u01/oracle/app/grid/OPatch/opatch napply /u01/soft/18706472/18522515
 -local -silent -ocmrf /u01/soft/ocm.rsp -oh /u01/oracle/app/grid -invPtrLoc /u01/oracle/app/grid/oraInst.loc '
2014-10-15 01:21:49: Removing file /tmp/uaa7jC7eu
2014-10-15 01:21:49: Successfully removed file: /tmp/uaa7jC7eu
2014-10-15 01:21:49: /bin/su exited with rc=73
2014-10-15 01:21:49: status of apply patch is 18688
2014-10-15 01:21:49: The apply patch output is Oracle Interim Patch Installer version 11.2.0.3.6
 Copyright (c) 2013, Oracle Corporation.  All rights reserved.
 Oracle Home       : /u01/oracle/app/grid
 Central Inventory : /u01/oracle/app/oraInventory
    from           : /u01/oracle/app/grid/oraInst.loc
 OPatch version    : 11.2.0.3.6
 OUI version       : 11.2.0.4.0
 Log file location : /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-21-44AM_1.log
 Verifying environment and performing prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(24021.839MB) is not available.
 UtilSession failed:
 Prerequisite check "CheckSystemSpace" failed.
 Log file location: /u01/oracle/app/grid/cfgtoollogs/opatch/opatch2014-10-15_01-21-44AM_1.log
 OPatch failed with error code 73

Here we can see that patch 18522509 applied successfully, but 18522515 failed its CheckSystemSpace prerequisite check: the GI $ORACLE_HOME required 24021.839 MB (about 23.5 GB) of free space, and the filesystem did not have that much available.
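
Before rerunning the patch, the failing prerequisite can be checked by itself. A minimal sketch, assuming the patch is still unzipped under /u01/soft/18706472 (opatch prereq CheckSystemSpace is standard OPatch functionality; output format varies by OPatch version):

# su - grid
$ cd /u01/soft/18706472/18522515
$ /u01/oracle/app/grid/OPatch/opatch prereq CheckSystemSpace -ph ./

This reports the required and available space without touching the home, so the fix can be confirmed before another lengthy opatch auto run.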

Checking free space and making room

# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.56    5%    12443     1% /
/dev/hd2          15.00     11.42   24%    60339     3% /usr
/dev/hd9var       10.00      9.91    1%     3310     1% /var
/dev/hd3          10.00      9.65    4%      364     1% /tmp
/dev/hd1          10.00     10.00    1%       32     1% /home
/dev/hd11admin      0.25      0.25    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt      10.00      8.98   11%    12401     1% /opt
/dev/fslv00       50.00     12.94   75%    59137     2% /u01
/dev/odm           0.00      0.00   -1%        6   100% /dev/odm
/dev/vx/dsk/crs_dg/crs_vol      1.96      1.75   12%        5     1% /crsdata
/dev/vx/dsk/ora_dg/data01_vol   2048.00   2031.32    1%        4     1% /oradata/data01
/dev/vx/dsk/ora_dg/sys_vol    400.00    350.29   13%     3702     1% /oradata/sys
# cd /u01
# ls
lost+found  oracle      soft
# cd soft
# du -sg
17.34   .
# mv /u01/soft /oradata/sys/
# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.56    5%    12443     1% /
/dev/hd2          15.00     11.42   24%    60339     3% /usr
/dev/hd9var       10.00      9.91    1%     3310     1% /var
/dev/hd3          10.00      9.65    4%      364     1% /tmp
/dev/hd1          10.00     10.00    1%       32     1% /home
/dev/hd11admin      0.25      0.25    1%        5     1% /admin
/proc                 -         -    -         -     -  /proc
/dev/hd10opt      10.00      8.98   11%    12401     1% /opt
/dev/fslv00       50.00     27.56   45%    59137     2% /u01
/dev/odm           0.00      0.00   -1%        6   100% /dev/odm
/dev/vx/dsk/crs_dg/crs_vol      1.96      1.75   12%        5     1% /crsdata
/dev/vx/dsk/ora_dg/data01_vol   2048.00   2031.32    1%        4     1% /oradata/data01
/dev/vx/dsk/ora_dg/sys_vol    400.00    333.54   14%     3702     1% /oradata/sys
# cd soft
# ls -ltr
total 12464202
drwxr-xr-x    5 grid     dba            1024 Jul 07 21:06 18706472
-rw-rw-r--    1 grid     dba            2123 Jul 15 20:06 PatchSearch.xml
-rw-r--r--    1 root     system   1801653734 Oct 14 14:39 p13390677_112040_aix64-5l_1of7.zip
-rw-r--r--    1 root     system   1170882875 Oct 14 14:40 p13390677_112040_aix64-5l_2of7.zip
-rw-r--r--    1 root     system   2127071138 Oct 14 14:41 p13390677_112040_aix64-5l_3of7.zip
-rw-r--r--    1 root     system   1242090126 Oct 14 14:43 p18706472_112040_AIX64-5L.zip
-rw-r--r--    1 root     system     34964928 Oct 14 14:43 p6880880_112000_AIX64-5L.zip
-rwxr-xr-x    1 root     system       127965 Oct 14 14:48 unzip_aix
-rw-r--r--    1 root     system      4871110 Oct 14 14:50 ONEOFF_112043_AIX64.zip
drwxr-xr-x    8 grid     dba            1024 Oct 14 14:55 grid
drwxr-xr-x    8 oracle   dba            1024 Oct 14 14:57 database
-rw-r--r--    1 grid     dba             621 Oct 15 00:11 ocm.rsp

So /u01 was indeed short of space. Moving the RAC installation media to another filesystem freed enough room on /u01 to apply the patch.

Rolling back the failed attempt

# /u01/oracle/app/grid/OPatch/opatch auto /u01/soft/18706472 -rollback -ocmrf /u01/soft/ocm.rsp

Reapplying the patch

#/u01/oracle/app/grid/OPatch/opatch auto /oradata/sys/soft/18706472 -ocmrf /oradata/sys/soft/ocm.rsp
Executing /u01/oracle/app/grid/perl/bin/perl /u01/oracle/app/grid/OPatch/crs/patch11203.pl -patchdir /oradata/sys/soft
-patchn 18706472 -ocmrf /oradata/sys/soft/ocm.rsp -paramfile /u01/oracle/app/grid/crs/install/crsconfig_params
This is the main log file: /u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-57-42.log
This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/u01/oracle/app/grid/cfgtoollogs/opatchauto2014-10-15_01-57-42.report.log
2014-10-15 01:57:42: Starting Clusterware Patch Setup
Using configuration parameter file: /u01/oracle/app/grid/crs/install/crsconfig_params
Stopping CRS...
Stopped CRS successfully
patch /oradata/sys/soft/18706472/18522509  apply successful for home  /u01/oracle/app/grid
patch /oradata/sys/soft/18706472/18522515  apply successful for home  /u01/oracle/app/grid
patch /oradata/sys/soft/18706472/18522514  apply successful for home  /u01/oracle/app/grid
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.
opatch auto succeeded.

Verifying that the patch applied successfully

# su - grid
xifenfei1:/home/grid> opatch lspatches
18522514;ACFS PATCH SET UPDATE : 11.2.0.4.3 (18522514)
18522515;OCW Patch Set Update : 11.2.0.4.3 (18522515)
18522509;Database Patch Set Update : 11.2.0.4.3 (18522509)
xifenfei1:/home/grid> crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.asm
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
ora.gsd
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
ora.net1.network
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.ons
               ONLINE  ONLINE       xifenfei1
               ONLINE  ONLINE       xifenfei2
ora.registry.acfs
               OFFLINE OFFLINE      xifenfei1
               OFFLINE OFFLINE      xifenfei2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei2
ora.cvu
      1        ONLINE  ONLINE       xifenfei2
ora.xifenfei1.vip
      1        ONLINE  ONLINE       xifenfei1
ora.xifenfei2.vip
      1        ONLINE  ONLINE       xifenfei2
ora.oc4j
      1        ONLINE  ONLINE       xifenfei2
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei2

Main gc wait events in ORACLE 12C RAC

Oracle 12c RAC has a large number of GC wait events. This piece concentrates on the most common ones, so that similar problems are easier to analyze when they come up later.

RAC wait events

In this section I discuss the important RAC wait events. This is not a complete list of all wait events, only of the ones most commonly encountered.

GC Current Block 2-Way/3-Way

A current-block wait event means that the version of the block being transferred is the most recent version of that block. These events can be encountered during both read and write activity. If the block is accessed for reading, the lock on the resource is acquired in KJUSERPR (PR) mode. The example discussed earlier in the "Resources and Locks" section showed a lock held in KJUSERPR mode.
In the following example, I query one row from table t_one while connected to node 2, causing a disk read. The SQL trace file shows no global cache wait events. The reason is that the block is locally mastered (the local instance is the resource master), so the FG process can acquire the lock on the resource directly, without incurring any global cache waits. This type of locking is also called the affinity locking scheme; it is discussed in detail in the Dynamic Resource Mastering (DRM) section.
RS@ORCL2:2> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='db file sequential read' ela= 563 file#=4 block#=180 blocks=1 obj#=75742
Next I connect to instance 1 (note 5) and query the same row. Because the block is already cached in instance 2, it is shipped from instance 2 to instance 1. The trace shows a gc current block 2-way wait event for the block with file_id=4, block_id=180. This is a two-way block transfer because the owning instance and the resource master instance (instance 2) are one and the same.
SYS@ORCL1:1> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='gc current block 2-way' ela= 629 p1=4 p2=180 p3=1 obj#=75742 tim=1350440823837572
Next, I will create the conditions for a 3-way wait event, but first let me flush the buffer cache on all three instances so that we start with clean buffers. In the following example:
1. I connect to instance 1 and query the row (this loads the block into instance 1's buffer cache).
2. I connect to instance 3 and query the same row. My session's FG process sends a request for the block to the LMS process running on instance 2 (because instance 2 is the resource master).
3. The LMS process on instance 2 forwards the request to the LMS process on instance 1.
4. The LMS process on instance 1 ships the block to the FG process running on instance 3.
In essence, three instances take part in the transfer of one block, so this is a three-way block transfer.
--alter system flush buffer_cache;  --on all instances
RS@ORCL1:1> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
RS@ORCL3:3> @tc_one_row.sql
N1 FNO BLOCK OBJ V1
---------- ---------- ---------- ---------- ----------
100 4 180 75742 250
Trace file:
nam='gc current block 3-way' ela= 798 p1=4 p2=180 p3=1 obj#=75742
Connecting to instance 2 to inspect the resource locks, we see two locks held on the resource, by instance 1 and instance 3 (owner_node 0 and 2 respectively).
RS@ORCL2:2> SELECT resource_name1, grant_level, state, owner_node
FROM v$ges_enqueue
WHERE resource_name1 LIKE '[0xb4][0x4],[BL]%';
RESOURCE_NAME1 GRANT_LEV STATE OWNER_NODE
------------------------------ --------- ------------- ----------
[0xb4][0x4],[BL][ext 0x0,0x0] KJUSERPR GRANTED 2
[0xb4][0x4],[BL][ext 0x0,0x0] KJUSERPR GRANTED 0
Excessive waits on gc current block 2-way or gc current block 3-way are usually due to either (a) an inefficient execution plan that drives a huge number of block accesses, or (b) application affinity not being implemented. If object access can be localized, consider implementing application affinity. Also apply the techniques discussed earlier in the "Common analysis for all wait events" section.
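
To find the objects and SQL statements behind these waits, ASH is often the quickest route. A minimal sketch, assuming ASH is licensed (Diagnostics Pack) and using a one-hour window as an example:

SELECT event, current_obj#, sql_id, COUNT(*) samples
  FROM gv$active_session_history
 WHERE event LIKE 'gc current block%'
   AND sample_time > SYSDATE - 1/24
 GROUP BY event, current_obj#, sql_id
 ORDER BY samples DESC;

current_obj# joins to dba_objects.object_id to resolve the object name.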

GC CR Block 2-Way/3-Way

CR-mode block transfers happen for read-only access requests. Consider a scenario in which a block resides in instance 2 in CURRENT mode, with instance 2 holding the BL lock on that resource in exclusive mode. Another session, connected to instance 1, then requests the block. Because in Oracle "readers do not see uncommitted changes", the SELECT statement asks for the version of the block as of the query start time. SCNs are used to identify block versions; in essence, the SELECT asks for a version consistent with its snapshot SCN. The LMS process servicing the request on instance 2 clones the CURRENT-mode block in the buffer cache, verifies that the clone is consistent with the requested SCN, and then ships a CR copy of the block to the FG process.
The key difference between CR-mode and CURRENT-mode transfers is that for a CR-mode transfer no resource or lock is maintained in the GRD for the CR buffer. In essence, CR-mode blocks need no global cache resources or locks. The received CR copy can be used only by the requesting session, and only within that particular SQL execution. That is why Oracle acquires no lock on the BL resource for CR transfers.
Because no global cache lock protects the buffer, a session on instance 1 that executes the same SQL statement again and touches that block will incur a gc cr block 2-way or gc cr block 3-way wait event.
Hence every access to the block from instance 1 triggers the construction of a new CR buffer. Even if the block is never modified in instance 2's buffer cache, the FG process on instance 1 still incurs CR wait events. The CR buffers residing in instance 1 cannot be reused, because each SQL execution asks for a different query-start SCN.
The following trace line shows a block transferred from the resource master to the requesting instance with a latency of roughly 0.6 ms. The file_id, block_id, and object_id recorded in the trace file can be used to identify the object incurring these wait events; the same object can also be identified by querying ASH data.
nam='gc cr block 2-way' ela= 627 p1=7 p2=6852 p3=1 obj#=76483 tim=37221074057
After executing tc_one_row.sql five times and then querying the buffer headers, you can see five CR buffers for this block across instances 1 and 2. Note that the CR_SCN_BAS and CR_SCN_WRP columns (note 6) show a different value for every CR buffer copy. Querying gv$ges_resource and gv$ges_enqueue also confirms that no GC locks protect these buffers.
Listing 10-10. Buffer status

SELECT DECODE(state, 0,'free', 1,'xcur', 2,'scur', 3,'cr', 4,'read', 5,'mrec',
              6,'irec', 7,'write', 8,'pi', 9,'memory', 10,'mwrite',
              11,'donated', 12,'protected', 13,'securefile', 14,'siop',
              15,'recckpt', 16,'flashfree', 17,'flashcur', 18,'flashna') state,
       mode_held, le_addr, dbarfil, dbablk, cr_scn_bas, cr_scn_wrp, class
  FROM sys.x$bh
 WHERE obj = &&obj
   AND dbablk = &&block
   AND state != 0;

Enter value for obj: 75742
Enter value for block: 180

STATE      MODE_HELD LE  DBARFIL     DBABLK CR_SCN_BAS CR_SCN_WRP      CLASS
---------- --------- --- ------- ---------- ---------- ---------- ----------
cr                 0 00        1      75742  649314930       3015          1
cr                 0 00        1      75742  648947873       3015          1
cr                 0 00        1      75742  648926281       3015          1
cr                 0 00        1      75742  648810300       3015          1
cr                 0 00        1      75742 1177328436       3013          1
CR buffer creation is a special case in that no global cache lock is acquired to protect the CR buffer. If a frequently accessed object has long-pending uncommitted transactions against it, a CR storm can develop. It is therefore wise to schedule batch jobs that update large amounts of table data in a less busy period.
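
A sketch for spotting such long-pending transactions, joining the standard gv$transaction and gv$session views (start_scn gives a reliable age ordering):

SELECT s.inst_id, s.sid, s.serial#, s.sql_id, t.start_time
  FROM gv$transaction t, gv$session s
 WHERE t.inst_id = s.inst_id
   AND t.ses_addr = s.saddr
 ORDER BY t.start_scn;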

GC CR Grant 2-Way/GC Current Grant 2-Way

The gc cr grant 2-way and gc current grant 2-way wait events are incurred when the requested block does not reside in any buffer cache. The FG process requests the block from the LMS process, but since no instance has it cached, the LMS process replies with a message granting the FG process permission to read the block from disk. The FG process then reads the block from disk and carries on.
The first line below shows the FG process receiving such a grant from the LMS process for the block with file_id=4 and block_id=180; the next line shows the physical read that then fetches the block from disk.
nam='gc cr grant 2-way' ela= 402 p1=4 p2=180 p3=1 obj#=75742
nam='db file sequential read' ela= 553 file#=4 block#=180 blocks=1 obj#=75742
Excessive waits of this kind mean either that the buffer cache is too small or that SQL executions are flushing the buffer cache too aggressively. Identify the SQL statements and objects incurring these waits and tune those statements.
The DRM feature is designed to reduce the occurrence of these grant-related wait events.
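
As a starting point for identifying those statements, a sketch ranking SQL by accumulated cluster wait time (cluster_wait_time is a standard gv$sqlstats column; the FETCH FIRST syntax needs 12c):

SELECT inst_id, sql_id, executions, cluster_wait_time
  FROM gv$sqlstats
 ORDER BY cluster_wait_time DESC
 FETCH FIRST 10 ROWS ONLY;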

GC CR Block Busy/GC Current Block Busy

Busy events indicate that the LMS process performed extra work to deal with concurrency-related issues. For example, to build a CR block, the LMS process may have to apply undo records to reconstruct a block version consistent with the query SCN. When returning the block to the FG process, LMS marks the transfer so that the session reports a gc cr block busy or gc current block busy wait event, depending on the type of block transfer.

GC CR Block Congested/GC Current Block Congested

If the LMS process does not service a request within 1 ms of receiving it, it marks the response so that the requester records a congestion-related wait event. Such congestion has many possible causes: the LMS process may be flooded with global cache requests, suffering CPU scheduling delays, or starved of some other resource such as memory.
Normally LMS processes run at real-time CPU scheduling priority, so scheduling delays should be minimal. A large number of these wait events therefore suggests a sudden spike in global cache requests that the LMS processes cannot keep up with. Memory shortage on the server can also cause LMS processes to page, degrading global cache performance.
You should investigate why the LMS processes cannot service requests efficiently.
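
A reasonable first check (a sketch) is the volume and average latency of the congested events per instance; consistently high averages across instances point at the LMS side (CPU, memory, interconnect) rather than at one hot object:

SELECT inst_id, event, total_waits,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) avg_ms
  FROM gv$system_event
 WHERE event LIKE 'gc%congested%'
 ORDER BY inst_id, time_waited_micro DESC;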

11gR2 RAC fails to start because of a badly set ASM sga_target

The first fault diagnosed and fixed in 2014: a colleague reported that a two-node RAC on Solaris 11.2 would not start and asked me to take a look. Analysis showed that an ill-chosen sga_target setting prevented ASM from starting.

GI fails to start

grid@zwq-rpt1:~$crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
grid@zwq-rpt1:~$crsctl status resource -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       zwq-rpt1
ora.crf
      1        ONLINE  ONLINE       zwq-rpt1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       zwq-rpt1
ora.cssdmonitor
      1        ONLINE  ONLINE       zwq-rpt1
ora.ctssd
      1        ONLINE  ONLINE       zwq-rpt1                 ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE zwq-rpt1
ora.gipcd
      1        ONLINE  ONLINE       zwq-rpt1
ora.gpnpd
      1        ONLINE  ONLINE       zwq-rpt1
ora.mdnsd
      1        ONLINE  ONLINE       zwq-rpt1

ASM did not start

Errors in the GI alert log

2014-01-01 00:40:47.708
[cssd(1418)]CRS-1605:CSSD voting file is online: /dev/rdsk/emcpower0a; details in /export/home/app/grid/log/zwq-rpt1/cssd/ocssd.log.
2014-01-01 00:40:53.234
[cssd(1418)]CRS-1601:CSSD Reconfiguration complete. Active nodes are zwq-rpt1 zwq-rpt2 .
2014-01-01 00:40:56.659
[ctssd(1483)]CRS-2407:The new Cluster Time Synchronization Service reference node is host zwq-rpt2.
2014-01-01 00:40:56.661
[ctssd(1483)]CRS-2401:The Cluster Time Synchronization Service started on host zwq-rpt1.
2014-01-01 00:41:02.016
[ctssd(1483)]CRS-2408:The clock on host zwq-rpt1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2014-01-01 00:43:23.874
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:45:42.837
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:02.087
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:18.836
[ohasd(1083)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2014-01-01 00:48:18.837
[ohasd(1083)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2014-01-01 01:05:15.396
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:05:45.101
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:06:15.104
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".

It is fairly clear here that, with the ASM disk groups unavailable, the OCR could not be accessed and CRS therefore could not start.

The oraagent log

2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection:connect:excp OCIException OCI error 604
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] DgpAgent::queryDgStatus excp ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")

The log reports a fairly clear ORA-04031 error; next, check the ASM alert log.

Errors in the ASM alert log

Wed Jan 01 00:47:33 2014
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /export/home/app/oracle
Wed Jan 01 00:47:39 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1728.trc  (incident=291447):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291447/+ASM1_ora_1728_i291447.trc
Wed Jan 01 00:47:48 2014
Dumping diagnostic data in directory=[cdmp_20140101004748], requested by (instance=1, osid=1728), summary=[incident=291447].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:47:53 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1730.trc  (incident=291448):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291448/+ASM1_ora_1730_i291448.trc
Wed Jan 01 00:48:01 2014
Dumping diagnostic data in directory=[cdmp_20140101004801], requested by (instance=1, osid=1730), summary=[incident=291448].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:07 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1732.trc  (incident=291449):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291449/+ASM1_ora_1732_i291449.trc
Wed Jan 01 00:48:16 2014
Dumping diagnostic data in directory=[cdmp_20140101004816], requested by (instance=1, osid=1732), summary=[incident=291449].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:16 2014
License high water mark = 1
USER (ospid: 1736): terminating the instance
Instance terminated by USER, pid = 1736

Here it is plain that the shared pool was too small: ASM threw ORA-04031, and the ASM instance could not start.
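
For comparison, on a healthy ASM instance the shared pool headroom can be checked with a sketch like the following; a near-zero free memory figure for the shared pool would be consistent with the ORA-04031 above:

SELECT pool, name, ROUND(bytes / 1024 / 1024, 1) mb
  FROM v$sgastat
 WHERE name = 'free memory'
 ORDER BY pool;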

Root cause analysis

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /export/home/app/grid
System name:	SunOS
Node name:	zwq-rpt1
Release:	5.11
Version:	11.1
Machine:	sun4v
Using parameter settings in server-side spfile +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.823992831
System parameters with non-default values:
  sga_max_size             = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

Here we can see that sga_target was set to 0 while no shared pool size was configured. The resulting undersized shared pool produced the ORA-04031, so the ASM startup attempted by CRS failed, the OCR became inaccessible, and CRS could not come up.

The fix

1. Edit a pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$vi /tmp/asm.pfile
  memory_target = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

2. Start ASM with the pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed Jan 1 01:04:10 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/asm.pfile'
ASM instance started
Total System Global Area 2138521600 bytes
Fixed Size                  2161024 bytes
Variable Size            2102806144 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

3. Re-create the spfile

SQL> create spfile='+CRSDG' FROM PFILE='/tmp/asm.pfile';
File created.
--ASM alert log
Wed Jan 01 01:08:59 2014
NOTE: updated gpnp profile ASM SPFILE to
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM SPFILE to +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.835664939

4. Shut down ASM

SQL> shutdown immediate
ORA-15097: cannot SHUTDOWN ASM instance with connected client (process 1971)
SQL> shutdown abort
ASM instance shutdown

5. Restart CRS

root@zwq-rpt1:~# crsctl stop crs -f
root@zwq-rpt1:~# crsctl start crs

6. Restart CRS on the other node

root@zwq-rpt2:~# crsctl stop crs -f
root@zwq-rpt2:~# crsctl start crs

7. Check the result

root@zwq-rpt1:~# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.DATADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.FRADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.LISTENER.lsnr
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.asm
               ONLINE  ONLINE       zwq-rpt1                 Started
               ONLINE  ONLINE       zwq-rpt2                 Started
ora.gsd
               OFFLINE OFFLINE      zwq-rpt1
               OFFLINE OFFLINE      zwq-rpt2
ora.net1.network
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.ons
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       zwq-rpt1
ora.cvu
      1        ONLINE  ONLINE       zwq-rpt1
ora.oc4j
      1        ONLINE  ONLINE       zwq-rpt1
ora.rptdb.db
      1        ONLINE  ONLINE       zwq-rpt1                 Open
      2        ONLINE  ONLINE       zwq-rpt2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt2.vip
      1        ONLINE  ONLINE       zwq-rpt2

With that, everything was back to normal: the first fault of 2014 was successfully resolved.

Converting between hub and leaf nodes in ORACLE 12C RAC

Thanks to Lunar for the guidance that made this hub/leaf conversion on ORACLE 12C RAC possible; the Oracle Flex Clusters section of the official documentation was the main reference.

Current cluster status

--cluster status
[root@rac1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--the cluster is running in flex mode
[root@rac1 ~]#  crsctl get cluster mode status
Cluster is running in "flex" mode
--ASM is running in flex mode
[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

Converting hub to leaf

--change the node role from hub to leaf
[root@rac1 ~]# crsctl set node role leaf
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--stop the clusterware
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDB_NEW.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDG.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.ora12c.db' on 'rac1'
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.ora12c.db' on 'rac1' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac2'
CRS-2676: Start of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.SYSDB_NEW.dg' on 'rac1' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.SYSDG.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--start the clusterware
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--status after the hub-to-leaf conversion
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER,STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'leaf'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

Converting leaf back to hub

--change the node role from leaf to hub
[root@rac1 ~]# crsctl set node role hub
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--stop the clusterware
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--start the clusterware
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-2672: Attempting to start 'ora.ons' on 'rac1'
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac2'
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac1'
CRS-2676: Start of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac1'
CRS-2676: Start of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.ons' on 'rac1' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.proxy_advm' on 'rac1'
CRS-2676: Start of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ora12c.db' on 'rac1'
CRS-2676: Start of 'ora.ora12c.db' on 'rac1' succeeded
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--cluster status
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--node roles
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

This completes the conversion between the leaf and hub roles in ORACLE 12C RAC. Before converting, confirm that both the cluster and ASM are running in flex mode; if they are not, convert them first by following the relevant documentation.
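
For reference, the mode checks used above together with the conversion commands. The crsctl set cluster mode flex syntax matches my reading of the 12.1 tooling (it requires GNS and only takes effect after the stack restarts), and converting ASM to Flex ASM is done with asmca, so treat this as a sketch to verify against the documentation for your exact version:

# crsctl get cluster mode status
# crsctl set cluster mode flex
$ asmcmd showclustermode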

OLR maintenance

The official description of the OLR
OLR is a registry similar to OCR located on each node in a cluster, but contains information specific to each node. It contains manageability information about Oracle Clusterware, including dependencies between various services. Oracle High Availability Services uses this information. OLR is located on local storage on each node in a cluster. Its default location is in the path Grid_home/cdata/host_name.olr, where Grid_home is the Oracle Grid Infrastructure home, and host_name is the host name of the node.
In other words, the OLR is an OCR-like registry stored locally on each node of the cluster.

Locating the OLR

[root@rac2 cdata]# cd /etc/oracle
[root@rac2 oracle]# ls -l
total 2868
drwxrwx--- 2 root oinstall    4096 Nov 24 20:00 lastgasp
drwxrwxrwt 2 root oinstall    4096 Dec 21 20:51 maps
-rw-r--r-- 1 root oinstall      96 Nov 25 18:38 ocr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 ocr.loc.orig
-rw-r--r-- 1 root oinstall      80 Nov 24 19:58 olr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 olr.loc.orig
drwxrwxr-x 5 root oinstall    4096 Nov 24 19:57 oprocd
drwxr-xr-x 3 root oinstall    4096 Nov 24 19:57 scls_scr
-rws--x--- 1 root oinstall 2904377 Nov 24 19:57 setasmgid
[root@rac2 oracle]# more olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/rac2.olr
crs_home=/u01/app/12.1.0/grid
--on some platforms the olr.loc file may be under /var/opt/oracle/ instead
[root@rac2 oracle]#  ocrcheck -config -local
Oracle Local Registry configuration is :
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :        996
         Available space (kbytes) :     408572
         ID                       :  816087519
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr

Listing OLR backups

[root@rac2 oracle]# ocrconfig -local -showbackup
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr

Backing up the OLR

[root@rac2 oracle]# ocrconfig -local -manualbackup
rac2     2013/12/22 12:09:33     /u01/app/12.1.0/grid/cdata/rac2/backup_20131222_120933.olr
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2/
total 1908
-rw-r--r-- 1 root root  860160 Nov 24 20:02 backup_20131124_200238.olr
-rw-r--r-- 1 root root 1085440 Dec 22 12:09 backup_20131222_120933.olr

Recovering a damaged OLR

--simulate OLR damage (move the file away)
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# mv /u01/app/12.1.0/grid/cdata/rac2.olr /u01/app/12.1.0/grid/cdata/rac2.olr_bak
--stop crs
[root@rac2 oracle]# crsctl stop crs
--starting crs now fails
[root@rac2 oracle]# crsctl start crs
PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
CRS-4000: Command Start failed, or completed with errors.
--trace the crs startup with strace
[root@rac2 oracle]# strace crsctl start crs
...
uname({sys="Linux", node="rac2", ...})  = 0
open("/etc/oracle/olr.loc", O_RDONLY)   = 14
fstat(14, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd8ac628000
read(14, "olrconfig_loc=/u01/app/12.1.0/gr"..., 4096) = 80
read(14, "", 4096)                      = 0
close(14)                               = 0
munmap(0x7fd8ac628000, 4096)            = 0
stat("/u01/app/12.1.0/grid/cdata/rac2.olr", 0x7fffa215a580) = -1 ENOENT (No such file or directory)
--we can see it first reads /etc/oracle/olr.loc, then fails to stat /u01/app/12.1.0/grid/cdata/rac2.olr
...
--confirm that ohasd.bin is down
[root@rac2 cdata]# ps -ef|grep ohasd
root     15715 31578  0 14:34 pts/3    00:00:00 grep ohasd
--restore the OLR
[root@rac2 oracle]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
PROTL-35: The configured OLR location is not accessible
[root@rac2 oracle]# cd /u01/app/12.1.0/grid/cdata/
[root@rac2 cdata]# ls
localhost  rac12c-cluster  rac2  rac2.olr_bak
[root@rac2 cdata]# touch rac2.olr
[root@rac2 cdata]# chmod 600 rac2.olr
[root@rac2 cdata]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
--confirm the restore succeeded
[root@rac2 cdata]# ls -l
total 84200
drwxr-xr-x 2 grid oinstall      4096 Nov 24 19:37 localhost
drwxrwxr-x 2 grid oinstall      4096 Dec 22 09:07 rac12c-cluster
drwxr-xr-x 2 grid oinstall      4096 Dec 22 12:09 rac2
-rw------- 1 root root     503484416 Dec 22 14:29 rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:43 rac2.olr_bak
--start crs
[root@rac2 cdata]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Other OLR commands

To export the OLR to a file:
# ocrconfig -local -export file_name
To import a specified file into the OLR:
# ocrconfig -local -import file_name
To view the contents of the OLR file:
ocrdump -local file_name
To view the contents of an OLR backup file:
ocrdump -local -backupfile olr_backup_file_name
To change the OLR backup location:
ocrconfig -local -backuploc new_olr_backup_path

When the OLR is damaged, the RAC node cannot start properly, and unlike the OCR the OLR is not backed up automatically on a schedule, so it is advisable to back up the OLR manually on a regular basis.
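
A minimal sketch of such a scheduled backup, as a root crontab entry; the log path and schedule are examples only, and ocrconfig -local -manualbackup writes the backup into the location shown by ocrconfig -local -showbackup:

# weekly OLR backup: Sundays at 02:00
0 2 * * 0 /u01/app/12.1.0/grid/bin/ocrconfig -local -manualbackup >> /var/log/olr_backup.log 2>&1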

Renaming the disk group holding the OCR, voting disk, and ASM spfile in ORACLE 12C RAC

Looking at my single-node 12c RAC today, I suddenly felt uneasy that the disk group holding the OCR was called +DG_SYS and wanted to rename it to +SYS_DG. The approach: first migrate the OCR, voting disk, and ASM spfile into another existing disk group, then rename the disk group, and finally migrate everything into the renamed disk group (the flow here is +DG_SYS -> +DATA -> +SYS_DG).
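
The rename itself, the middle step of that flow, is done with the renamedg utility. A sketch, assuming the disk group has already been emptied of the OCR/votedisk/spfile and dismounted on all nodes (renamedg does not update references held inside other files, which is exactly why everything is moved off first):

$ renamedg phase=both dgname=DG_SYS newdgname=SYS_DG verbose=true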
Current status

[grid@xifenfei ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                                0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DG_SYS                         MOUNTED
DATA                           MOUNTED
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   60a037da30714f6bbfe5d90206ff27a7 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).
[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825640465

Change the OCR location
The OCR is moved to another disk group with ocrconfig -add and ocrconfig -delete; the whole process can be done online.

[root@xifenfei ~]# ocrconfig -add +data
--alert log
2013-09-09 22:32:40.799:
[crsd(5064)]CRS-1007:The OCR/OCR mirror location was replaced by +data.
[root@xifenfei ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
         Device/File Name         :      +data
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
[root@xifenfei ~]# ocrconfig -delete +DG_SYS
--alert log
2013-09-09 22:35:53.585:
[crsd(5064)]CRS-1010:The OCR mirror location +DG_SYS was removed.
[root@xifenfei ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :      +data
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

Change the votedisk location
Done with the crsctl replace votedisk command.

[root@xifenfei ~]# crsctl replace votedisk +DATA
Successful addition of voting disk 161ddea0a5fe4f28bfb67536e6105122.
Successful deletion of voting disk 60a037da30714f6bbfe5d90206ff27a7.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
--alert log
2013-09-09 22:38:15.259:
[cssd(4685)]CRS-1605:CSSD voting file is online: /dev/sdb; details in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log.
2013-09-09 22:38:15.259:
[cssd(4685)]CRS-1626:A Configuration change request completed successfully
2013-09-09 22:38:15.285:
[cssd(4685)]CRS-1601:CSSD Reconfiguration complete. Active nodes are xifenfei .
[root@xifenfei ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   161ddea0a5fe4f28bfb67536e6105122 (/dev/sdb) [DATA]
Located 1 voting disk(s).

Move the ASM spfile

[grid@xifenfei ~]$  gpnptool get -o-
Success.
…………
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/sd*" SPFile="+DG_SYS/xff-cluster/ASMPARAMETERFILE/registry.253.825640465" Mode="legacy"/>
…………
[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 22:42:05 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create pfile='/tmp/pfile.asm' from spfile;
File created.
SQL> create spfile='+DATA' FROM PFILE='/tmp/pfile.asm';
File created.
[grid@xifenfei ~]$  gpnptool get -o-
Success.
…………
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/sd*" SPFile="+DATA/xff-cluster/ASMPARAMETERFILE/registry.253.825720159" Mode="legacy"/>
…………

This shows that creating an ASM spfile automatically updates the SPFile entry in the GPnP profile reported by gpnptool; no manual intervention is needed.
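Besides dumping the whole GPnP profile with gpnptool, asmcmd offers a quicker cross-check of the registered location; a minimal sketch (spset is shown only for completeness, it was not needed here):

[grid@xifenfei ~]$ asmcmd spget
--prints the ASM spfile location registered in the GPnP profile
[grid@xifenfei ~]$ asmcmd spset '+DATA/xff-cluster/ASMPARAMETERFILE/registry.253.825720159'
--would re-register the location manually if it ever got out of sync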

Restart CRS
So that ASM picks up the spfile in the new disk group.

[root@xifenfei ~]# crsctl stop crs
[root@xifenfei ~]# crsctl start crs

Verify that the +DG_SYS disk group is no longer in use

[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 22:59:49 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/xff-cluster/ASMPARAMETER
                                                 FILE/registry.253.825720159
ASMCMD> lsof
DB_Name  Instance_Name  Path
+ASM     +ASM1          +DATA.255.819326577
cdb      cdb1           +DATA/CDB/CONTROLFILE/current.274.819356503
cdb      cdb1           +DATA/CDB/DATAFILE/sysaux.278.819355829
cdb      cdb1           +DATA/CDB/DATAFILE/system.269.819356101
cdb      cdb1           +DATA/CDB/DATAFILE/undotbs1.276.819356317
cdb      cdb1           +DATA/CDB/DATAFILE/users.279.819356309
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/pdbseed_temp01.dbf
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/sysaux.272.819356709
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/system.271.819356709
cdb      cdb1           +DATA/CDB/ONLINELOG/group_1.277.822736453
cdb      cdb1           +DATA/CDB/ONLINELOG/group_2.280.822736461
cdb      cdb1           +DATA/CDB/ONLINELOG/group_3.275.822736397
cdb      cdb1           +DATA/CDB/TEMPFILE/temp.273.819356649

Dismount the +DG_SYS disk group

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     20480     3369                0            3369              0             Y  DATA/
MOUNTED  EXTERN  N         512   4096  1048576      5451     5231                0            5231              0             N  DG_SYS/
ASMCMD> umount dg_sys
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     20480     3369                0            3369              0             Y  DATA/

Rename the ASM disk group
Rename disk group +DG_SYS to +SYS_DG.

[grid@xifenfei ~]$ renamedg phase=both dgname=DG_SYS newdgname=SYS_DG verbose=true
Parsing parameters..
Parameters in effect:
         Old DG name       : DG_SYS
         New DG name          : SYS_DG
         Phases               :
                 Phase 1
                 Phase 2
         Discovery str        : (null)
         Clean              : TRUE
         Raw only           : TRUE
renamedg operation: phase=both dgname=DG_SYS newdgname=SYS_DG verbose=true
Executing phase 1
Discovering the group
Performing discovery with string:
Identified disk UFS:/dev/sdc2 with disk number:0 and timestamp (32990496 1727895552)
Checking for hearbeat...
Re-discovering the group
Performing discovery with string:
Identified disk UFS:/dev/sdc2 with disk number:0 and timestamp (32990496 1727895552)
Checking if the diskgroup is mounted or used by CSS
Checking disk number:0
Generating configuration file..
Completed phase 1
Executing phase 2
Looking for /dev/sdc2
Modifying the header
Completed phase 2
Terminating kgfd context 0x7fceeb02a0a0
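Before mounting, the renamed header can be sanity-checked with kfed; a small sketch:

[grid@xifenfei ~]$ kfed read /dev/sdc2 | grep grpname
--kfdhdb.grpname should now read SYS_DG after phase 2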

Mount +SYS_DG

SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
SYS_DG                         DISMOUNTED
SQL> alter diskgroup sys_dg mount;
Diskgroup altered.
SQL>  select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
SYS_DG                         MOUNTED

Migrate the ASM spfile/OCR/votedisk from +DATA to +SYS_DG

SQL> create spfile='+SYS_DG' FROM pfile='/tmp/pfile.asm';
File created.
[root@xifenfei ~]# ocrconfig -add +SYS_DG
[root@xifenfei ~]# ocrconfig -DELETE +DATA
[root@xifenfei ~]# crsctl replace votedisk +SYS_DG
Successful addition of voting disk 9694a31053ea4ff4bfb57891461a1296.
Successful deletion of voting disk 161ddea0a5fe4f28bfb67536e6105122.
Successfully replaced voting disk group with +SYS_DG.
CRS-4266: Voting file(s) successfully replaced
[root@xifenfei ~]# crsctl stop crs
[root@xifenfei ~]# crsctl start crs

Remove the old disk group's (+DG_SYS) resource information from the OCR

[root@xifenfei ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DG_SYS.dg
               ONLINE  OFFLINE      xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.SYS_DG.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
[root@xifenfei ~]# srvctl remove diskgroup -diskgroup dg_sys
[root@xifenfei ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.SYS_DG.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

This completes the rename of the disk group holding the OCR/votedisk/ASM spfile. Because this database is a single-node RAC an outage was unavoidable; with two or more nodes the same change can be done without downtime by restarting the nodes one at a time. The procedure is exactly the same as on 11.2 RAC; nothing has changed.
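A minimal sketch of that rolling restart on a hypothetical two-node cluster (run as root, one node at a time, letting the cluster stabilize in between):

[root@node1 ~]# crsctl stop crs
[root@node1 ~]# crsctl start crs
--wait until 'crsctl status res -t' shows node1 fully online again
[root@node2 ~]# crsctl stop crs
[root@node2 ~]# crsctl start crs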

Recovering from an OCR disk group failure in Oracle 12.1 RAC

Contact: phone/WeChat (+86 17813235971) QQ (107644445)

Title: Recovering from an OCR disk group failure in Oracle 12.1 RAC

Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]

In 11.2 and 12.1 RAC the OCR and votedisk can be stored in ASM, and many installations put them in a dedicated ASM disk group. What happens if that disk group is damaged while the data disk groups are intact? Recovery splits into two cases: with an OCR backup and without one. Since the OCR is automatically backed up every 4 hours, most systems will have a backup. This post covers the with-backup case for Oracle 12c RAC: recovering from a failure of the ASM disk group that stores the OCR and votedisk.
Confirm that the OCR, votedisk, and ASM spfile live in one dedicated ASM disk group

[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1360
         Available space (kbytes) :     408208
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825628977
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   3e20d13ae98a4fcfbffa489ab4df68a3 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).
ASMCMD>  lsdsk -t -G dg_sys
Create_Date  Mount_Date  Repair_Timer  Path
08-SEP-13    08-SEP-13   0             /dev/sdc2

Check the current RAC status

[grid@xifenfei ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

Inspect the disk header with kfed

[grid@xifenfei ~]$ kfed read /dev/sdc2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2879801080 ; 0x00c: 0xaba646f8
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                202375168 ; 0x020: 0x0c100000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DG_SYS_0000 ; 0x028: length=11
kfdhdb.grpname:                  DG_SYS ; 0x048: length=6
kfdhdb.fgname:              DG_SYS_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32990483 ; 0x0a8: HOUR=0x13 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.crestmp.lo:            303455232 ; 0x0ac: USEC=0x0 MSEC=0x197 SECS=0x21 MINS=0x4
kfdhdb.mntstmp.hi:             32990485 ; 0x0b0: HOUR=0x15 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.mntstmp.lo:           1776845824 ; 0x0b4: USEC=0x0 MSEC=0x221 SECS=0x1e MINS=0x1a
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                    5451 ; 0x0c4: 0x0000154b
kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                     10 ; 0x0d4: 0x0000000a
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             32990483 ; 0x0e4: HOUR=0x13 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.grpstmp.lo:            301063168 ; 0x0e8: USEC=0x0 MSEC=0x77 SECS=0x1f MINS=0x4
kfdhdb.vfstart:                     224 ; 0x0ec: 0x000000e0
kfdhdb.vfend:                       256 ; 0x0f0: 0x00000100
kfdhdb.spfile:                      219 ; 0x0f4: 0x000000db   ---- starting AU of the ASM spfile
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
kfdhdb.flags:                         1 ; 0x0fc: 0x00000001
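Since kfdhdb.ausize is 1048576 and kfdhdb.spfile is 219, the spfile sits in AU 219 of this disk, so even with ASM down a copy could in principle be pulled straight off the device with dd (a sketch, not needed in this walkthrough):

[grid@xifenfei ~]$ dd if=/dev/sdc2 of=/tmp/asmspfile.au bs=1048576 skip=219 count=1
[grid@xifenfei ~]$ strings /tmp/asmspfile.au
--the parameter text should be readable near the start of the extracted AU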

Back up the OCR

[root@xifenfei xff-cluster]# ocrconfig  -manualbackup
xifenfei     2013/09/08 23:48:57     /u01/app/12.1/grid/product/cdata/xff-cluster/backup_20130908_234857.ocr
[root@xifenfei xff-cluster]# ocrconfig -showbackup
xifenfei     2013/08/08 21:11:00     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup00.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup01.ocr
xifenfei     2013/07/08 20:23:18     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup02.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/day.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/week.ocr
xifenfei     2013/09/08 23:48:57     /u01/app/12.1/grid/product/cdata/xff-cluster/backup_20130908_234857.ocr
xifenfei     2013/06/28 22:55:02     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup_20130628_225502.ocr

Corrupt the ASM disk

[grid@xifenfei ~]$ dd if=/dev/zero of=/dev/sdc2 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.6061e-05 seconds, 62.0 MB/s

Stop CRS

[root@xifenfei xff-cluster]# crsctl stop crs

Start CRS

[root@xifenfei xff-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[grid@xifenfei admin]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  OFFLINE      xifenfei                 STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei                 STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------

Relevant GI logs

--alert log
2013-09-08 23:53:37.662:
[gpnpd(1507)]CRS-2328:GPNPD started on node xifenfei.
2013-09-08 23:54:10.244:
[cssd(1584)]CRS-1713:CSSD daemon is started in hub mode
2013-09-08 23:54:10.915:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:11.183:
[ohasd(1367)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2013-09-08 23:54:11.183:
[ohasd(1367)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2013-09-08 23:54:26.044:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:41.146:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:56.195:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
--ocssd log
2013-09-08 23:54:25.976: [    GPNP][1090226496]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2360] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_xifenfei" disco ""
2013-09-08 23:54:25.976: [    CSSD][1090226496]clssnmReadDiscoveryProfile: voting file discovery string(/dev/sd*)
2013-09-08 23:54:25.976: [    CSSD][1090226496]clssnmvDDiscThread: using discovery string /dev/sd* for initial discovery
2013-09-08 23:54:25.976: [   SKGFD][1090226496]Discovery with str:/dev/sd*:
2013-09-08 23:54:25.976: [   SKGFD][1090226496]UFS discovery with :/dev/sd*:
2013-09-08 23:54:26.032: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdb:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc2:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdd:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdd1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda2:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]OSS discovery with :/dev/sd*:
2013-09-08 23:54:26.042: [   SKGFD][1090226496]Handle 0x1d65c10 from lib :UFS:: for disk :/dev/sdb:
2013-09-08 23:54:26.043: [   SKGFD][1090226496]Handle 0x20c95a0 from lib :UFS:: for disk :/dev/sdc2:
2013-09-08 23:54:26.043: [   SKGFD][1090226496]Handle 0x20c9dd0 from lib :UFS:: for disk :/dev/sdd:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x1d65c10 for disk :/dev/sdb:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x20c95a0 for disk :/dev/sdc2:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x20c9dd0 for disk :/dev/sdd:
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmvDiskVerify: Successful discovery of 0 disks
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmvFindInitialConfigs: No voting files found
2013-09-08 23:54:26.044: [    CSSD][1090226496](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

After the ASM disk holding the OCR was destroyed, starting CRS clearly reports that no voting files can be discovered.
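A kfed read confirms what CSSD is complaining about; a sketch (after zeroing the first 4096 bytes the block no longer parses as a disk header, so kfed is expected to report KFBTYP_INVALID rather than KFBTYP_DISKHEAD):

[grid@xifenfei ~]$ kfed read /dev/sdc2 | grep kfbh.type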

Force-stop CRS

[root@xifenfei xff-cluster]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'xifenfei'
CRS-2677: Stop of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Start CRS in exclusive mode

[root@xifenfei xff-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'xifenfei'
CRS-2677: Stop of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei'
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei'
CRS-2676: Start of 'ora.evmd' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei'
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'xifenfei'
CRS-2672: Attempting to start 'ora.ctssd' on 'xifenfei'
CRS-2676: Start of 'ora.ctssd' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei'
CRS-2676: Start of 'ora.asm' on 'xifenfei' succeeded
[grid@xifenfei xifenfei]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE xifenfei                 OCR not started,STAB
                                                             LE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.crf
      1        OFFLINE OFFLINE                               STABLE
ora.crsd
      1        OFFLINE OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenfei                 ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei                 STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.storage
      1        OFFLINE OFFLINE                               STABLE

Create the disk group

[grid@xifenfei xifenfei]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 00:23:40 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create diskgroup DG_OCR external redundancy disk '/dev/sdc2' attribute 'COMPATIBLE.ASM' = '12.1.0';
Diskgroup created.
[root@xifenfei xff-cluster]# ls -l
total 1472
-rw-r--r-- 1 root root 1503232 Sep  8 23:48 backup_20130908_234857.ocr
[root@xifenfei xff-cluster]# ocrconfig -restore backup_20130908_234857.ocr
PROT-35: The configured OCR locations are not accessible
SQL> conn / as sysasm
Connected.
SQL> drop diskgroup DG_OCR force including contents;
Diskgroup dropped.
SQL>  create diskgroup DG_SYS  external redundancy disk '/dev/sdc2' attribute 'COMPATIBLE.ASM' = '12.1.0';
Diskgroup created.

For convenience, it is recommended to recreate the disk group with the same name as the failed group that previously held the OCR.
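The PROT-35 above is the reason: ocrconfig -restore writes only to the OCR locations already registered on the node, which on Linux are kept in /etc/oracle/ocr.loc. A quick check (sketch):

[root@xifenfei ~]# cat /etc/oracle/ocr.loc
--expected to name +DG_SYS as ocrconfig_loc, hence the new disk group must keep that name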

Restore the OCR

[root@xifenfei xff-cluster]# ocrconfig -restore backup_20130908_234857.ocr
--ALERT log
2013-09-09 00:26:50.584:
[client(3015)]CRS-1002:The OCR was restored from file backup_20130908_234857.ocr.

Handle the votedisk

[root@xifenfei xff-cluster]# crsctl replace votedisk +DG_SYS
Successful addition of voting disk 60a037da30714f6bbfe5d90206ff27a7.
Successfully replaced voting disk group with +DG_SYS.
CRS-4266: Voting file(s) successfully replaced

Recreate the ASM spfile

[grid@xifenfei dbs]$ vi /tmp/asm.txt
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring           = "/dev/sd*"
asm_power_limit          = 1
[grid@xifenfei dbs]$ sqlplus '/ as sysasm'
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 00:34:18 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL>  create spfile='+DG_SYS' FROM pfile='/tmp/asm.txt';
File created.

Restart CRS

[root@xifenfei xff-cluster]# crsctl stop crs -f
[root@xifenfei xff-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[grid@xifenfei dbs]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

CRS is back to normal at this point; check the OCR, votedisk, and ASM spfile further.

[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 16:12:21 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825640465
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   60a037da30714f6bbfe5d90206ff27a7 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).

This completes the recovery for a failed OCR disk group when an OCR backup exists. Without an OCR backup the only option is to rebuild the OCR; the rough steps are:

--deconfigure (root)
remote node
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose
last node
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode
--rebuild the OCR and configuration (grid)
# $GRID_HOME/crs/config/config.sh

RAC hang caused by accidentally killed processes

Contact: phone/WeChat (+86 17813235971) QQ (107644445)

Title: RAC hang caused by accidentally killed processes

Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]

A customer reported that their system was hung and could not archive, and asked us to step in urgently.
Node 1 log
Redo could not be archived and every redo log had filled up. After ALTER SYSTEM ARCHIVE LOG CURRENT was issued manually, the database archived all pending redo, but newly generated redo again failed to archive; once all the redo groups filled up, the database showed massive 'log file switch (archiving needed)' waits.

Tue Sep 24 22:05:37 2013
Thread 1 advanced to log sequence 47282 (LGWR switch)
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:07:31 2013
ORACLE Instance q9db1 - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 47283
All online logs needed archiving
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:28:17 2013
ALTER SYSTEM ARCHIVE LOG
Archived Log entry 259646 added for thread 1 sequence 47266 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:18 2013
Thread 1 advanced to log sequence 47283 (LGWR switch)
  Current log# 7 seq# 47283 mem# 0: +DATA/q9db/onlinelog/group_7.1243.818697415
Archived Log entry 259647 added for thread 1 sequence 47267 ID 0x354620c2 dest 1:
Archived Log entry 259648 added for thread 1 sequence 47268 ID 0x354620c2 dest 1:
Archived Log entry 259649 added for thread 1 sequence 47269 ID 0x354620c2 dest 1:
Archived Log entry 259650 added for thread 1 sequence 47270 ID 0x354620c2 dest 1:
Archived Log entry 259651 added for thread 1 sequence 47271 ID 0x354620c2 dest 1:
Archived Log entry 259652 added for thread 1 sequence 47272 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:28 2013
Archived Log entry 259653 added for thread 1 sequence 47273 ID 0x354620c2 dest 1:
Archived Log entry 259654 added for thread 1 sequence 47274 ID 0x354620c2 dest 1:
Archived Log entry 259655 added for thread 1 sequence 47275 ID 0x354620c2 dest 1:
Archived Log entry 259656 added for thread 1 sequence 47276 ID 0x354620c2 dest 1:
Archived Log entry 259657 added for thread 1 sequence 47277 ID 0x354620c2 dest 1:
Archived Log entry 259658 added for thread 1 sequence 47278 ID 0x354620c2 dest 1:
Archived Log entry 259659 added for thread 1 sequence 47279 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:39 2013
Archived Log entry 259660 added for thread 1 sequence 47280 ID 0x354620c2 dest 1:
Archived Log entry 259661 added for thread 1 sequence 47281 ID 0x354620c2 dest 1:
Archived Log entry 259662 added for thread 1 sequence 47282 ID 0x354620c2 dest 1:
Tue Sep 24 22:29:39 2013
Thread 1 advanced to log sequence 47284 (LGWR switch)
  Current log# 8 seq# 47284 mem# 0: +DATA/q9db/onlinelog/group_8.1242.818697417
Tue Sep 24 22:31:18 2013
Thread 1 advanced to log sequence 47285 (LGWR switch)
  Current log# 16 seq# 47285 mem# 0: +DATA/q9db/onlinelog/group_16.1884.827003545
Thread 1 advanced to log sequence 47286 (LGWR switch)
  Current log# 17 seq# 47286 mem# 0: +DATA/q9db/onlinelog/group_17.1885.827003587

Node 2 log
Node 2 shows a large number of IPC Send timeouts.

Tue Sep 24 15:22:19 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 18:51:55 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Tue Sep 24 18:57:54 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:03:57 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:09:53 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 20:22:00 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]

Node 1 hung because it could not archive, and node 2 hung right after it. While node 1 was hung, systemstate dumps were taken on both nodes (see the oradebug sketch after the listings) and analyzed with ass; the node 1 and node 2 records look roughly as follows:
Node 1

393:waiting for 'log file switch (archiving needed)'
394:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
395:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
397:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
398:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
451:waiting for 'SQL*Net message from client'
469:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
470:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
471:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
618:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
626:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
NO BLOCKING PROCESSES FOUND

Node 2

515:waiting for 'gc buffer busy acquire'
     Cmd: Insert
516:waiting for 'gc buffer busy acquire'
     Cmd: Insert
517:waiting for 'gc buffer busy acquire'
     Cmd: Insert
518:waiting for 'gc buffer busy acquire'
     Cmd: Insert
519:waiting for 'gc buffer busy acquire'
     Cmd: Insert
520:waiting for 'gc buffer busy acquire'
     Cmd: Select
521:waiting for 'gc current request'
     Cmd: Insert
522:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
523:waiting for 'gc buffer busy acquire'
     Cmd: Insert
524:waiting for 'SQL*Net message from client'
525:waiting for 'gc buffer busy acquire'
     Cmd: Insert
526:waiting for 'gc buffer busy acquire'
     Cmd: Insert
527:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
528:waiting for 'SQL*Net message from client'
529:waiting for 'gc buffer busy acquire'
     Cmd: Select
                    Resource Holder State
    Enq TX-0005001E-0022374F   223: waiting for 'gc current request'
    Enq TX-0047001B-002BCEB2   247: waiting for 'gc current request'
    Enq TX-015B001E-000041FF   330: waiting for 'gc current request'
    Enq TX-00010010-002EA7CD   179: waiting for 'gc current request'
    Enq TX-00BA0020-001E3E3C    ??? Blocker
Object Names
~~~~~~~~~~~~
Enq TX-0005001E-0022374F
Enq TX-0047001B-002BCEB2
Enq TX-015B001E-000041FF
Enq TX-00010010-002EA7CD
Enq TX-00BA0020-001E3E3C
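For reference, the systemstate dumps analyzed above can be produced with oradebug; a minimal sketch (level 266 adds short stacks; -g all covers every instance in the RAC):

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266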

From this it is clear that many transactions on node 2 hang requesting gc current request: because node 1 cannot archive, some blocks cannot be shipped to node 2, so node 2 stays stuck and the IPC Send timeouts follow; the blocked and hung transactions on node 1 are caused directly by the archiving failure. The question to pin down now is why node 1 cannot archive.
Continue with node 1's alert log

Tue Sep 24 15:18:20 2013
opidrv aborting process O000 ospid (7332) as a result of ORA-28
Immediate Kill Session#: 1904, Serial#: 1065
Immediate Kill Session: sess: 0x24a2522a38  OS pid: 7338
Immediate Kill Session#: 3597, Serial#: 11107
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 7320
Tue Sep 24 15:18:23 2013
opidrv aborting process W000 ospid (7980) as a result of ORA-28
Tue Sep 24 15:18:23 2013
opidrv aborting process W001 ospid (8560) as a result of ORA-28
Tue Sep 24 15:18:35 2013
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES
Tue Sep 24 15:18:35 2013
ARC0 started with pid=66, OS id=10793
Tue Sep 24 15:18:35 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db1/trace/q9db1_nsa2_12635.trc:
ORA-00028: your session has been killed
LNS: Failed to archive log 8 thread 1 sequence 47156 (28)
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 47157 (LGWR switch)
ARC0: STARTING ARCH PROCESSES
  Current log# 9 seq# 47157 mem# 0: +DATA/q9db/onlinelog/group_9.1241.818697421
Tue Sep 24 15:18:36 2013
ARC1 started with pid=81, OS id=10805
Tue Sep 24 15:18:36 2013
ARC2 started with pid=84, OS id=10807
Tue Sep 24 15:18:36 2013
ARC3 started with pid=87, OS id=10809
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Error 1031 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'q9adgdg'. Error is 1031.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 259135 added for thread 1 sequence 47156 ID 0x354620c2 dest 1:
Error 1031 received logging on to the standby
FAL[server, ARC3]: Error 1031 creating remote archivelog file 'q9adgdg'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance q9db1 - Archival Error. Archiver continuing.
Tue Sep 24 15:18:46 2013
opidrv aborting process O001 ospid (9605) as a result of ORA-28
Tue Sep 24 15:18:46 2013
opidrv aborting process O000 ospid (10813) as a result of ORA-28
Tue Sep 24 15:18:46 2013
Immediate Kill Session#: 2909, Serial#: 369
Immediate Kill Session: sess: 0x24226c7200  OS pid: 9091
Immediate Kill Session#: 3380, Serial#: 30271
Immediate Kill Session: sess: 0x2422782c58  OS pid: 10265
Immediate Kill Session#: 3597, Serial#: 11109
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 10267
Tue Sep 24 15:20:14 2013
Restarting dead background process DIAG
Tue Sep 24 15:20:14 2013
DIAG started with pid=64, OS id=20568
Restarting dead background process PING
Tue Sep 24 15:20:14 2013
PING started with pid=68, OS id=20570
Restarting dead background process LMHB
Tue Sep 24 15:20:14 2013
LMHB started with pid=70, OS id=20572
Restarting dead background process SMCO
…………
Tue Sep 24 15:23:13 2013
ARC0: Detected ARCH process failure
Tue Sep 24 15:23:13 2013
Thread 1 advanced to log sequence 47158 (LGWR switch)
  Current log# 10 seq# 47158 mem# 0: +DATA/q9db/onlinelog/group_10.1240.818697423
ARC0: STARTING ARCH PROCESSES
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance q9db1 - Archival Error
ORA-00028: your session has been killed

Check the ARCn processes

[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   20718 12870  0 22:07 pts/14   00:00:00 grep ora_ar
[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   25998 12870  0 22:07 pts/14   00:00:00 grep ora_ar

Now the picture is basically clear. Starting around 15:15 a middleware fault flooded the database with sessions; to shield the rest of the business the DBA issued a large number of ALTER SYSTEM KILL SESSION commands and accidentally killed several background processes, including ARC0-ARC3. The ARCn processes then failed to restart for some reason, so redo could not be archived, and once every redo group was full the system hung. (Normally ARCn restarts automatically after being killed; the mass session killing had already left the instance itself in an abnormal state.) The plan: add more redo groups and archive manually on a schedule, then restart node 1 during an off-peak window. A friendly reminder: be careful when killing processes.
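The stop-gap of extra redo groups plus scheduled manual archiving amounts to something like this (a sketch; group numbers, sizes, and the disk group are hypothetical):

SQL> alter database add logfile thread 1 group 18 ('+DATA') size 2G;
SQL> alter database add logfile thread 2 group 19 ('+DATA') size 2G;
SQL> alter system archive log current;
--repeat the archive command on a schedule until node 1 can be restarted off-peak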

Installing a single-node ORACLE 12C RAC

Contact: phone/WeChat (+86 17813235971) QQ (107644445)

Title: Installing a single-node ORACLE 12C RAC

Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]

Anyone who has installed ORACLE 12C RAC will have felt that it is a monster, consuming far too much memory, I/O, and CPU. Without physical hardware, running 12C RAC in VMs takes a fairly well-resourced host (8G of RAM is barely enough for two node VMs before the host runs anything else, so an 8G machine basically cannot run a two-node ORACLE 12C RAC). For those without that kind of hardware who still want to play with 12C RAC features, here is a single-node RAC (a RAC with one node). This post mainly shows where a single-node RAC install differs from a multi-node one; the install is purely for experimentation.
Memory requirement
(screenshot: 12c RAC memory requirements)


Prepare the environment

[oracle@xifenfei ~]$ more /etc/oracle-release
Oracle Linux Server release 5.8
[oracle@xifenfei ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          4350       4036        313          0          4       2805
-/+ buffers/cache:       1226       3124
Swap:         2047        853       1193
[oracle@xifenfei ~]$ /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:02:1B:A7
          inet addr:192.168.30.22  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:66152 errors:0 dropped:0 overruns:0 frame:0
          TX packets:83647 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21143877 (20.1 MiB)  TX bytes:44756747 (42.6 MiB)
eth1      Link encap:Ethernet  HWaddr 00:0C:29:02:1B:B1
          inet addr:10.10.30.22  Bcast:10.10.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8064 errors:0 dropped:0 overruns:0 frame:0
          TX packets:366 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:645158 (630.0 KiB)  TX bytes:64125 (62.6 KiB)
[oracle@xifenfei ~]$ more /etc/hosts
127.0.0.1       localhost.localdomain localhost
10.10.30.22     xifenfei-priv
192.168.30.22   xifenfei
192.168.30.32   xifenfei-vip
192.168.30.42   scan-ip
[root@xifenfei ~]# yum install oracle-validated
[root@xifenfei rpm]# rpm -ivh cvuqdisk-1.0.9-1.rpm

Install the GRID software
(screenshots: GRID installation steps)


Install the ORACLE DB software
(screenshot: DB software installation)


Create the database
(screenshot: dbca database creation)


Installation result

[root@xifenfei ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.243.16 10.10
                                                             .30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
[root@xifenfei ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:02:1B:A7
          inet addr:192.168.30.22  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:66419 errors:0 dropped:0 overruns:0 frame:0
          TX packets:83849 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21168865 (20.1 MiB)  TX bytes:44798962 (42.7 MiB)
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:02:1B:A7
          inet addr:192.168.30.42  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth0:2    Link encap:Ethernet  HWaddr 00:0C:29:02:1B:A7
          inet addr:192.168.30.32  Bcast:192.168.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 00:0C:29:02:1B:B1
          inet addr:10.10.30.22  Bcast:10.10.30.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12244 errors:0 dropped:0 overruns:0 frame:0
          TX packets:368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:994988 (971.6 KiB)  TX bytes:64536 (63.0 KiB)
eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:02:1B:B1
          inet addr:169.254.243.16  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:91800 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91800 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:68999524 (65.8 MiB)  TX bytes:68999524 (65.8 MiB)

Configuring a second listener on 11GR2 GI

Contact: phone/WeChat (+86 17813235971) QQ (107644445)

Title: Configuring a second listener on 11GR2 GI

Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]

A customer generates a very large volume of archived logs every day and, to avoid impacting the business, needed a dedicated 10GbE network just for shipping archived logs to the DG standby. That means adding a listener in 11G (11.2.0.3 Linux) RAC that uses the dedicated network. Below is the complete procedure for configuring the second listener on the primary: setting up name resolution, adding the network, adding the VIPs, configuring the listener, and configuring listener_networks.
NIC status

[oracle@q9db01 admin]$ /sbin/ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 90:E2:BA:1E:14:34
          inet addr:192.168.5.60  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe1e:1434/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3932094203 errors:0 dropped:0 overruns:0 frame:0
          TX packets:176073749 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5752649063554 (5.2 TiB)  TX bytes:31298228144 (29.1 GiB)
[grid@q9db02 ~]$ /sbin/ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 90:E2:BA:1E:13:5C
          inet addr:192.168.5.61  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe1e:135c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22861938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187447459 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2603356463 (2.4 GiB)  TX bytes:276672018723 (257.6 GiB)

Configure the hosts file

#public dg ip
192.168.5.60     q9db01-dg
192.168.5.61     q9db02-dg
192.168.5.64    q9db01-dg-vip
192.168.5.65    q9db02-dg-vip

Configuration in CRS

--add the network resource
[root@q9db01 ~]# srvctl add network -k 2 -S 192.168.5.0/255.255.255.0/eth2 -w static -v
--start the network resource
[root@q9db01 ~]# crsctl start res ora.net2.network
--add the VIP resources
[root@q9db01 ~]# srvctl add vip -n q9db01 -A 192.168.5.64/255.255.255.0 -k 2
[root@q9db01 ~]# srvctl add vip -n q9db02 -A 192.168.5.65/255.255.255.0 -k 2
--start the VIP resources
[root@q9db01 ~]# srvctl start vip -i q9db01-dg-vip
[root@q9db01 ~]# srvctl start vip -i q9db02-dg-vip
--create the listener with netca
[root@q9db01 ~]# su - grid
[grid@q9db01 ~]$ export DISPLAY=172.18.50.150:0.0
[grid@q9db01 ~]$ netca
------pick Subnet 2 (the 192.168.5.60 subnet) and a non-1521 port---------
--check resource status
[grid@q9db01 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.DATA.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.LISTENER.lsnr
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.LISTENER_DG.lsnr
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.OCR_VOTE.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.asm
               ONLINE  ONLINE       q9db01                   Started
               ONLINE  ONLINE       q9db02                   Started
ora.gsd
               OFFLINE OFFLINE      q9db01
               OFFLINE OFFLINE      q9db02
ora.net1.network
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.net2.network
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.ons
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.registry.acfs
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       q9db01
ora.cvu
      1        ONLINE  ONLINE       q9db01
ora.oc4j
      1        ONLINE  ONLINE       q9db01
ora.q9db.db
      1        ONLINE  ONLINE       q9db01                   Open
      2        ONLINE  ONLINE       q9db02                   Open
ora.q9db01-dg-vip.vip
      1        ONLINE  ONLINE       q9db01
ora.q9db01.vip
      1        ONLINE  ONLINE       q9db01
ora.q9db02-dg-vip.vip
      1        ONLINE  ONLINE       q9db02
ora.q9db02.vip
      1        ONLINE  ONLINE       q9db02
ora.scan1.vip
      1        ONLINE  ONLINE       q9db01
--Check listener status
[grid@q9db01 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 14:40:24
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 14:38:43
Uptime                    0 days 0 hr. 1 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db01/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.64)(PORT=1522)))
The listener supports no services
The command completed successfully
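
At this stage the listener legitimately reports no services: the instances register only with their network 1 listeners until listener_networks is set below. The new resources can also be checked at the CRS level with srvctl (illustrative commands, not from the original session):

--illustrative checks of the new VIP and listener resources
[grid@q9db01 ~]$ srvctl config vip -i q9db01-dg-vip
[grid@q9db01 ~]$ srvctl status listener -l LISTENER_DG
[grid@q9db01 ~]$ srvctl config listener -l LISTENER_DG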

Configure listener_networks

--Configure in the RDBMS tnsnames.ora
Q9DB01_LOCAL_NET1 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-vip )(PORT = 1521)))
Q9DB01_LOCAL_NET2 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip )(PORT = 1522)))
Q9DB02_LOCAL_NET1 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-vip )(PORT = 1521)))
Q9DB02_LOCAL_NET2 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip )(PORT = 1522)))
Q9DB_REMOTE_NET2 =(DESCRIPTION_LIST =(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip )
(PORT = 1522)))(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip )(PORT = 1522))))
--Set listener_networks
----Node 1
SQL> ALTER SYSTEM SET listener_networks='((NAME=network1)(LOCAL_LISTENER=Q9DB01_LOCAL_NET1)
     (REMOTE_LISTENER=q9db-scan:1521))','((NAME=network2)(LOCAL_LISTENER=Q9DB01_LOCAL_NET2)
     (REMOTE_LISTENER=Q9DB_REMOTE_NET2))' SCOPE=BOTH SID='q9db1';
System altered.
----Node 2
SQL> ALTER SYSTEM SET listener_networks='((NAME=network1)(LOCAL_LISTENER=Q9DB02_LOCAL_NET1)
     (REMOTE_LISTENER=q9db-scan:1521))','((NAME=network2)(LOCAL_LISTENER=Q9DB02_LOCAL_NET2)
     (REMOTE_LISTENER=Q9DB_REMOTE_NET2))' SCOPE=BOTH SID='q9db2';
System altered.
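
The parameter takes effect immediately and can be confirmed from each instance with a standard check:

SQL> show parameter listener_networks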

Check the listener status

--Node 1
[grid@q9db01 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 17:12:45
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 16:47:03
Uptime                    0 days 0 hr. 25 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db01/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.64)(PORT=1522)))
Services Summary...
Service "q9db" has 2 instance(s).
  Instance "q9db1", status READY, has 2 handler(s) for this service...
  Instance "q9db2", status READY, has 1 handler(s) for this service...
The command completed successfully
--Node 2
[grid@q9db02 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 17:12:02
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 16:52:24
Uptime                    0 days 0 hr. 19 min. 37 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db02/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.65)(PORT=1522)))
Services Summary...
Service "q9db" has 2 instance(s).
  Instance "q9db1", status READY, has 1 handler(s) for this service...
  Instance "q9db2", status READY, has 2 handler(s) for this service...
The command completed successfully
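
Both listeners now carry the q9db service, registered from both instances over network 2. PMON re-registers with remote listeners periodically, so if the services do not show up right away after setting listener_networks, registration can be forced from each instance (standard command, added here for illustration):

SQL> ALTER SYSTEM REGISTER;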

Test the new listener

--TNS configuration in the RDBMS home
q9db1dg=
(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = q9db1)
    )
 )
q9db2dg=
(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = q9db2)
    )
 )
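--The aliases can first be probed with tnsping to confirm they resolve to the dg VIPs
--on port 1522 (illustrative step, not from the original session)
[oracle@q9db01 admin]$ tnsping q9db1dg
[oracle@q9db01 admin]$ tnsping q9db2dg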
--Verify the result
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> conn test/test@q9db1dg
Connected.
SQL> show parameter instance_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      q9db1
SQL> conn test/test@q9db2dg
Connected.
SQL>  show parameter instance_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      q9db2

At this point both instances can be accessed normally through the new listener, the listener configuration is complete, and traffic between the databases can now use the dedicated network.
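
To actually route redo shipping over the new network, the primary's archive destination would point at a TNS alias that resolves through the dg VIPs. A minimal sketch, assuming a standby alias and DB_UNIQUE_NAME of q9dbdg (both hypothetical, not part of the configuration above):

--(q9dbdg below is a hypothetical standby alias/DB_UNIQUE_NAME, shown only as a sketch)
SQL> ALTER SYSTEM SET log_archive_dest_2='SERVICE=q9dbdg ASYNC
     VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=q9dbdg' SCOPE=BOTH;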