11.2.0.3 adg库因 bug 16427872 导致smon占用大量cpu

检查数据库发现客户有一套核心的ADG库smon进程负载异常,单进程一直持有cpu 100%

[oracle@q9adg01 trace]$ top -c
top - 14:00:14 up 83 days, 21:39,  4 users,  load average: 10.34, 11.55, 11.25
Tasks: 1162 total,   3 running, 1157 sleeping,   0 stopped,   2 zombie
Cpu(s):  1.7%us,  1.2%sy,  0.0%ni, 86.2%id, 10.7%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  264253752k total, 200445076k used, 63808676k free,   757684k buffers
Swap: 33554424k total,        0k used, 33554424k free,  6529220k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5707 oracle    25   0  150g  20m  16m R 99.9  0.0  14273:00 ora_smon_q9db1
 5285 oracle    16   0 13564 1952  820 R 31.5  0.0   0:02.49 top -c
 5713 oracle    18   0  150g  20m  17m S  5.3  0.0 410:01.33 ora_asmb_q9db1
 5821 oracle    15   0  150g  23m  17m S  5.3  0.0   4883:29 ora_lck0_q9db1
 7596 oracle    15   0  150g  69m  37m S  5.3  0.0   5368:28 ora_pr00_q9db1
[oracle@q9adg02 ~]$ top -c
top - 14:00:03 up 84 days, 19:36,  3 users,  load average: 6.46, 6.96, 6.76
Tasks: 1045 total,   5 running, 1040 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8%us,  1.0%sy,  0.0%ni, 93.4%id,  3.7%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  264253752k total, 196879216k used, 67374536k free,   425320k buffers
Swap: 33554424k total,        0k used, 33554424k free,  4727836k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11615 oracle    25   0  150g  16m  14m R 100.0  0.0  14272:55 ora_smon_q9db2
18173 oracle    16   0  150g  73m  37m D 18.6  0.0  24:33.91 oracleq9db2 (LOCAL=NO)
 6561 oracle    15   0  150g  31m  25m R 12.2  0.0   0:48.50 oracleq9db2 (LOCAL=NO)

数据库版本和patch信息

14:18:05 sys@Q9DB>select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL>  SELECT INST_ID,DATABASE_ROLE,OPEN_MODE FROM GV$DATABASE;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         2 PHYSICAL STANDBY READ ONLY WITH APPLY
         1 PHYSICAL STANDBY READ ONLY WITH APPLY
SQL >select inst_id,STARTUP_TIME from gv$instance;
   INST_ID STARTUP_T
---------- ---------
         2 01-NOV-13
         1 01-NOV-13
[oracle@q9adg01 trace]$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch  lspatches
16056266;Database Patch Set Update : 11.2.0.3.6 (16056266)
16315641;Grid Infrastructure Patch Set Update : 11.2.0.3.6 (16083653)

SYSAUX表空间增加数据文件

SQL> select ts# from v$tablespace where name='SYSAUX';
       TS#
----------
         2
SQL> select file#,name,creation_time from v$datafile where ts#=2;
     FILE# NAME                                               CREATION_
---------- -------------------------------------------------- ---------
         3 +DATA/q9db/datafile/sysaux.1412.818566605          12-MAR-08
       151 +DATA/q9db/datafile/sysaux.1431.818566885          26-MAR-12
       221 +DATA/q9db/datafile/sysaux.828.818547945           16-APR-12
      1744 +DATA/q9db_adg/datafile/sysaux.2050.835459505      29-DEC-13

核对数据库确实在2013年12月29日对SYSAUX表空间增加了数据文件而且未重启数据库,触发Bug 16427872 Standby SMON spins on CPU after add/drop SYSAUX datafile on primary
bug-16427872
Bug 16427872 Standby SMON spins on CPU after add/drop SYSAUX datafile on primary
在12.1.0.1中修复,在未修复前增加/删除sysaux的数据文件后,通过重启实例来解决该问题

multipath实现设备用户组设置

现在的Linux系统中,很多都会使用系统自带的multipath多路径软件,在以前的版本中,我们一般通过multipath+udev或者multipath+rc.local来实现多路径和权限设置,而在redhat 5.3-5.11的版本中multipath就直接可以实现多路径聚合、设备持久化、用户组设置
操作系统版本

[root@rac1 dev]# uname -r
2.6.39-300.26.1.el5uek
[root@rac1 dev]# more /etc/issue
Oracle Linux Server release 5.9
Kernel \r on an \m

fdisk记录

[root@rac1 dev]# fdisk -l
…………
Disk /dev/sdh: 134.2 GB, 134217728000 bytes
255 heads, 63 sectors/track, 16317 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
Disk /dev/sdi: 33.5 GB, 33554432000 bytes
64 heads, 32 sectors/track, 32000 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
   Device Boot      Start         End      Blocks   Id  System

multipath包
检查安装multipath相关包(该版本系统默认安装)

[root@rac1 dev]# rpm -aq|grep mapper
device-mapper-multipath-libs-0.4.9-56.0.3.el5
device-mapper-event-1.02.67-2.el5
device-mapper-1.02.67-2.el5
device-mapper-multipath-0.4.9-56.0.3.el5

获取wwid值

[root@rac1 dev]# /sbin/scsi_id -g -u -s /block/sdh
14f504e46494c45527049754962662d395751372d68356743
[root@rac1 dev]# /sbin/scsi_id -g -u -s /block/sdi
14f504e46494c4552484d486249782d464471382d354f4b58

获取uid和gid

[root@rac1 dev]# id grid
uid=1100(grid) gid=54321(oinstall) groups=54321(oinstall),1020(asmadmin),1021(asmdba)

multipath.conf配置

[root@rac1 dev]# vi /etc/multipath.conf
defaults {
user_friendly_names no
queue_without_daemon no
flush_on_last_del yes
max_fds max
}
blacklist {
devnode "^hd[a-z]"
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^cciss.*"
}
devices {
        device {
                vendor                  "OPNFILER "
                product                 "LUN"
                path_grouping_policy    group_by_prio
                features             "3 queue_if_no_path pg_init_retries 50"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                path_checker            tur
                path_selector           "round-robin 0"
                hardware_handler        "1 alua"
                failback               immediate
                rr_weight              uniform
                rr_min_io             128
        }
}
multipaths {
        multipath {
                wwid                    14f504e46494c45527049754962662d395751372d68356743  #wwid
                alias                   xifenfei128
                uid                     1100                                               #uid
                gid                     1020                                               #gid
        }
        multipath {
                wwid                    14f504e46494c4552484d486249782d464471382d354f4b58   #wwid
                alias                   xifenfei32
                uid                     1100                                                #uid
                gid                     1020                                                #gid
         }
}

启动multipath

[root@rac1 dev]# modprobe dm-multipath
[root@rac1 dev]# modprobe dm-round-robin
[root@rac1 dev]# chkconfig multipathd on
[root@rac1 dev]# service multipathd start
Starting multipathd daemon: [  OK  ]
[root@rac1 dev]# multipath -F
[root@rac1 dev]# multipath -v2
create: xifenfei128 (14f504e46494c45527049754962662d395751372d68356743) undef OPNFILER,VIRTUAL-DISK
size=125G features='0' hwhandler='0' wp=undef
`-+- policy='round-robin 0' prio=1 status=undef
  `- 3:0:0:9  sdh 8:112 undef ready  running
create: xifenfei32 (14f504e46494c4552484d486249782d464471382d354f4b58) undef OPNFILER,VIRTUAL-DISK
size=31G features='0' hwhandler='0' wp=undef
`-+- policy='round-robin 0' prio=1 status=undef
  `- 3:0:0:10 sdi 8:128 undef ready  running

查看生成多路径设备
注意设备名称、组、用户

[root@rac1 dev]# ls -l /dev/mapper/xifenfei*
brw-rw---- 1 grid asmadmin 252, 2 Jan  7 21:21 /dev/mapper/xifenfei128
brw-rw---- 1 grid asmadmin 252, 3 Jan  7 21:21 /dev/mapper/xifenfei32

补充Linux 6.x中udev设置所属组和权限
对于linux 6.x,multipath不能设置磁盘所属组和权限,可以通过udev进行实现,类似配置如下

[root@bxrac03 mapper]#cat 99-diskownership.rules
SUBSYSTEM!="block", GOTO="quickexit"
KERNEL!="dm-*", GOTO="quickexit"
PROGRAM=="/sbin/dmsetup info -c --noheadings -o name -m %m -j %M"
RESULT=="*ocr*", OWNER="grid", GROUP="oinstall", MODE="0660"
RESULT=="*oradata", OWNER="grid", GROUP="oinstall", MODE="0660"
RESULT=="*backup", OWNER="grid", GROUP="oinstall", MODE="0660"
LABEL="quickexit"

其中RESULT和dm的别名向匹配

数据库恢复检查脚本(Oracle Database Recovery Check)

Oracle Database Recovery Check 介绍
根据多年来的数据库恢复经验,提炼出来数据库恢复关键点信息收集脚本(Oracle Database Recovery Check),该脚本主要是在数据库mount状态情况下查询数据库的一些基础表信息等信息,不对数据库进行任何写操作(只做读和dump操作),不会在坏的数据库基础之上带来任何破坏,不影响任何数据库后续的恢复工作。通过该脚本收集信息能够快速定位数据库异常原因,并初步判断数据库恢复疑难程度,减少数据库异常恢复诊断时间,提供恢复效率和准确性。

Oracle Database Recovery Check下载
Oracle Database Recovery Check FOR ORACE 11G及其以前版本下载
Oracle Database Recovery Check For Linux/Uinx
Oracle Database Recovery Check For Windows
Oracle Database Recovery Check For ALL
Oracle Database Recovery Check FOR ORACE 12C下载
Oracle Database Recovery Check For Linux/Uinx
Oracle Database Recovery Check For Windows
Oracle Database Recovery Check For ALL

Oracle Database Recovery Check使用说明
1. 根据系统平台下载对应的版本,相应平台,使用unzip/winrar软件解压成sql文件
2. 上传到服务器之上(oracle用户需要有读取权限)
3. 启动数据库到mount状态
4. 在sqlplus中使用@path/check_recover_db.sql文件
5. 发送xifenfei_db_recover_YYYYMMDD.html到dba@xifenfei.com
6. 联系我:手机(17813235971)或者qq(107644445)

操作演示(所有平台,版本操作相同)
unix/liunx平台使用方法(for 12c以前版本,12c版本操作步骤完全相同)

[oracle@xifenfei ~]$ ls -l /home/oracle/check_recover_db_linux.sql
-rw-r--r-- 1 oracle oinstall 13008 Mar 30 10:36 check_recover_db_linux.sql
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sun Mar 30 10:55:58 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             331353508 bytes
Database Buffers           79691776 bytes
Redo Buffers                6053888 bytes
Database mounted.
SQL> @/home/oracle/check_recover_db_linux.sql
+----------------------------------------------------------------------------+
|                   Oracle Database Recovery Check Result                    |
|----------------------------------------------------------------------------+
|  Copyright (c) 2012-2014 xifenfei. All rights reserved. (www.xifenfei.com) |
+----------------------------------------------------------------------------+
Please start the database to mount state.
Note: Do not modify any inspection results
Please refer to the use of the script:http://www.xifenfei.com/5056.html
To send xifenfei_db_recover_YYYYMMDD.html to dba@xifenfei.com or QQ(107644445)
-----Oracle Database Recovery Check STRAT-----
----Starting Collect Data Dictionary Information----
----Starting Collect PATCH Information----
----Starting Collect ALERT LOG Information----
----Starting DUMP file_hdrs Information----
----Starting DUMP controlf Information----
----Starting DUMP redohdr Information----
-----Oracle Database Recovery Check END-----
********************************************************************************
Please check and upload /home/oracle/xifenfei_db_recover_20140330.html
********************************************************************************
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
生成的html文件为:/home/oracle/xifenfei_db_recover_20140330.html

win平台使用方法(for 12c版本,12c以前版本操作步骤完全相同)

C:\Users\XIFENFEI>dir E:\SkyDrive\ORACLE\tools\db_recover\check_recover_db_win_12c.sql
 驱动器 E 中的卷是 本地磁盘
 卷的序列号是 000C-3B41
 E:\SkyDrive\ORACLE\tools\db_recover 的目录
2014-03-29  23:43            11,101 check_recover_db_win_12c.sql
               1 个文件         11,101 字节
               0 个目录 19,557,310,464 可用字节
C:\Users\XIFENFEI>sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on 星期日 3月 30 11:01:37 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
已连接到空闲例程。
SQL> startup mount
ORACLE 例程已经启动。
Total System Global Area  400846848 bytes
Fixed Size                  2440024 bytes
Variable Size             289408168 bytes
Database Buffers          100663296 bytes
Redo Buffers                8335360 bytes
SQL> @E:\SkyDrive\ORACLE\tools\db_recover\check_recover_db_win_12c.sql
+----------------------------------------------------------------------------+
|                   Oracle Database Recovery Check Result                    |
|----------------------------------------------------------------------------+
|  Copyright (c) 2012-2014 xifenfei. All rights reserved. (www.xifenfei.com) |
+----------------------------------------------------------------------------+
Please start the database to mount state.
Note: Do not modify any inspection results
Please refer to the use of the script:http://www.xifenfei.com/5056.html
To send xifenfei_db_recover_YYYYMMDD.html to dba@xifenfei.com or QQ(107644445)
 驱动器 C 中的卷没有标签。
 卷的序列号是 D053-8FE1
 C:\Users\XIFENFEI 的目录
2014-03-30  11:02            32,796 xifenfei_db_recover_20140330.html
               1 个文件         32,796 字节
               0 个目录  8,444,190,720 可用字节
********************************************************************************
Please check and upload "xifenfei_db_recover_YYYYMMDD.html" in current directory
********************************************************************************
从 Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options 断开
生成的html文件为C:\Users\XIFENFEI\xifenfei_db_recover_20140330.html

因asm sga_target设置不当导致11gr2 rac无法正常启动

2014年第一个故障排查和解决:同事反馈给我说solaris 11.2 两节点rac无法启动,让我帮忙看下。通过分析是因为sga_target参数设置不合理导致asm无法正常启动
GI无法正常启动

grid@zwq-rpt1:~$crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
grid@zwq-rpt1:~$crsctl status resource -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       zwq-rpt1
ora.crf
      1        ONLINE  ONLINE       zwq-rpt1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       zwq-rpt1
ora.cssdmonitor
      1        ONLINE  ONLINE       zwq-rpt1
ora.ctssd
      1        ONLINE  ONLINE       zwq-rpt1                 ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE zwq-rpt1
ora.gipcd
      1        ONLINE  ONLINE       zwq-rpt1
ora.gpnpd
      1        ONLINE  ONLINE       zwq-rpt1
ora.mdnsd
      1        ONLINE  ONLINE       zwq-rpt1

asm未正常启动

GI日志报错

2014-01-01 00:40:47.708
[cssd(1418)]CRS-1605:CSSD voting file is online: /dev/rdsk/emcpower0a; details in /export/home/app/grid/log/zwq-rpt1/cssd/ocssd.log.
2014-01-01 00:40:53.234
[cssd(1418)]CRS-1601:CSSD Reconfiguration complete. Active nodes are zwq-rpt1 zwq-rpt2 .
2014-01-01 00:40:56.659
[ctssd(1483)]CRS-2407:The new Cluster Time Synchronization Service reference node is host zwq-rpt2.
2014-01-01 00:40:56.661
[ctssd(1483)]CRS-2401:The Cluster Time Synchronization Service started on host zwq-rpt1.
2014-01-01 00:41:02.016
[ctssd(1483)]CRS-2408:The clock on host zwq-rpt1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2014-01-01 00:43:23.874
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:45:42.837
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:02.087
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:18.836
[ohasd(1083)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2014-01-01 00:48:18.837
[ohasd(1083)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2014-01-01 01:05:15.396
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:05:45.101
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:06:15.104
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".

这里较为明显的看到,因为asm磁盘组异常导致ocr无法被访问导致crs无法正常启动

ORAAGENT日志

2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection:connect:excp OCIException OCI error 604
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] DgpAgent::queryDgStatus excp ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")

报了较为清晰的ORA-4031错误,检查asm日志

ASM日志报错

Wed Jan 01 00:47:33 2014
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /export/home/app/oracle
Wed Jan 01 00:47:39 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1728.trc  (incident=291447):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291447/+ASM1_ora_1728_i291447.trc
Wed Jan 01 00:47:48 2014
Dumping diagnostic data in directory=[cdmp_20140101004748], requested by (instance=1, osid=1728), summary=[incident=291447].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:47:53 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1730.trc  (incident=291448):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291448/+ASM1_ora_1730_i291448.trc
Wed Jan 01 00:48:01 2014
Dumping diagnostic data in directory=[cdmp_20140101004801], requested by (instance=1, osid=1730), summary=[incident=291448].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:07 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1732.trc  (incident=291449):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291449/+ASM1_ora_1732_i291449.trc
Wed Jan 01 00:48:16 2014
Dumping diagnostic data in directory=[cdmp_20140101004816], requested by (instance=1, osid=1732), summary=[incident=291449].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:16 2014
License high water mark = 1
USER (ospid: 1736): terminating the instance
Instance terminated by USER, pid = 1736

这里可以清晰的看到,因为shared pool不足,导致asm报ora-4031错误,从而使得asm无法正常启动

分析原因

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /export/home/app/grid
System name:	SunOS
Node name:	zwq-rpt1
Release:	5.11
Version:	11.1
Machine:	sun4v
Using parameter settings in server-side spfile +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.823992831
System parameters with non-default values:
  sga_max_size             = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

这里可以看到sga_target被设置为了0,而shared pool又未被配置,这里因为shared pool不足从而出现了ORA-4031,从而导致crs在启动asm的过程失败,从而使得ocr不能被访问,进而使得crs不能正常启动.

处理方法
1.编辑pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$vi /tmp/asm.pfile
  memory_target = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

2.启动asm

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed Jan 1 01:04:10 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/asm.pfile'
ASM instance started
Total System Global Area 2138521600 bytes
Fixed Size                  2161024 bytes
Variable Size            2102806144 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

3. 创建spfile

SQL> create spfile='+CRSDG' FROM PFILE='/tmp/asm.pfile';
File created.
--asm alert日志
Wed Jan 01 01:08:59 2014
NOTE: updated gpnp profile ASM SPFILE to
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM SPFILE to +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.835664939

4. 关闭asm

SQL> shutdown immediate
ORA-15097: cannot SHUTDOWN ASM instance with connected client (process 1971)
SQL> shutdown abort
ASM instance shutdown

5. 重启crs

root@zwq-rpt1:~# crsctl stop crs -f
root@zwq-rpt1:~# crsctl start crs

6. 重启其他节点crs

root@zwq-rpt2:~# crsctl stop crs -f
root@zwq-rpt2:~# crsctl start crs

7. 检查结果

root@zwq-rpt1:~# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.DATADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.FRADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.LISTENER.lsnr
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.asm
               ONLINE  ONLINE       zwq-rpt1                 Started
               ONLINE  ONLINE       zwq-rpt2                 Started
ora.gsd
               OFFLINE OFFLINE      zwq-rpt1
               OFFLINE OFFLINE      zwq-rpt2
ora.net1.network
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.ons
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       zwq-rpt1
ora.cvu
      1        ONLINE  ONLINE       zwq-rpt1
ora.oc4j
      1        ONLINE  ONLINE       zwq-rpt1
ora.rptdb.db
      1        ONLINE  ONLINE       zwq-rpt1                 Open
      2        ONLINE  ONLINE       zwq-rpt2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt2.vip
      1        ONLINE  ONLINE       zwq-rpt2

至此恢复正常,2014年第一个故障顺利解决

ORACLE DUL汇总

oracle数据库恢复三板斧,最大限度减少因为ORACLE不能open导致的数据损失
第一板:HIDE PARAMETER AND EVENT
第二板:BBED
第三板:DUL

当我们使用第一和第二板斧头无法解决问题之时,我们就需要考虑使用ORACLE数据库恢复终极工具DUL,这里对于dul的相关测试进行总结,便于查询
dul处理分区表
Oracle dul支持18c
关于dul有效期描述
dul恢复drop表测试
dul抽取异常asm文件
oracle dul 11 正式发布
dul 10支持oracle 11g r2
使用dul恢复asm中数据
dul恢复truncate表测试
使用 dul 挖数据文件初试
DUL挖ORACLE 8.0数据库
dul实现对数据文件内容更新
DUL10直接支持ORACLE 8.0
Oracle dul支持Oracle 12.2(12c)
最新版Oracle dul支持Oracle 7.2.3
dul 10 export_mode=true功能增强
dul实现exp dump文件转换sqlldr格式
dul支持ORACLE 12C CDB数据库恢复
dul实现expdp dump文件转换sqlldr格式
使用DUL挖数据文件恢复非数据外对象方法
dul无法加载bootstrap实现unload table/user恢复
为推进国内DUL的发展,欢迎在DUL使用过程中的问题探讨.如果在数据库恢复中需要使用dul,或者你们使用dul无法解决问题,欢迎联系我们(专业Oracle数据库恢复技术支持)协助你解决问题

ORACLE 12C RAC hub AND leaf 相互转换

感谢Lunar的指导,完成ORACLE 12C RAC hub和leaf相互转换,参考官方文档Oracle Flex Clusters部分
当前数据库状态

--集群状态
[root@rac1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--rac运行在flex模式
[root@rac1 ~]#  crsctl get cluster mode status
Cluster is running in "flex" mode
--asm运行在flex模式
[grid@rac1 ~]$ asmcmd
ASMCMD> showclustermode
ASM cluster : Flex mode enabled
--节点角色
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

转换hub to leaf

--转换hub为leaf
[root@rac1 ~]# crsctl set node role leaf
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--关闭集群
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDB_NEW.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.SYSDG.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.ora12c.db' on 'rac1'
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.ora12c.db' on 'rac1' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac2'
CRS-2676: Start of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.SYSDB_NEW.dg' on 'rac1' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.SYSDG.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--启动集群
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--hub转换为leaf后状态
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER,STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--集群角色
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'leaf'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

leaf转换为hub

--leaf转换为hub
[root@rac1 ~]# crsctl set node role hub
CRS-4408: Node 'rac1' configured role successfully changed; restart Oracle High Availability Services for new role to take effect.
--关闭集群
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.storage' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
--启动集群
[root@rac1 ~]# crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-6017: Processing resource auto-start for servers: rac1
CRS-2672: Attempting to start 'ora.ons' on 'rac1'
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac2'
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac1'
CRS-2676: Start of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac1'
CRS-2676: Start of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.ons' on 'rac1' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.proxy_advm' on 'rac1'
CRS-2676: Start of 'ora.proxy_advm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ora12c.db' on 'rac1'
CRS-2676: Start of 'ora.ora12c.db' on 'rac1' succeeded
CRS-6016: Resource auto-start has completed for server rac1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
--集群状态
[root@rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDB_NEW.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.proxy_advm
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac2                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac2                     169.254.177.226 10.1
                                                             .1.104,STABLE
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ora12c.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
--集群角色
[root@rac1 ~]# crsctl get node role config
Node 'rac1' configured role is 'hub'
[root@rac2 ~]# crsctl get node role config
Node 'rac2' configured role is 'hub'

这里实现了ORACLE 12C RAC的leaf和hub 角色相互转换,在转换的过程中需要转移确认集群和ASM均为flex mode,如果参考相关文档完成转换

OLR相关维护

官方关于OLR描述
OLR is a registry similar to OCR located on each node in a cluster, but contains information specific to each node. It contains manageability information about Oracle Clusterware, including dependencies between various services. Oracle High Availability Services uses this information. OLR is located on local storage on each node in a cluster. Its default location is in the path Grid_home/cdata/host_name.olr, where Grid_home is the Oracle Grid Infrastructure home, and host_name is the host name of the node.
OLR是类似OCR的东西,存储在集群的每个节点本地

查看OLR位置

[root@rac2 cdata]# cd /etc/oracle
[root@rac2 oracle]# ls -l
total 2868
drwxrwx--- 2 root oinstall    4096 Nov 24 20:00 lastgasp
drwxrwxrwt 2 root oinstall    4096 Dec 21 20:51 maps
-rw-r--r-- 1 root oinstall      96 Nov 25 18:38 ocr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 ocr.loc.orig
-rw-r--r-- 1 root oinstall      80 Nov 24 19:58 olr.loc
-rw-r--r-- 1 root root           0 Nov 24 19:58 olr.loc.orig
drwxrwxr-x 5 root oinstall    4096 Nov 24 19:57 oprocd
drwxr-xr-x 3 root oinstall    4096 Nov 24 19:57 scls_scr
-rws--x--- 1 root oinstall 2904377 Nov 24 19:57 setasmgid
[root@rac2 oracle]# more olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/rac2.olr
crs_home=/u01/app/12.1.0/grid
--在部分平台olr.loc文件可能在/var/opt/oracle/目录下
[root@rac2 oracle]#  ocrcheck -config -local
Oracle Local Registry configuration is :
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :        996
         Available space (kbytes) :     408572
         ID                       :  816087519
         Device/File Name         : /u01/app/12.1.0/grid/cdata/rac2.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr

查看OLR备份

[root@rac2 oracle]# ocrconfig -local -showbackup
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr

备份OLR

[root@rac2 oracle]# ocrconfig -local -manualbackup
rac2     2013/12/22 12:09:33     /u01/app/12.1.0/grid/cdata/rac2/backup_20131222_120933.olr
rac2     2013/11/24 20:02:38     /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2/
total 1908
-rw-r--r-- 1 root root  860160 Nov 24 20:02 backup_20131124_200238.olr
-rw-r--r-- 1 root root 1085440 Dec 22 12:09 backup_20131222_120933.olr

OLR异常恢复

--破坏OLR
[root@rac2 oracle]# ls -l /u01/app/12.1.0/grid/cdata/rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:09 /u01/app/12.1.0/grid/cdata/rac2.olr
[root@rac2 oracle]# /u01/app/12.1.0/grid/cdata/rac2.olr /u01/app/12.1.0/grid/cdata/rac2.olr_bak
--关闭crs
[root@rac2 oracle]# crsctl stop crs
--启动crs报错
[root@rac2 oracle]# crsctl start crs
PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
CRS-4000: Command Start failed, or completed with errors.
--跟踪crs启动
[root@rac2 oracle]# strace crsctl start crs
……
uname({sys="Linux", node="rac2", ...})  = 0
open("/etc/oracle/olr.loc", O_RDONLY)   = 14
fstat(14, {st_mode=S_IFREG|0644, st_size=80, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd8ac628000
read(14, "olrconfig_loc=/u01/app/12.1.0/gr"..., 4096) = 80
read(14, "", 4096)                      = 0
close(14)                               = 0
munmap(0x7fd8ac628000, 4096)            = 0
stat("/u01/app/12.1.0/grid/cdata/rac2.olr", 0x7fffa215a580) = -1 ENOENT (No such file or directory)
--这里可以看到先是读取/etc/oracle/olr.loc,然后获取/u01/app/12.1.0/grid/cdata/rac2.olr失败
……
--确定ohasd.bin关闭
[root@rac2 cdata]# ps -ef|grep ohasd
root     15715 31578  0 14:34 pts/3    00:00:00 grep ohasd
--还原OLR
[root@rac2 oracle]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
PROTL-35: The configured OLR location is not accessible
[root@rac2 oracle]# cd /u01/app/12.1.0/grid/cdata/
[root@rac2 cdata]# ls
localhost  rac12c-cluster  rac2  rac2.olr_bak
[root@rac2 cdata]# touch rac2.olr
[root@rac2 cdata]# chmod 600 rac2.olr
[root@rac2 cdata]# ocrconfig -local -restore /u01/app/12.1.0/grid/cdata/rac2/backup_20131124_200238.olr
--确定还原成功
[root@rac2 cdata]# ls -l
total 84200
drwxr-xr-x 2 grid oinstall      4096 Nov 24 19:37 localhost
drwxrwxr-x 2 grid oinstall      4096 Dec 22 09:07 rac12c-cluster
drwxr-xr-x 2 grid oinstall      4096 Dec 22 12:09 rac2
-rw------- 1 root root     503484416 Dec 22 14:29 rac2.olr
-rw------- 1 root oinstall 503484416 Dec 22 12:43 rac2.olr_bak
--启动crs
[root@rac2 cdata]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

其他OLR命令

To export OLR to a file:
# ocrconfig –local –export file_name
To import a specified file to OLR:
# ocrconfig –local –import file_name
To view the contents of the OLR file:
ocrdump -local file_name
To view the contents of the OLR backup file:
ocrdump -local -backupfile olr_backup_file_name
To change the OLR backup location:
ocrconfig -local -backuploc new_olr_backup_path

当OLR异常时,RAC节点不能正常启动,而且OLR不像OCR会定时自动备份,建议人工定时备份OLR

通过bbed替换bootstarp$表

在11G和12C中,我们可以通过DBMS_DDL_INTERNAL.SWAP_BOOTSTRAP过程来替换bootstarp$表(见:替换bootstarp$表),但是对于10G或者其他版本,oracle没有提供相关程序来完成使用其他表替换bootstarp$,通过分析,使用bbed修改root rdba也可以完成DBMS_DDL_INTERNAL.SWAP_BOOTSTRAP的任务

SQL> SELECT * FROM V$VERSION;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production
SQL> DESC DBMS_DDL_INTERNAL
PROCEDURE CHECK_TRIGGER_FIRING_PROPERTY
参数名称                       类型                    输入/输出默认值?
------------------------------ ----------------------- ------ --------
 TRIG_OWNER                     VARCHAR2                IN
 TRIG_NAME                      VARCHAR2                IN
 CANON_OWNER                    VARCHAR2                OUT
 CANON_ONAME                    VARCHAR2                OUT
 P_PROPERTY                     NUMBER                  IN/OUT
 UNSUPPORTED_TRIG               BOOLEAN                 OUT
FUNCTION HAS_ALTER_ANY_TRIGGER_PRIV RETURNS BOOLEAN
参数名称                       类型                    输入/输出默认值?
------------------------------ ----------------------- ------ --------
 P_USER                         VARCHAR2                IN
 P_TRIG_PROPERTY                NUMBER                  IN
FUNCTION HAS_EXP_IMP_PRIV RETURNS BOOLEAN
参数名称                       类型                    输入/输出默认值?
------------------------------ ----------------------- ------ --------
 P_UID                          NUMBER                  IN
 P_PRIVS_TO_CHECK               VARCHAR2                IN
FUNCTION IS_DDL_TRIGGER RETURNS BOOLEAN
参数名称                       类型                    输入/输出默认值?
------------------------------ ----------------------- ------ --------
 SYS_EVTS                       NUMBER                  IN

跟踪数据库启动过程

SQL> startup mount
ORACLE 例程已经启动。
Total System Global Area  209715200 bytes
Fixed Size                  1289724 bytes
Variable Size             100663812 bytes
Database Buffers          100663296 bytes
Redo Buffers                7098368 bytes
数据库装载完毕。
SQL> oradebug setmypid
已处理的语句
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
已处理的语句
SQL> alter session set db_file_multiblocK_read_count=1;
会话已更改。
SQL> oradebug TRACEFILE_NAME
e:\oracle\product\10.2.0\admin\ora10g\udump\ora10g_ora_8360.trc
SQL> alter database Open;
数据库已更改。
SQL> oradebug EVENT 10046 trace name context off
已处理的语句

阅读10046 trace文件

WAIT #1: nam='instance state change' ela= 28 layer=2 value=1 waited=1 obj#=-1 tim=377999209439
WAIT #1: nam='db file sequential read' ela= 94860 file#=1 block#=377 blocks=1 obj#=-1 tim=377999304467
=====================
PARSING IN CURSOR #2 len=188 dep=1 uid=0 oct=1 lid=0 tim=377999305344 hv=1365064427 ad='8baee680'
create table bootstrap$ ( line#         number not null,   obj#           number not null,   sql_text   varchar2(4000) not null)   storage (initial 50K objno 56 extents (file 1 block 377))
END OF STMT
PARSE #2:c=0,e=662,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=377999305341
BINDS #2:
EXEC #2:c=0,e=102,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=377999305545
=====================
PARSING IN CURSOR #2 len=55 dep=1 uid=0 oct=3 lid=0 tim=377999305925 hv=2111436465 ad='8baedf0c'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #2:c=0,e=308,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=377999305922
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=00288360  bln=22  avl=02  flg=05
  value=56
EXEC #2:c=0,e=580,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=377999306621
WAIT #2: nam='db file sequential read' ela= 328 file#=1 block#=377 blocks=1 obj#=-1 tim=377999307005
WAIT #2: nam='db file sequential read' ela= 345 file#=1 block#=378 blocks=1 obj#=-1 tim=377999307423

这里可以发现,数据库是在启动的时候读file 1 block 377,然后create table bootstrap$(注意:这里的语句小写),对于bootstarp$的查询除掉了obj#<>56

分析bootstarp$对象

SQL> select header_file,header_block from dba_segments where segment_name='BOOTSTRAP$';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1          377
SQL> SELECT OBJECT_ID FROM DBA_OBJECTS WHERE OBJECT_NAME='BOOTSTRAP$';
 OBJECT_ID
----------
        56
SQL> select line#, sql_text from bootstrap$ where obj# =56;
     LINE#
----------
SQL_TEXT
--------------------------------------------------------------------------------
        56
CREATE TABLE BOOTSTRAP$("LINE#" NUMBER NOT NULL,"OBJ#" NUMBER NOT NULL,"SQL_TEXT
" VARCHAR2(4000) NOT NULL) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE
 (  INITIAL 56K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJN
O 56 EXTENTS (FILE 1 BLOCK 377))

这里可以发现bootstrap$中obj#=56的那条记录为CREATE TABLE BOOTSTRAP$(注意:该表里面保存为大写)

bbed查看root rdba

C:\Windows\system32>e:\oracle\product\10.2.0\dbhome_1\bin\bbed password=blockedit blocksize=8192
BBED: Release 2.0.0.0.0 - Limited Production on Tue Dec 17 18:36:01 2013
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set block 1
BBED-00310: no datafile specified
BBED> set filename 'E:\APP\XIFENFEI\ORADATA\ORA10G\SYSTEM01.DBF'
        FILENAME        E:\APP\XIFENFEI\ORADATA\ORA10G\SYSTEM01.DBF
BBED> set block 2
        BLOCK#          2
BBED> map
 File: E:\APP\XIFENFEI\ORADATA\ORA10G\SYSTEM01.DBF (0)
 Block: 2                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header
 struct kcvfh, 360 bytes                    @0
 ub4 tailchk                                @8188
BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400179
SQL> Select to_number('00400179','xxxxxxxxxxxxxxxxxx') from dual;
TO_NUMBER('00400179','XXXXXXXXXXXXXXXXXX')
------------------------------------------
                                   4194681
SQL> select dbms_utility.data_block_address_block(4194681) "block",
  2  dbms_utility.data_block_address_file(4194681) "file" from dual;
     block       file
---------- ----------
       377          1

通过bbed查看kcvfhrdb(root rdba)指向的地址和数据库启动扫描block一致(file 1b block 377)

创建bootstarp$替换表(xifenfei)

SQL> create table xifenfei as select * from bootstrap$;
表已创建。
SQL> select count(*) from bootstrap$;
  COUNT(*)
----------
        57
SQL> select count(*) from xifenfei;
  COUNT(*)
----------
        57
SQL> SELECT OBJECT_ID FROM DBA_OBJECTS WHERE OBJECT_NAME='XIFENFEI';
 OBJECT_ID
----------
     51736
SQL> select header_file,header_block from dba_segments where segment_name='XIFENFEI';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1        60241
SQL> SELECT TO_CHAR(60241,'XXXX') FROM DUAL;
TO_CH
-----
 EB51
--对应rdba为0040EB51

这里可以确定创建的xifenfei的segment header rdba为0x0040EB51,obj#为51736

清理bootstarp$中对象

SQL> DELETE FROM BOOTSTRAP$;
已删除57行。
SQL> COMMIT;
提交完成。
SQL> SHUTDOWN IMMEDIATE;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP MOUNT
ORACLE 例程已经启动。
Total System Global Area  209715200 bytes
Fixed Size                  1289724 bytes
Variable Size             104858116 bytes
Database Buffers           96468992 bytes
Redo Buffers                7098368 bytes
数据库装载完毕。
SQL> oradebug setmypid
已处理的语句
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
已处理的语句
SQL> alter session set db_file_multiblocK_read_count=1;
会话已更改。
SQL> oradebug TRACEFILE_NAME
e:\oracle\product\10.2.0\admin\ora10g\udump\ora10g_ora_7704.trc
SQL>
SQL> alter database Open;
alter database Open
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接
--trace文件
PARSING IN CURSOR #2 len=55 dep=1 uid=0 oct=3 lid=0 tim=379061819061 hv=2111436465 ad='8baedf18'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #2:c=0,e=346,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=379061819058
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=00288360  bln=22  avl=02  flg=05
  value=56
EXEC #2:c=0,e=681,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=379061819868
WAIT #2: nam='db file sequential read' ela= 343 file#=1 block#=377 blocks=1 obj#=-1 tim=379061820273
WAIT #2: nam='db file sequential read' ela= 291 file#=1 block#=378 blocks=1 obj#=-1 tim=379061820651
WAIT #2: nam='db file sequential read' ela= 309 file#=1 block#=379 blocks=1 obj#=-1 tim=379061821012
WAIT #2: nam='db file sequential read' ela= 293 file#=1 block#=380 blocks=1 obj#=-1 tim=379061821416
FETCH #2:c=0,e=1542,p=4,cr=5,cu=0,mis=0,r=0,dep=1,og=4,tim=379061821450
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
*** 2013-12-17 18:50:07.325
EXEC #1:c=62400,e=4990345,p=10,cr=6,cu=0,mis=0,r=0,dep=0,og=1,tim=379065822300
ERROR #1:err=1092 tim=37915057

删除掉bootstarp中记录后,数据库无法正常启动,报错误为ORA-00704/ORA-00702,因为数据库读取bootstarp$中记录出错导致.

bbed修改root rdba

BBED> set mode edit
        MODE            Edit
BBED> set count 32
        COUNT           32
BBED> d
 File: E:\APP\XIFENFEI\ORADATA\ORA10G\SYSTEM01.DBF (0)
 Block: 2                Offsets:   96 to  127           Dba:0x00000000
------------------------------------------------------------------------
 79014000 0b000000 00000000 d10ff624 485dbc31 4bf60700 00000000 00000000
 <32 bytes per line>
BBED> m /x 51eb
 File: E:\APP\XIFENFEI\ORADATA\ORA10G\SYSTEM01.DBF (0)
 Block: 2                Offsets:   96 to  127           Dba:0x00000000
------------------------------------------------------------------------
 51eb4000 0b000000 00000000 d10ff624 485dbc31 4bf60700 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 2:
current = 0xa3bd, required = 0xa3bd
BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x0040eb51

修改root rdba地址为xifenfei segment header的地址

尝试启动数据库

SQL> startup mount;
ORACLE 例程已经启动。
Total System Global Area  209715200 bytes
Fixed Size                  1289724 bytes
Variable Size             113246724 bytes
Database Buffers           88080384 bytes
Redo Buffers                7098368 bytes
数据库装载完毕。
SQL> oradebug setmypid
已处理的语句
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
已处理的语句
SQL> alter session set db_file_multiblocK_read_count=1;
会话已更改。
SQL> oradebug TRACEFILE_NAME
e:\oracle\product\10.2.0\admin\ora10g\udump\ora10g_ora_7356.trc
SQL> alter database Open;
alter database Open
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接
--trace文件
WAIT #1: nam='db file sequential read' ela= 26895 file#=1 block#=60241 blocks=1 obj#=-1 tim=380397162424
=====================
PARSING IN CURSOR #2 len=193 dep=1 uid=0 oct=1 lid=0 tim=380397162916 hv=1250491271 ad='8baee6a0'
create table bootstrap$ ( line#         number not null,   obj#           number not null,   sql_text   varchar2(4000) not null)   storage (initial 50K objno 51736 extents (file 1 block 60241))
END OF STMT
PARSE #2:c=0,e=372,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=380397162912
BINDS #2:
EXEC #2:c=0,e=80,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=380397163083
=====================
PARSING IN CURSOR #2 len=55 dep=1 uid=0 oct=3 lid=0 tim=380397163449 hv=2111436465 ad='8baedf2c'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #2:c=0,e=311,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=380397163447
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=00288360  bln=22  avl=04  flg=05
  value=51736
EXEC #2:c=0,e=515,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=380397164052
WAIT #2: nam='db file sequential read' ela= 314 file#=1 block#=60241 blocks=1 obj#=-1 tim=380397164415
WAIT #2: nam='db file sequential read' ela= 396 file#=1 block#=60242 blocks=1 obj#=-1 tim=380397164902
…………
PARSING IN CURSOR #2 len=272 dep=1 uid=0 oct=1 lid=0 tim=380397203298 hv=2124945659 ad='8bacb620'
CREATE TABLE BOOTSTRAP$("LINE#" NUMBER NOT NULL,"OBJ#" NUMBER NOT NULL,"SQL_TEXT" VARCHAR2(4000) NOT NULL) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE (  INITIAL 56K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 56 EXTENTS (FILE 1 BLOCK 377))
END OF STMT
PARSE #2:c=0,e=239,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=380397203295
BINDS #2:
EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=380397203701
ERROR #2:err=955 tim=38048197
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00955: 名称已由现有对象使用
*** 2013-12-17 19:12:21.783
EXEC #1:c=93601,e=4199938,p=10,cr=60,cu=0,mis=0,r=0,dep=0,og=1,tim=380400250570
ERROR #1:err=1092 tim=38048501

数据库启动到创建bootstarp$的时候报错,报错的原因是因为xifenfei对象中的obj#=56的为CREATE TABLE BOOTSTRAP$,而前面的查询bootstarp$是过滤掉了obj#=56(为过滤掉xifenfei对象本身的obj#[51736])

upgrade模式启动数据库

SQL> conn / as sysdba
已连接到空闲例程。
SQL> startup mount
ORACLE 例程已经启动。
Total System Global Area  209715200 bytes
Fixed Size                  1289724 bytes
Variable Size             117441028 bytes
Database Buffers           83886080 bytes
Redo Buffers                7098368 bytes
数据库装载完毕。
SQL> alter database Open upgrade;
数据库已更改。

虽然启动的时候在报CREATE TABLE BOOTSTRAP$(注意大写,而不是启动第一条的create table bootstrap$),但是upgrade模式可以正常启动数据库

修改xifenfei中关于CREATE TABLE BOOTSTRAP$语句对应的obj#为xifenfei object_id

SQL> UPDATE XIFENFEI SET OBJ#=51736 WHERE OBJ#=56;
已更新 1 行。
SQL> commit;
提交完成。
SQL> shutdown immediate
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount
ORACLE 例程已经启动。
Total System Global Area  209715200 bytes
Fixed Size                  1289724 bytes
Variable Size             121635332 bytes
Database Buffers           79691776 bytes
Redo Buffers                7098368 bytes
数据库装载完毕。
SQL> alter database open;
数据库已更改。

至此通过bbed结合修改CREATE TABLE BOOTSTRAP$语句对应的obj#完成数据库启动读取非bootstarp$表的过程

继续分析xifenfei和bootstarp$关系

SQL> select count(*) from bootstrap$;
  COUNT(*)
----------
        57
SQL> select count(*) from xifenfei;
  COUNT(*)
----------
        57
SQL> select obj# from bootstrap$ where line#=56;
      OBJ#
----------
     51736
SQL> select obj# from xifenfei where line#=56;
      OBJ#
----------
     51736
SQL> select header_file,header_block from dba_segments where segment_name='BOOTSTRAP$';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1          377
SQL> SELECT OBJECT_ID FROM DBA_OBJECTS WHERE OBJECT_NAME='BOOTSTRAP$';
 OBJECT_ID
----------
        56
SQL> truncate table xifenfei;
truncate table xifenfei
               *
第 1 行出现错误:
ORA-00701: 无法改变热启动数据库所需的对象

数据库启动过程中,会读xifenfei(root rdba指向表),然后加载bootstarp$表,而且bootstarp$表中记录和xifenfei表中记录完全相同.

替换bootstarp$表

对数据库有一定了解的人都知道,数据库启动是通过bootstarp$表来实现启动引导的。那这里有两个疑问:
1.引导表bootstarp$名字是否是唯一?
2.引导表的位置是不是在数据文件固定?比如11g/12c在file 1 block 520,10g在file 1 block 377?

跟踪数据库启动

[oracle@rac2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sun Dec 15 14:16:58 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
SQL> startup mount;
ORACLE instance started.
Total System Global Area  400846848 bytes
Fixed Size                  2288872 bytes
Variable Size             310379288 bytes
Database Buffers           79691776 bytes
Redo Buffers                8486912 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> alter session set db_file_multiblocK_read_count=1;
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Session altered.
SQL> oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_31364.trc
SQL> alter database Open;
Database altered.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> select HEADER_FILE,HEADER_BLOCK from dba_segments where segment_name='BOOTSTRAP$';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1          520
SQL> select object_id from dba_objects where object_name='BOOTSTRAP$';
 OBJECT_ID
----------
        59

分析trace文件

WAIT #140077386411120: nam='db file sequential read' ela= 56 file#=1 block#=520 blocks=1 obj#=-1 tim=1719385755334
=====================
PARSING IN CURSOR #140077386402760 len=188 dep=1 uid=0 oct=1 lid=0 tim=1719385757322 hv=4006182593 ad='6645d370' sqlid='32r4f1brckzq1'
create table bootstrap$ (
END OF STMT
PARSE #140077386402760:c=2000,e=1711,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1719385757319
EXEC #140077386402760:c=0,e=430,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=1719385757909
CLOSE #140077386402760:c=0,e=9,dep=1,type=0,tim=1719385758105
=====================
PARSING IN CURSOR #140077386402760 len=55 dep=1 uid=0 oct=3 lid=0 tim=1719385759507 hv=2111436465 ad='6645bc80' sqlid='6apq2rjyxmxpj'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #140077386402760:c=1000,e=1365,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1719385759505
BINDS #140077386402760:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f664edb8780  bln=22  avl=02  flg=05
  value=59
EXEC #140077386402760:c=3000,e=8859,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=1719385768574
WAIT #140077386402760: nam='db file sequential read' ela= 27 file#=1 block#=520 blocks=1 obj#=59 tim=1719385768753
WAIT #140077386402760: nam='db file sequential read' ela= 23 file#=1 block#=521 blocks=1 obj#=59 tim=1719385769575

这里我们可以发现,数据库启动的时候是读file 1 block 520,object_id为59,为bootstarp$对象

bbed查看root rdba地址

ub4 kcvfhrdb                             @96       0x00400208
SQL> select to_number('208','xxx') from dual;
TO_NUMBER('208','XXX')
----------------------
                   520

这里可以看出来,数据库启动的file header rdba地址和trace文件中的一致,也是指定到bootstarp$;

替换bootstrap$表为xifenfei

SQL> create table xifenfei as select * from bootstrap$;
Table created.
SQL> desc xifenfei
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 LINE#                                     NOT NULL NUMBER
 OBJ#                                      NOT NULL NUMBER
 SQL_TEXT                                  NOT NULL VARCHAR2(4000)
SQL> select sql_text from xifenfei where line#=59;
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE BOOTSTRAP$("LINE#" NUMBER NOT NULL,"OBJ#" NUMBER NOT NULL,"SQL_TEXT
" VARCHAR2(4000) NOT NULL) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE
 (  INITIAL 56K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJN
O 59 EXTENTS (FILE 1 BLOCK 520))
--清除bootstarp$记录
SQL> delete from xifenfei where line#=59;
1 row deleted.
SQL> commit;
Commit complete.
SQL> select objECT_ID from dba_objects where object_name='XIFENFEI';
 OBJECT_ID
----------
     20314
SQL> select HEADER_FILE,HEADER_BLOCK from dba_segments where segment_name='XIFENFEI';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1        45712
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup upgrade;
ORACLE instance started.
Total System Global Area  400846848 bytes
Fixed Size                  2288872 bytes
Variable Size             310379288 bytes
Database Buffers           79691776 bytes
Redo Buffers                8486912 bytes
Database mounted.
Database opened.
--删除bootstarp$表
SQL>  drop table bootstrap$;
Table dropped.
SQL>  exec DBMS_DDL_INTERNAL.SWAP_BOOTSTRAP('XIFENFEI');
PL/SQL procedure successfully completed.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> EXIT
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
[oracle@rac2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sun Dec 15 14:29:54 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area  400846848 bytes
Fixed Size                  2288872 bytes
Variable Size             310379288 bytes
Database Buffers           79691776 bytes
Redo Buffers                8486912 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> alter session set db_file_multiblocK_read_count=1;
Session altered.
SQL> oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_32727.trc
SQL> alter database Open;
Database altered.
SQL> SELECT OBJECT_ID FROM DBA_OBJECTS WHERE OBJECT_NAME='BOOTSTRAP$';
no rows selected
SQL> SELECT OBJECT_ID FROM DBA_OBJECTS WHERE OBJECT_NAME='XIFENFEI';
 OBJECT_ID
----------
     20314

分析trace文件

WAIT #139841534602352: nam='db file sequential read' ela= 23 file#=1 block#=45712 blocks=1 obj#=-1 tim=1720082484775
=====================
PARSING IN CURSOR #139841534593992 len=193 dep=1 uid=0 oct=1 lid=0 tim=1720082488552 hv=2096904950 ad='6645d650' sqlid='bs6v55xygsfrq'
create table bootstrap$ (
END OF STMT
PARSE #139841534593992:c=2000,e=2925,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1720082488550
EXEC #139841534593992:c=0,e=466,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=1720082489124
CLOSE #139841534593992:c=0,e=8,dep=1,type=0,tim=1720082489266
=====================
PARSING IN CURSOR #139841534593992 len=55 dep=1 uid=0 oct=3 lid=0 tim=1720082490510 hv=2111436465 ad='6645c050' sqlid='6apq2rjyxmxpj'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #139841534593992:c=1999,e=1211,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1720082490509
BINDS #139841534593992:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2f64fe8780  bln=22  avl=04  flg=05
  value=20314
EXEC #139841534593992:c=1000,e=1789,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=1720082492533
WAIT #139841534593992: nam='db file sequential read' ela= 20 file#=1 block#=45712 blocks=1 obj#=20314 tim=1720082492685
WAIT #139841534593992: nam='db file sequential read' ela= 25 file#=1 block#=45713 blocks=1 obj#=20314 tim=1720082492986

这里可以看出来,数据库在启动的时候是读取file 1 block 45712,object_id为20314

bbed查看root rdba地址

ub4 kcvfhrdb                             @96       0x0040b290
SQL> select to_number('b290','xxxxxxx') from dual;
TO_NUMBER('B290','XXXXXXX')
---------------------------
                      45712

证明现在的文件头里面的file header rdba 已经修改为file 1 block 45712和trace里面看到的一致,都是XIFENFEI这个表

测试结论
1. 通过使用SWAP_BOOTSTRAP可以置换掉数据库启动开始表bootstrap$变为另外一个表,而且该过程直接修改文件头的kcvfhrdb值
2. 通过试验证明,oracle启动的时候不是程序里面写死的去读file 1的某个block,而是通过读取kcvfhrdb然后启动数据库

一次数据库优化全过程分析

最近对客户的一个数据库进行了优化,在本次优化过程中,主要涉及以下方面:
1. 确保系统有足够的内存,处理方法配置Hugepage,减小SGA
2. 优化因为主键表频繁插入引起的user$,con$,cdef$递归查询sql

SQL> select c.name, u.name from con$ c, cdef$ cd, user$ u where
   2 c.con# = cd.con# and cd.enabled = :1 and c.owner# = u.user#;
Execution Plan
----------------------------------------------------------
Plan hash value: 2409458995
-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |  3820 |   164K|    38   (6)| 00:00:01 |
|*  1 |  HASH JOIN          |       |  3820 |   164K|    38   (6)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | USER$ |    64 |   896 |     3   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |       |  3820 |   111K|    34   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| CDEF$ |  3820 | 34380 |    25   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| CON$  |  6368 |   130K|     8   (0)| 00:00:01 |
-----------------------------------------------------------------------------

具体见:一次数据库优化全过程分析