ORACLE 12C RAC修改ocr/votedisk/asm spfile所在磁盘组名称

今天看着我这个单节点的12C rac,突然觉得ocr所在的磁盘组叫做+DG_SYS有点不舒服,想改成+SYS_DG。处理方法是先把ocr/votedisk/asm spfile迁移到已经存在的asm中,然后修改磁盘组名称,最后迁移到新名称磁盘组中(本次处理流程+DG_SYS—>+DATA—>+SYS_DG)
当前运行情况

[grid@xifenfei ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                                0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DG_SYS                         MOUNTED
DATA                           MOUNTED
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   60a037da30714f6bbfe5d90206ff27a7 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).
[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825640465

修改ocr路径
ocrconfig -add和ocrconfig -delete完成ocr更换磁盘组,该过程可以在线处理

[root@xifenfei ~]# ocrconfig -add +data
--alert 日志
2013-09-09 22:32:40.799:
[crsd(5064)]CRS-1007:The OCR/OCR mirror location was replaced by +data.
[root@xifenfei ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
         Device/File Name         :      +data
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
[root@xifenfei ~]# ocrconfig -delete +DG_SYS
--alert 日志
2013-09-09 22:35:53.585:
[crsd(5064)]CRS-1010:The OCR mirror location +DG_SYS was removed.
[root@xifenfei ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :      +data
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

修改votedisk路径
通过crsctl replace votedisk命令修改

[root@xifenfei ~]# crsctl replace votedisk +DATA
Successful addition of voting disk 161ddea0a5fe4f28bfb67536e6105122.
Successful deletion of voting disk 60a037da30714f6bbfe5d90206ff27a7.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
-alert日志
2013-09-09 22:38:15.259:
[cssd(4685)]CRS-1605:CSSD voting file is online: /dev/sdb; details in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log.
2013-09-09 22:38:15.259:
[cssd(4685)]CRS-1626:A Configuration change request completed successfully
2013-09-09 22:38:15.285:
[cssd(4685)]CRS-1601:CSSD Reconfiguration complete. Active nodes are xifenfei .
[root@xifenfei ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   161ddea0a5fe4f28bfb67536e6105122 (/dev/sdb) [DATA]
Located 1 voting disk(s).

修改asm spfile位置

[grid@xifenfei ~]$  gpnptool get -o-
Success.
…………
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/sd*" SPFile="+DG_SYS/xff-cluster/ASMPARAMETERFILE/registry.253.825640465" Mode="legacy"/>
…………
[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 22:42:05 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create pfile='/tmp/pfile.asm' from spfile;
File created.
SQL> create spfile='+DATA' FROM PFILE='/tmp/pfile.asm';
File created.
[grid@xifenfei ~]$  gpnptool get -o-
Success.
…………
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/sd*" SPFile="+DATA/xff-cluster/ASMPARAMETERFILE/registry.253.825720159" Mode="legacy"/>
…………

这里证明create asm spfile会自动修改spfile在gpnptool对应的profile里面的配置,无需人工干预

重启crs
为了使得asm使用新的磁盘组中的spfile文件

[root@xifenfei ~]# crsctl stop crs
[root@xifenfei ~]# crsctl start crs

验证+DG_SYS磁盘组未被使用

[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 22:59:49 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/xff-cluster/ASMPARAMETER
                                                 FILE/registry.253.825720159
ASMCMD> lsof
DB_Name  Instance_Name  Path
+ASM     +ASM1          +DATA.255.819326577
cdb      cdb1           +DATA/CDB/CONTROLFILE/current.274.819356503
cdb      cdb1           +DATA/CDB/DATAFILE/sysaux.278.819355829
cdb      cdb1           +DATA/CDB/DATAFILE/system.269.819356101
cdb      cdb1           +DATA/CDB/DATAFILE/undotbs1.276.819356317
cdb      cdb1           +DATA/CDB/DATAFILE/users.279.819356309
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/pdbseed_temp01.dbf
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/sysaux.272.819356709
cdb      cdb1           +DATA/CDB/DD7C48AA5A4404A2E04325AAE80A403C/DATAFILE/system.271.819356709
cdb      cdb1           +DATA/CDB/ONLINELOG/group_1.277.822736453
cdb      cdb1           +DATA/CDB/ONLINELOG/group_2.280.822736461
cdb      cdb1           +DATA/CDB/ONLINELOG/group_3.275.822736397
cdb      cdb1           +DATA/CDB/TEMPFILE/temp.273.819356649

dismount +DG_SYS磁盘组

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     20480     3369                0            3369              0             Y  DATA/
MOUNTED  EXTERN  N         512   4096  1048576      5451     5231                0            5231              0             N  DG_SYS/
ASMCMD> umount dg_sys
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     20480     3369                0            3369              0             Y  DATA/

修改asm dg名称
修改磁盘组+DG_SYS为+SYS_DG

[grid@xifenfei ~]$ renamedg phase=both dgname=DG_SYS newdgname=SYS_DG verbose=true
Parsing parameters..
Parameters in effect:
         Old DG name       : DG_SYS
         New DG name          : SYS_DG
         Phases               :
                 Phase 1
                 Phase 2
         Discovery str        : (null)
         Clean              : TRUE
         Raw only           : TRUE
renamedg operation: phase=both dgname=DG_SYS newdgname=SYS_DG verbose=true
Executing phase 1
Discovering the group
Performing discovery with string:
Identified disk UFS:/dev/sdc2 with disk number:0 and timestamp (32990496 1727895552)
Checking for hearbeat...
Re-discovering the group
Performing discovery with string:
Identified disk UFS:/dev/sdc2 with disk number:0 and timestamp (32990496 1727895552)
Checking if the diskgroup is mounted or used by CSS
Checking disk number:0
Generating configuration file..
Completed phase 1
Executing phase 2
Looking for /dev/sdc2
Modifying the header
Completed phase 2
Terminating kgfd context 0x7fceeb02a0a0

mount +SYS_DG

SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
SYS_DG                         DISMOUNTED
SQL> alter diskgroup sys_dg mount;
Diskgroup altered.
SQL>  select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
SYS_DG                         MOUNTED

asm spfile/ocr/votedisk迁移从+DATA到+SYS_DG

SQL> create spfile='+SYS_DG' FROM pfile='/tmp/pfile.asm';
File created.
[root@xifenfei ~]# ocrconfig -add +SYS_DG
[root@xifenfei ~]# ocrconfig -DELETE +DATA
[root@xifenfei ~]# crsctl replace votedisk +SYS_DG
Successful addition of voting disk 9694a31053ea4ff4bfb57891461a1296.
Successful deletion of voting disk 161ddea0a5fe4f28bfb67536e6105122.
Successfully replaced voting disk group with +SYS_DG.
CRS-4266: Voting file(s) successfully replaced
[root@xifenfei ~]# crsctl stop crs
[root@xifenfei ~]# crsctl start crs

删除ocr里面老磁盘组(+DG_SYS)信息

[root@xifenfei ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DG_SYS.dg
               ONLINE  OFFLINE      xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.SYS_DG.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------
[root@xifenfei ~]# srvctl remove diskgroup -diskgroup dg_sys
[root@xifenfei ~]# crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.SYS_DG.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

至此ocr/votedisk/asm spfile所在磁盘组修改名称完成,因为该库是一个单节点的rac,如果是两个或者更多节点的rac可以实现不停机的情况下进行(分步重启节点).该处理过程和11.2 rac完全相同,未有任何的改变

oracle 12.1 RAC的ocr磁盘组异常恢复

在11.2或者12.1的RAC中,ocr和votedisk可以放到asm中,而很多人安装系统把ocr和votedisk放到一个单独的asm 磁盘组里面,但是如果这个磁盘组坏了,而数据所在的磁盘组是好的,这个时候该怎么恢复呢?这里的恢复分两种情况,一种是有ocr备份的恢复,另外一种是无ocr备份的恢复。但是在一般情况下ocr是每4个小时自动备份一份,因此大部分的系统中都会有ocr的备份。本blog主要对于oracle 12c rac在有ocr备份,存储ocr,votedisk的asm磁盘组异常恢复
确定ocr,votedisk,asm spfile存在一个独立asm diskgroup中

[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1360
         Available space (kbytes) :     408208
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825628977
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   3e20d13ae98a4fcfbffa489ab4df68a3 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).
ASMCMD>  lsdsk -t -G dg_sys
Create_Date  Mount_Date  Repair_Timer  Path
08-SEP-13    08-SEP-13   0             /dev/sdc2

查看当前rac状态

[grid@xifenfei ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

kfed查看磁盘头

[grid@xifenfei ~]$ kfed read /dev/sdc2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2879801080 ; 0x00c: 0xaba646f8
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                202375168 ; 0x020: 0x0c100000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DG_SYS_0000 ; 0x028: length=11
kfdhdb.grpname:                  DG_SYS ; 0x048: length=6
kfdhdb.fgname:              DG_SYS_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32990483 ; 0x0a8: HOUR=0x13 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.crestmp.lo:            303455232 ; 0x0ac: USEC=0x0 MSEC=0x197 SECS=0x21 MINS=0x4
kfdhdb.mntstmp.hi:             32990485 ; 0x0b0: HOUR=0x15 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.mntstmp.lo:           1776845824 ; 0x0b4: USEC=0x0 MSEC=0x221 SECS=0x1e MINS=0x1a
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                    5451 ; 0x0c4: 0x0000154b
kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                     10 ; 0x0d4: 0x0000000a
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             32990483 ; 0x0e4: HOUR=0x13 DAYS=0x8 MNTH=0x9 YEAR=0x7dd
kfdhdb.grpstmp.lo:            301063168 ; 0x0e8: USEC=0x0 MSEC=0x77 SECS=0x1f MINS=0x4
kfdhdb.vfstart:                     224 ; 0x0ec: 0x000000e0
kfdhdb.vfend:                       256 ; 0x0f0: 0x00000100
kfdhdb.spfile:                      219 ; 0x0f4: 0x000000db   ----asm spfile的起点
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
kfdhdb.flags:                         1 ; 0x0fc: 0x00000001

备份ocr

[root@xifenfei xff-cluster]# ocrconfig  -manualbackup
xifenfei     2013/09/08 23:48:57     /u01/app/12.1/grid/product/cdata/xff-cluster/backup_20130908_234857.ocr
[root@xifenfei xff-cluster]# ocrconfig -showbackup
xifenfei     2013/08/08 21:11:00     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup00.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup01.ocr
xifenfei     2013/07/08 20:23:18     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup02.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/day.ocr
xifenfei     2013/08/08 17:10:56     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/week.ocr
xifenfei     2013/09/08 23:48:57     /u01/app/12.1/grid/product/cdata/xff-cluster/backup_20130908_234857.ocr
xifenfei     2013/06/28 22:55:02     /u01/app/12.1/grid/product/cdata/xifenfe-cluster/backup_20130628_225502.ocr

破坏asm disk

[grid@xifenfei ~]$ dd if=/dev/zero of=/dev/sdc2 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.6061e-05 seconds, 62.0 MB/s

关闭crs

[root@xifenfei xff-cluster]# crsctl stop crs

启动crs

[root@xifenfei xff-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[grid@xifenfei admin]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  OFFLINE      xifenfei                 STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei                 STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------

GI相关日志

--alert日志
2013-09-08 23:53:37.662:
[gpnpd(1507)]CRS-2328:GPNPD started on node xifenfei.
2013-09-08 23:54:10.244:
[cssd(1584)]CRS-1713:CSSD daemon is started in hub mode
2013-09-08 23:54:10.915:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:11.183:
[ohasd(1367)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2013-09-08 23:54:11.183:
[ohasd(1367)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2013-09-08 23:54:26.044:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:41.146:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
2013-09-08 23:54:56.195:
[cssd(1584)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1/grid/product/log/xifenfei/cssd/ocssd.log
--ocssd日志
2013-09-08 23:54:25.976: [    GPNP][1090226496]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2360] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_xifenfei" disco ""
2013-09-08 23:54:25.976: [    CSSD][1090226496]clssnmReadDiscoveryProfile: voting file discovery string(/dev/sd*)
2013-09-08 23:54:25.976: [    CSSD][1090226496]clssnmvDDiscThread: using discovery string /dev/sd* for initial discovery
2013-09-08 23:54:25.976: [   SKGFD][1090226496]Discovery with str:/dev/sd*:
2013-09-08 23:54:25.976: [   SKGFD][1090226496]UFS discovery with :/dev/sd*:
2013-09-08 23:54:26.032: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdb:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdc2:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdd:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sdd1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda1:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]Fetching UFS disk :/dev/sda2:
2013-09-08 23:54:26.037: [   SKGFD][1090226496]OSS discovery with :/dev/sd*:
2013-09-08 23:54:26.042: [   SKGFD][1090226496]Handle 0x1d65c10 from lib :UFS:: for disk :/dev/sdb:
2013-09-08 23:54:26.043: [   SKGFD][1090226496]Handle 0x20c95a0 from lib :UFS:: for disk :/dev/sdc2:
2013-09-08 23:54:26.043: [   SKGFD][1090226496]Handle 0x20c9dd0 from lib :UFS:: for disk :/dev/sdd:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x1d65c10 for disk :/dev/sdb:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x20c95a0 for disk :/dev/sdc2:
2013-09-08 23:54:26.044: [   SKGFD][1090226496]Lib :UFS:: closing handle 0x20c9dd0 for disk :/dev/sdd:
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmvDiskVerify: Successful discovery of 0 disks
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-09-08 23:54:26.044: [    CSSD][1090226496]clssnmvFindInitialConfigs: No voting files found
2013-09-08 23:54:26.044: [    CSSD][1090226496](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

在我们破坏了ocr所在的asm disk的磁盘后,启动crs明显提示无法找到votedisk信息

强制关闭crs

[root@xifenfei xff-cluster]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'xifenfei'
CRS-2677: Stop of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei' has completed
CRS-4133: Oracle High Availability Services has been stopped.

exclusive模式启动crs

[root@xifenfei xff-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'xifenfei'
CRS-2677: Stop of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei'
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei'
CRS-2676: Start of 'ora.evmd' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei'
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'xifenfei'
CRS-2672: Attempting to start 'ora.ctssd' on 'xifenfei'
CRS-2676: Start of 'ora.ctssd' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'xifenfei' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'xifenfei' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei'
CRS-2676: Start of 'ora.asm' on 'xifenfei' succeeded
[grid@xifenfei xifenfei]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE xifenfei                 OCR not started,STAB
                                                             LE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.crf
      1        OFFLINE OFFLINE                               STABLE
ora.crsd
      1        OFFLINE OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenfei                 ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE xifenfei                 STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.storage
      1        OFFLINE OFFLINE                               STABLE

创建磁盘组

[grid@xifenfei xifenfei]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 00:23:40 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create diskgroup DG_OCR external redundancy disk '/dev/sdc2' attribute 'COMPATIBLE.ASM' = '12.1.0';
Diskgroup created.
[root@xifenfei xff-cluster]# ls -l
total 1472
-rw-r--r-- 1 root root 1503232 Sep  8 23:48 backup_20130908_234857.ocr
[root@xifenfei xff-cluster]# ocrconfig -restore backup_20130908_234857.ocr
PROT-35: The configured OCR locations are not accessible
SQL> conn / as sysasm
Connected.
SQL> drop diskgroup DG_OCR force including contents;
Diskgroup dropped.
SQL>  create diskgroup DG_SYS  external redundancy disk '/dev/sdc2' attribute 'COMPATIBLE.ASM' = '12.1.0';
Diskgroup created.

为了操作方便,建议创建磁盘组和以前ocr所在异常的磁盘组一致

还原ocr

[root@xifenfei xff-cluster]# ocrconfig -restore backup_20130908_234857.ocr
--ALERT 日志
2013-09-09 00:26:50.584:
[client(3015)]CRS-1002:The OCR was restored from file backup_20130908_234857.ocr.

处理votedisk

[root@xifenfei xff-cluster]# crsctl replace votedisk +DG_SYS
Successful addition of voting disk 60a037da30714f6bbfe5d90206ff27a7.
Successfully replaced voting disk group with +DG_SYS.
CRS-4266: Voting file(s) successfully replaced

创建asm spfile

[grid@xifenfei dbs]$ vi /tmp/asm.txt
instance_type='asm'
large_pool_size=12M
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring           = "/dev/sd*"
asm_power_limit          = 1
[grid@xifenfei dbs]$ sqlplus '/ as sysasm'
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 00:34:18 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL>  create spfile='+DG_SYS' FROM pfile='/tmp/asm.txt';
File created.

重启crs

[root@xifenfei xff-cluster]# crsctl stop crs -f
[root@xifenfei xff-cluster]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[grid@xifenfei dbs]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei                 STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei                 STABLE
ora.ons
               ONLINE  ONLINE       xifenfei                 STABLE
ora.proxy_advm
               ONLINE  OFFLINE      xifenfei                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei                 169.254.196.108 10.1
                                                             0.30.22,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei                 Started,STABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cdb.db
      1        ONLINE  ONLINE       xifenfei                 Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.oc4j
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
ora.xifenfei.vip
      1        ONLINE  ONLINE       xifenfei                 STABLE
--------------------------------------------------------------------------------

这里crs已经恢复正常,进一步检查ocr,votedisk,asm spfile情况

[grid@xifenfei ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1380
         Available space (kbytes) :     408188
         ID                       : 2132096904
         Device/File Name         :    +DG_SYS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
[grid@xifenfei ~]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.1.0 Production on Mon Sep 9 16:12:21 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DG_SYS/xff-cluster/ASMPARAMET
                                                 ERFILE/registry.253.825640465
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@xifenfei ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   60a037da30714f6bbfe5d90206ff27a7 (/dev/sdc2) [DG_SYS]
Located 1 voting disk(s).

至此在ocr 磁盘组异常,而有ocr备份的情况下故障恢复完毕,对于没有ocr备份的故障,只能通过重建ocr来完成,大概步骤为

--deconfigure(root)
remote node
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose
lastnode
# <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode
--配置信息重建ocr等(grid)
# $GRID_HOME/crs/config/config.sh

证明递归session存在并解释为什么不在v$session中显示

我们在数据库的使用过程中,有时候会遇到类似情况,我会话是登录的,但是我进行某种操作,缺报session不足.这种情况证明该sql后台还产生了其他会话,这里通过试验分析证明了递归session的存在
会话创建表报session超

CDB_PDB@CHF>  create table t_xifenfei(id number) ;
create table t_xifenfei(id number)
ERROR at line 1:
ORA-00018: maximum number of sessions exceeded

这里有个问题:当前会话已经登录成功了,证明当前session是足够的,但是为什么在执行创建表操作之时依然会报ORA-00018呢?通过10046继续分析

CDB_PDB@CHF> alter session set events '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
会话已更改。
CDB_PDB@CHF> create table t_xifenfei as select * from dual;
表已创建。
CDB_PDB@CHF> select value from v$diag_info where name='Default Trace File';
VALUE
--------------------------------------------------------------------------------
E:\APP\XIFENFEI\diag\rdbms\cdb\cdb\trace\cdb_ora_6596.trc

分析trace文件

CDB_PDB@CHF> host tkprof  E:\APP\XIFENFEI\diag\rdbms\cdb\cdb\trace\cdb_ora_6596.trc d:/1.txt
--查看trace文件,发现里面有很多基表操作,拿其中的一个tab$表分析,创建表过程有如下insert操作
insert into tab$(obj#,ts#,file#,block#,bobj#,tab#,intcols,kernelcols,clucols,
  audit$,flags,pctfree$,pctused$,initrans,maxtrans,rowcnt,blkcnt,empcnt,
  avgspc,chncnt,avgrln,analyzetime,samplesize,cols,property,degree,instances,
  dataobj#,avgspc_flb,flbcnt,trigflag,spare1,spare6)
values
(:1,:2,:3,:4,decode(:5,0,null,:5),decode(:6,0,null,:6),:7,:8,decode(:9,0,null,
  :9),:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,
  decode(:26,1,null,:26),decode(:27,1,null,:27),:28,:29,:30,:31,:32,:33)

尝试人工插入

CDB_PDB@CHF> insert into sys.tab$ select * from sys.tab$ where rownum=1;
insert into sys.tab$ select * from sys.tab$ where rownum=1
                *
第 1 行出现错误:
ORA-01031: 权限不足

证明当前执行创建表的session无权限直接操作tab$表,证明应该有其他表操作它

v$session视图基表
通过查询V$FIXED_VIEW_DEFINITION视图获得相关sql语句,不同版本可能有出入,但是大体一致

/* Formatted on 2013/11/8 23:09:30 (QP5 v5.227.12220.39754) */
SELECT inst_id,
       addr,
       indx,
       ksuseser,
       ksuudses,
       ksusepro,
       ksuudlui,
       ksuudlna,
       ksuudoct,
       ksusesow,
       DECODE (ksusetrn, HEXTORAW ('00'), NULL, ksusetrn),
       DECODE (ksqpswat, HEXTORAW ('00'), NULL, ksqpswat),
       DECODE (BITAND (ksuseidl, 11),
               1, 'ACTIVE',
               0, DECODE (BITAND (ksuseflg, 4096), 0, 'INACTIVE', 'CACHED'),
               2, 'SNIPED',
               3, 'SNIPED',
               'KILLED'),
       DECODE (ksspatyp,
               1, 'DEDICATED',
               2, 'SHARED',
               3, 'PSEUDO',
               'NONE'),
       ksuudsid,
       ksuudsna,
       ksuseunm,
       ksusepid,
       ksusemnm,
       ksusetid,
       ksusepnm,
       DECODE (BITAND (ksuseflg, 19),
               17, 'BACKGROUND',
               1, 'USER',
               2, 'RECURSIVE',
               '?'),
       ksusesql,
       ksusesqh,
       ksusepsq,
       ksusepha,
       ksuseapp,
       ksuseaph,
       ksuseact,
       ksuseach,
       ksusecli,
       ksusefix,
       ksuseobj,
       ksusefil,
       ksuseblk,
       ksuseslt,
       ksuseltm,
       ksusectm,
       DECODE (BITAND (ksusepfl, 16), 0, 'NO', 'YES'),
       DECODE (ksuseft,
               2, 'SESSION',
               4, 'SELECT',
               8, 'TRANSACTIONAL',
               'NONE'),
       DECODE (ksusefm,
               1, 'BASIC',
               2, 'PRECONNECT',
               4, 'PREPARSE',
               'NONE'),
       DECODE (ksusefs, 1, 'YES', 'NO'),
       ksusegrp,
       DECODE (BITAND (ksusepfl, 16),
               16, 'ENABLED',
               DECODE (BITAND (ksusepfl, 32), 32, 'FORCED', 'DISABLED')),
       DECODE (BITAND (ksusepfl, 64),
               64, 'FORCED',
               DECODE (BITAND (ksusepfl, 128), 128, 'DISABLED', 'ENABLED')),
       DECODE (BITAND (ksusepfl, 512),
               512, 'FORCED',
               DECODE (BITAND (ksusepfl, 256), 256, 'DISABLED', 'ENABLED')),
       ksusecqd,
       ksuseclid
  FROM x$ksuse
 WHERE BITAND (ksspaflg, 1) != 0 AND BITAND (ksuseflg, 1) != 0

注意:v$session查询的肯定是BITAND (ksuseflg, 1)!=0的记录

通过锁住表测试
CDB_PDB@SYS表示sys用户,CDB_PDB@CHF表示chf用户,使用两个session,不同用户测试

CDB_PDB@SYS> show user;
USER 为 "SYS"
--SYS用户锁住表
CDB_PDB@SYS> lock table tab$ IN exclusive MODE;
表已锁定。
CDB_PDB@CHF> show user;
USER 为 "CHF"
CDB_PDB@CHF>  select sid from v$mystat where rownum=1;
       SID
----------
        57
CDB_PDB@CHF> select paddr from v$session where sid=57;
PADDR
----------------
000007FF1E10F228
--CHF用户创建表
CDB_PDB@CHF>  create table t_xifenfei_new as select * from dual;
--SYS用户查询
CDB_PDB@SYS> SELECT s.addr,
  2         s.indx sid,
  3         s.ksuseser SERIAL#,
  4         ksuudsna username,
  5         DECODE (BITAND (ksuseflg, 19),
  6                 17, 'BACKGROUND',
  7                 1, 'USER',
  8                 2, 'RECURSIVE',
  9                 '?')
 10            TYPE
 11    FROM x$ksuse s
 12   WHERE ksusepro = '000007FF1E10F228';
ADDR                    SID    SERIAL# USERNAME                       TYPE
---------------- ---------- ---------- ------------------------------ ----------
000007FF1E1EBEA0         57         23 CHF                            USER
000007FF1E1D7F90         67        183 SYS                            RECURSIVE
CDB_PDB@SYS> SELECT ksuudsna username,
  2         ksuseflg
  3    FROM x$ksuse s
  4   WHERE ksusepro = '000007FF1E10F228';
USERNAME                         KSUSEFLG
------------------------------ ----------
CHF                             135266369
SYS                                     2
--这里我们发现递归sys调用的sql,在v$session视图中被排除了,因此递归sql的session不能在v$session显示
CDB_PDB@SYS> select bitand(2,1) from dual;
BITAND(2,1)
-----------
          0

至此,我们可以验证,我们当前的会话,在创建表的过程中有一个sys的递归session执行了关于基表的操作,但是由于v$session视图对于x$ksuse表中的部分记录进行了过滤因此我们不能在v$session查看到这些递归session

继续分析bitand函数
通过观察v$session的创建语句,我们可以发现如下规律,如果某个session是递归session,那么BITAND (ksuseflg, 19)=2,那当这个值为2的时候,是不是BITAND (ksuseflg, 1)一定为0呢?bitand函数实际上就是把里面的两个参数转换为二进制然后进行and运算,也就是两个对应位都为1的情况才会结果得带1(bitand(3,1)=1,bitand(2,1)=0),这里可以发现19转换为二进制为10011,要使得BITAND (ksuseflg, 19)=2成立,那就是说ksuseflg转换为二进制后,最后一位必须是0;而BITAND (ksuseflg, 1)在这样的情况下,一定为0,因此递归session的一定不会在v$session视图显示.

因RAC的undo_management参数不一致导致数据库mount报ORA-01105 ORA-01606

环境Linux 5.8 10.2.0.5 RAC,两个节点只能一个节点mount,如果尝试mount另外节点就报ORA-01105和ORA-01606错误
数据库版本

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

crs资源情况

[oracle@node1 dbs]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....D1.inst application    OFFLINE   OFFLINE
ora....D2.inst application    ONLINE    ONLINE    node2
ora.PROD.db    application    ONLINE    ONLINE    node2
ora....SM1.asm application    ONLINE    ONLINE    node1
ora....E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora....E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2

节点1 mount报错

SQL> startup
ORACLE instance started.
Total System Global Area  171966464 bytes
Fixed Size                  2094832 bytes
Variable Size             113248528 bytes
Database Buffers           50331648 bytes
Redo Buffers                6291456 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01606: gc_files_to_locks not identical to that of another mounted instance
Error:    ORA 1105
Text:     mount is incompatible with mounts by other instances
-------------------------------------------------------------------------------
Cause:  An attempt was made to mount the database, but another instance has already mounted
        a database by the same name, and the mounts are not compatible.
        dditional messages will accompany this message to report why the mounts are incompatible.
Action:  See the accompanying messages for the appropriate action to take.
Error:  ORA 1606
Text:   GC_FILES_TO_LOCKS not identical to that of another mounted instance
-------------------------------------------------------------------------------
Cause:  The initialization parameter GC_FILES_TO_LOCKS is not the same as
        another instance mounted in parallel mode.
        This parameter must be the same as that for all shared instances.
Action: Modify the parameter to be compatible with the other instances, then
        shut down and restart the instance.

根据这个错误提示,查询两个节点的gc_files_to_locks参数,均为空值(默认值),也就是值相同

SQL> show parameter gc_files_to_locks;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
gc_files_to_locks                    string

检查两个节点的gc相关隐含参数,发现所有值也均一致

NAME                           DESCRIPTION                                                           VALUE
------------------------------ --------------------------------------------------------------------- -------
_gc_affinity_limit             dynamic affinity limit                                                50
_gc_affinity_minimum           dynamic affinity minimum activity per minute                          6000
_gc_affinity_time              if non zero, enable dynamic object affinity                           10
_gc_async_memcpy               if TRUE, use async memcpy                                             FALSE
_gc_check_bscn                 if TRUE, check for stale blocks                                       TRUE
_gc_coalesce_recovery_reads    if TRUE, coalesce recovery reads                                      TRUE
_gc_defer_time                 how long to defer down converts for hot buffers                       3
_gc_dissolve_undo_affinity     if TRUE, dissolve undo affinity after an offline                      FALSE
_gc_dynamic_affinity_locks     if TRUE, get dynamic affinity locks                                   TRUE
_gc_element_percent            global cache element percent                                          103
_gc_global_lru                 turn global lru off, make it automatic, or turn it on                 AUTO
_gc_initiate_undo_affinity     if TRUE, initiate undo affinity after an online                       TRUE
_gc_integrity_checks           set the integrity check level                                         1
_gc_keep_recovery_buffers      if TRUE, make recovery buffers current                                TRUE
_gc_latches                    number of latches per LMS process                                     8
_gc_maximum_bids               maximum number of bids which can be prepared                          0
_gcs_fast_reconfig             if TRUE, enable fast reconfiguration for gcs locks                    TRUE
_gcs_latches                   number of gcs resource hash latches to be allocated per LMS process   64
_gcs_pkey_history              number of pkey remastering history                                    4000
_gcs_process_in_recovery       if TRUE, process gcs requests during instance recovery                TRUE
_gcs_resources                 number of gcs resources to be allocated
_gcs_shadow_locks              number of pcm shadow locks to be allocated
_gc_statistics                 if TRUE, kcl statistics are maintained                                TRUE
_gcs_testing                   GCS testing parameter                                                 0
_gc_tsn_undo_affinity          if TRUE, use TSN undo affinity                                        TRUE
_gc_undo_affinity              if TRUE, enable dynamic undo affinity                                 TRUE
_gc_undo_affinity_locks        if TRUE, get affinity locks for undo                                  TRUE
_gc_use_cr                     if TRUE, allow CR pins on PI and WRITING buffers                      TRUE
_gc_vector_read                if TRUE, vector read current buffers                                  TRUE

仔细对比数据库参数,发现undo异常

--节点1
SQL>  show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string      SYSTEM
--节点2
SQL>  show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1

这里已经明确,因为两个节点的undo_*相关参数配置不正确,导致数据库只能一个节点mount。进一步定位问题发现,原来是因为dba粗心在编辑节点1的参数文件的时候把undo_*相关的参数给弄丢了,从而数据库使用了默认值undo_management=manual,undo_tablespace=system

win平台rman备份和删除dg备库归档日志脚本

总觉得使用windows跑oracle是不靠谱的事情,可以这个世界上总有很多人喜欢做类似这样的事情,对于数据库比较常见的两件事情:rman和删除dg备库归档日志,在linux/unix平台上使用shell实现很简单,可是跑到win里面,就变的烦了,不是因为其麻烦,而是因为用的人少,不知道怎么下手处理该事情,我编写了简单的实现初级功能的win下面rman备份和删除备库归档日志脚本,供大家参考,也更加欢迎朋友提出来更加好的处理方法(win是真心的不懂)
rman备份脚本

--backup_oracle.bat文件
rman target / cmdfile=D:\backup\rman\backup_db.rman
log=d:/backup/rman/logfile/rmanlog%date:~0,4%%date:~5,2%%date:~8,2%.log
--backup_db.rman文件
CONFIGURE RETENTION POLICY TO REDUNDANCY = 2;
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
backup filesperset = 5 as compressed backupset database format 'd:/backup/rman/full_%U.rman';
sql 'alter system archive log current';
backup  filesperset = 50 as compressed backupset archivelog  all format 'd:/backup/rman/arch_%U.rman'  delete input;
DELETE noprompt OBSOLETE;
crosscheck backup;
delete noprompt expired backup;
backup  format 'd:/backup/rman/ctl_%U.rman' current controlfile;
backup spfile format 'd:/backup/rman/spfile_%U.rman' ;
exit;

backup_oracle.bat文件加入到计划任务即可

删除dg备库归档日志(已经应用)

--delete_dg_archivelog.bat
rem 注意修改 部署目录
cd D:\win_xifenfei
d:
rem 注意delete_archive.sql 查询是否有记录
echo delete archivelog staring > delete_archivelog.bak
sqlplus / as sysdba @delete_archive.sql
echo rman target / cmdfile=rman_checkcross.rman>>delete_archivelog.bat
delete_archivelog.bat >>delete_dg_archivelog_%DATE:~0,4%%DATE:~5,2%%DATE:~8,2%".log
exit
--delete_archive.sql
set lines 150
col name for a150
set pagesize 0 feedback off verify off heading off echo off
spool delete_archivelog.bat
select 'del '||name from v$archived_log where APPLIED='YES' AND NAME IS NOT NULL and DEST_ID=1;
spool off
exit;
--rman_checkcross.rman
crosscheck archivelog all;
delete noprompt expired archivelog all;
exit

delete_dg_archivelog.bat加入到计划任务即可

db_block_checksum实质是通过flg_kcbh来控制block checksum

db_block_checksum 用于DBWn和direct loader数据块写入到磁盘时,基于块内的所有字节计算得出一个校验值并将其写入块头。在该参数设置为typical和full时,当读入时候重新计算校验和写出时候的校验对比,如果不同则认为是块损坏。如果设置为FULL模式,则基于update/delete应用程序语句级别的改变发生后,校验值会被重新计算并写入。同时对于日志块,在写入之前,同样会生产校验值并写入到块头。该参数主要是防止IO硬件和IO子系统的错误。
这里提示我们是在系统参数级别使用db_block_checksum来控制block是否进行验证,那在block本身级别,是否有类似的值来控制,实现对block值的checksum?通过dump结合bbed给出相关答案
db_block_checksum为TYPICAL测试

CDB_CDB$ROOT@SYS> show parameter db_block_checksum;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_checksum                    string      TYPICAL
CDB_CDB$ROOT@SYS> conn / as sysdba
已连接。
CDB_CDB$ROOT@SYS> alter session set container=pdb;
会话已更改。
CDB_CDB$ROOT@SYS> alter database open;
数据库已更改。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS> create table t_xifenfei(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS> insert into t_xifenfei values(1,'www.xifenfei.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19229
CDB_CDB$ROOT@SYS> SELECT NAME FROM V$DATAFILE WHERE FILE#=9;
NAME
---------------------------------------------------------------------
E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF

dump 该block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xe08b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

这里可以看到因为db_block_checksum=TYPICAL,因此插入记录的时候,dump block发现flg: 0x06 和 chkval: 0xe08b

使用bbed查看相关记录
补充:在win系统中,bbed查看block和block num相差1

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b1d
   ub4 bas_kcbh                             @8        0x3c232935
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe08b
   ub2 spare3_kcbh                          @18       0x0000

通过对比bbed和dump出来数据,可以得出flg: 0x06(flg_kcbh),chkval: 0xe08b(chkval_kcbh)

使用bbed修改block验证chkval_kcbh
提问:在使用bbed修改block的时候,一般都需要使用sum apply 处理下,为什么呢?

BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 77772e78 6966656e l ,...?.www.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--没有修改任何值,sum的current和required值相同,而且和chkval_kcbh也相同
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xe08b
--尝试修改值
BBED> m /x 78 offset 8173
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230            Offsets: 8173 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 78772e78 6966656e 6665692e 636f6d01 063529
 <32 bytes per line>
BBED> d /v offset 8165
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 78772e78 6966656e l ,...?.wxw.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--发现current<>required
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xef8b
--apply把current修改为required值
BBED> sum apply
Check value for File 0, Block 19230:
current = 0xef8b, required = 0xef8b
--发现chkval_kcbh也修改为了required值
BBED> p kcbh.chkval_kcbh
ub2 chkval_kcbh                             @16       0xef8b

通过这里可以发现,在flg_kcbh含(KCBHFCKV)的情况下,如果block发生改变,则运行sum apply之后chkval_kcbh也发生改变

继续dump block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xef8b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 78 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

证明上述结论正确:dump.flg=bbed.flg_kcbh,dump.chkval=bbed.chkval_kcbh

db_block_checksum为FALSE测试

CDB_CDB$ROOT@SYS> alter system set db_block_checksum=false;
系统已更改。
CDB_CDB$ROOT@SYS> drop table t_xifenfei_new purge;
表已删除。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS>  create table t_xifenfei_new(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS>  insert into t_xifenfei_new values(1,'www.orasos.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> commit;
提交完成。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
 2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei_new;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19237
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS> alter system dump datafile 9 block 19237;
系统已更改。

dump block分析

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  77 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

这里可以发现当设置db_block_checksum=false之时,插入数据,显示flg: 0x02,chkval: 0x0000,由此猜测无对block写入进行部分验证(tailchk依然验证)

使用bbed修改block

BBED> set filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF'
        FILENAME        E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set block 8192
        BLOCK#          8192
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set block 19238
        BLOCK#          19238
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x0000
   ub2 spare3_kcbh                          @18       0x0000
--这里看到flg_kcbh与chkval_kcbh和dump结果一致
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
--required为0,表示不验证
BBED> d /v offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7777772e 6f726173 6f732e63 6f6d0106 l www.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
BBED> set mode edit
        MODE            Edit
BBED> m /x 78
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 b2c8
 <32 bytes per line>
BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 l xww.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
--修改了block,但是sum依然提示required为0
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000

dump block核对

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

修改flg_kcbh测试

BBED> m /x 06 offset 15
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06000000 00010000 00d86b01 00aec823
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x04b3
--修改flg_kcbh=x06后,sum中的required出现了非0值,表示已经启动了block完整性检测
BBED> m /x 79 offset 1876
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 1876 to 1891           Dba:0x00000000
------------------------------------------------------------------------
 79391b01 0000c24e 07000205 c406573f
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x049d
--修改了chkval_kcbh值
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x049d, required = 0x049d

再次检查dump block

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x06 tail: 0xc8b20601
frmt: 0x02 chkval: 0x049d type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

证明修改flg_kcbh后,block的sum验证起效

完整测试

--flg_kcbh=2,chkval_kcbh为04b3,修改block观察变化
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x04b3
   ub2 spare3_kcbh                          @18       0x0000
BBED> m /x 11 offset 7184
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 110000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> undo
BBED> modify /x 00 filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF' block 19238. offset 7184.
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 000000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
--证明当flg_kcbh=2修改block之后chkval_kcbh依然为04b3,证明flg_kcbh不含(KCBHFCKV),
--修改block不会导致chkval_kcbh改变,也就是说,该block为启用db_block_checksum
--flg_kcbh=6的时候验证修改block导致的chkval_kcbh变化
BBED> m /x 06 offset 15
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06b30400 00010000 00d86b01 00aec823
 <32 bytes per line>
--刚刚修改flg_kcbh=6,马上看到required非0
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--尝试修改block
BBED> m /x 22 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 2277772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
--required发生改变
BBED> sum
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04e9
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04e9
--修改为以前值,验证required
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--发现当block修改回来后,required值和以前一致(也就是说chkval_kcbh值还原)
--进步一说明chkval_kcbh取决于block内部值

通过相关测试db_block_checksum是在实例级别启动block checksum,但是具体到每个block是通过flg_kcbh来控制,而具体体现是在chkval_kcbh值上

使用UltraEdit修改oracle二进制文件

对数据库OPEN的过程做10046,因为数据库没有open,那写着语句是写在什么地方的呢?通过使用UltraEdit打开$ORACLE_HOME/bin/oracle文件,可以发现很多sql语句,而且与open过程的10046很多一致.

044acdc0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044acdd0h: 32 2E 6E 61 6D 65 2C 20 6F 32 2E 6F 62 6A 23 20 ; 2.name, o2.obj#
044acde0h: 66 72 6F 6D 20 69 6E 64 24 20 69 2C 20 6F 62 6A ; from ind$ i, obj
044acdf0h: 24 20 6F 31 2C 20 6F 62 6A 24 20 6F 32 2C 20 75 ; $ o1, obj$ o2, u
044ace00h: 73 65 72 24 20 75 20 77 68 65 72 65 20 6F 31 2E ; ser$ u where o1.
044ace10h: 6F 77 6E 65 72 23 20 3D 20 3A 31 20 20 61 6E 64 ; owner# = :1  and
044ace20h: 20 6F 31 2E 74 79 70 65 23 20 3D 20 32 20 61 6E ;  o1.type# = 2 an
044ace30h: 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 61 6E ; d i.type# = 9 an
044ace40h: 64 20 69 2E 62 6F 23 20 3D 20 6F 31 2E 6F 62 6A ; d i.bo# = o1.obj
044ace50h: 23 20 20 61 6E 64 20 69 2E 6F 62 6A 23 20 3D 20 ; #  and i.obj# =
044ace60h: 6F 32 2E 6F 62 6A 23 20 61 6E 64 20 6F 32 2E 6F ; o2.obj# and o2.o
044ace70h: 77 6E 65 72 23 20 3D 20 75 2E 75 73 65 72 23 20 ; wner# = u.user#
044ace80h: 61 6E 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 ; and i.type# = 9
044ace90h: 61 6E 64 20 6F 31 2E 6F 77 6E 65 72 23 20 21 3D ; and o1.owner# !=
044acea0h: 20 6F 32 2E 6F 77 6E 65 72 23 00 00 00 00 00 00 ;  o2.owner#......
044aceb0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acec0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044aced0h: 2E 6E 61 6D 65 2C 20 6F 2E 6F 62 6A 23 20 66 72 ; .name, o.obj# fr
044acee0h: 6F 6D 20 6F 62 6A 24 20 6F 2C 20 75 73 65 72 24 ; om obj$ o, user$
044acef0h: 20 75 2C 20 69 6E 64 24 20 69 20 77 68 65 72 65 ;  u, ind$ i where
044acf00h: 20 6F 2E 6F 77 6E 65 72 23 3D 3A 31 20 61 6E 64 ;  o.owner#=:1 and
044acf10h: 20 6F 2E 6F 77 6E 65 72 23 3D 75 2E 75 73 65 72 ;  o.owner#=u.user
044acf20h: 23 20 61 6E 64 20 6F 2E 6F 62 6A 23 3D 69 2E 6F ; # and o.obj#=i.o
044acf30h: 62 6A 23 20 61 6E 64 20 69 2E 74 79 70 65 23 3D ; bj# and i.type#=
044acf40h: 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; 9...............
044acf50h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acf60h: 73 65 6C 65 63 74 20 76 6E 61 6D 65 20 66 72 6F ; select vname fro
044acf70h: 6D 20 73 79 73 2E 73 6E 61 70 24 20 77 68 65 72 ; m sys.snap$ wher
044acf80h: 65 20 73 6F 77 6E 65 72 20 3D 20 3A 31 20 61 6E ; e sowner = :1 an
044acf90h: 64 20 69 6E 73 74 73 69 74 65 20 3D 20 30 20 20 ; d instsite = 0
044acfa0h: 61 6E 64 20 28 62 69 74 61 6E 64 28 66 6C 61 67 ; and (bitand(flag
044acfb0h: 2C 20 32 36 38 34 33 35 34 35 36 29 20 3D 20 30 ; , 268435456) = 0
044acfc0h: 20 6F 72 20 62 69 74 61 6E 64 28 6F 62 6A 66 6C ;  or bitand(objfl
044acfd0h: 61 67 2C 20 33 32 29 20 3D 20 30 29 00 00 00 00 ; ag, 32) = 0)....

既然数据库OPEN的过程很多sql是在oracle二进制文件中写的,那是否可以考虑通过修改二进制文件中的sql语句来改变执行计划,甚至可以尝试修改这些语句使得异常的数据库能够正常open.这里演示通过修改sql语句来展示改变执行计划(不使用index,而改用全表扫描)
10046 捕获语句和执行计划

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475346332403 hv=854877822 ad='2f2be060'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
END OF STMT
PARSE #2:c=1000,e=1258,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346332394
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b69ef000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2000,e=1382,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346333978
WAIT #2: nam='db file sequential read' ela= 17820 file#=1 block#=220 blocks=1 obj#=-1 tim=1345475346351927
FETCH #2:c=1000,e=18054,p=1,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475346352100
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=1 pw=0 time=18047 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=1 pw=0 time=18014 us)'

使用UE查看该sql在二进制文件内容

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 2E 6F 62 6A ; j$ o where o.obj
0459c6d0h: 23 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; #=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

修改二进制文件中sql语句

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 62 6A 23 2B ; j$ o where obj#+
0459c6d0h: 30 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; 0=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

这里通过对obj#加上一个常量0,使得该sql在执行之时不会使用obj$.obj#上的index,从而使用全表扫描,来实现我们修改执行计划,屏蔽该index的目的
修改后的10046观察

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475781593851 hv=493726595 ad='2f2ba76c'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where obj#+0=:1
END OF STMT
PARSE #2:c=1000,e=1095,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781593840
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a86000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2999,e=1603,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781595754
WAIT #2: nam='db file sequential read' ela= 18417 file#=1 block#=121 blocks=1 obj#=-1 tim=1345475781614344
WAIT #2: nam='db file sequential read' ela= 1000 file#=1 block#=123 blocks=1 obj#=-1 tim=1345475781615726
WAIT #2: nam='db file sequential read' ela= 1462 file#=1 block#=124 blocks=1 obj#=-1 tim=1345475781617720
WAIT #2: nam='db file sequential read' ela= 492 file#=1 block#=125 blocks=1 obj#=-1 tim=1345475781618452
WAIT #2: nam='db file sequential read' ela= 1358 file#=1 block#=126 blocks=1 obj#=-1 tim=1345475781620361
WAIT #2: nam='db file sequential read' ela= 500 file#=1 block#=127 blocks=1 obj#=-1 tim=1345475781621039
WAIT #2: nam='db file sequential read' ela= 63132 file#=1 block#=128 blocks=1 obj#=-1 tim=1345475781684316
WAIT #2: nam='db file sequential read' ela= 989 file#=1 block#=5233 blocks=1 obj#=-1 tim=1345475781685559
WAIT #2: nam='db file sequential read' ela= 792 file#=1 block#=5234 blocks=1 obj#=-1 tim=1345475781686583
FETCH #2:c=7999,e=90994,p=9,cr=13,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475781686865
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS FULL OBJ$ (cr=13 pr=9 pw=0 time=90977 us)'

很明显,到这里我们通过UE修改oracle二进制文件,实现了sql语句执行计划的更改.
免责申明:本方法仅供在非常特殊或者测试环境下使用,一般生产环境请勿模仿,否则后果自负

误杀进程导致rac hang住

有客户反馈系统hang住,不能归档,需要我们紧急介入分析
节点1日志
出现redo不能归档,redo日志都已经被写满,人工执行了ALTER SYSTEM ARCHIVE LOG CURRENT,数据库就开始把redo全部归档,但是后面产生的redo又不能归档,当redo全部写满之后,数据库有出现大量log file switch (archiving needed)等待

Tue Sep 24 22:05:37 2013
Thread 1 advanced to log sequence 47282 (LGWR switch)
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:07:31 2013
ORACLE Instance q9db1 - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 47283
All online logs needed archiving
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:28:17 2013
ALTER SYSTEM ARCHIVE LOG
Archived Log entry 259646 added for thread 1 sequence 47266 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:18 2013
Thread 1 advanced to log sequence 47283 (LGWR switch)
  Current log# 7 seq# 47283 mem# 0: +DATA/q9db/onlinelog/group_7.1243.818697415
Archived Log entry 259647 added for thread 1 sequence 47267 ID 0x354620c2 dest 1:
Archived Log entry 259648 added for thread 1 sequence 47268 ID 0x354620c2 dest 1:
Archived Log entry 259649 added for thread 1 sequence 47269 ID 0x354620c2 dest 1:
Archived Log entry 259650 added for thread 1 sequence 47270 ID 0x354620c2 dest 1:
Archived Log entry 259651 added for thread 1 sequence 47271 ID 0x354620c2 dest 1:
Archived Log entry 259652 added for thread 1 sequence 47272 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:28 2013
Archived Log entry 259653 added for thread 1 sequence 47273 ID 0x354620c2 dest 1:
Archived Log entry 259654 added for thread 1 sequence 47274 ID 0x354620c2 dest 1:
Archived Log entry 259655 added for thread 1 sequence 47275 ID 0x354620c2 dest 1:
Archived Log entry 259656 added for thread 1 sequence 47276 ID 0x354620c2 dest 1:
Archived Log entry 259657 added for thread 1 sequence 47277 ID 0x354620c2 dest 1:
Archived Log entry 259658 added for thread 1 sequence 47278 ID 0x354620c2 dest 1:
Archived Log entry 259659 added for thread 1 sequence 47279 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:39 2013
Archived Log entry 259660 added for thread 1 sequence 47280 ID 0x354620c2 dest 1:
Archived Log entry 259661 added for thread 1 sequence 47281 ID 0x354620c2 dest 1:
Archived Log entry 259662 added for thread 1 sequence 47282 ID 0x354620c2 dest 1:
Tue Sep 24 22:29:39 2013
Thread 1 advanced to log sequence 47284 (LGWR switch)
  Current log# 8 seq# 47284 mem# 0: +DATA/q9db/onlinelog/group_8.1242.818697417
Tue Sep 24 22:31:18 2013
Thread 1 advanced to log sequence 47285 (LGWR switch)
  Current log# 16 seq# 47285 mem# 0: +DATA/q9db/onlinelog/group_16.1884.827003545
Thread 1 advanced to log sequence 47286 (LGWR switch)
  Current log# 17 seq# 47286 mem# 0: +DATA/q9db/onlinelog/group_17.1885.827003587

节点2日志
节点2中出现大量的IPC Send timeout

Tue Sep 24 15:22:19 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 18:51:55 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Tue Sep 24 18:57:54 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:03:57 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:09:53 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 20:22:00 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]

节点1因为不能归档hang住,节点2紧接着也就hang住。对节点1hang住之时对两个节点分别做systemstate dump,使用ass进行分析得到节点1和节点2的记录大体如下:
节点1

393:waiting for 'log file switch (archiving needed)'
394:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
395:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
397:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
398:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
451:waiting for 'SQL*Net message from client'
469:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
470:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
471:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
618:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
626:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
NO BLOCKING PROCESSES FOUND

节点2

515:waiting for 'gc buffer busy acquire'
     Cmd: Insert
516:waiting for 'gc buffer busy acquire'
     Cmd: Insert
517:waiting for 'gc buffer busy acquire'
     Cmd: Insert
518:waiting for 'gc buffer busy acquire'
     Cmd: Insert
519:waiting for 'gc buffer busy acquire'
     Cmd: Insert
520:waiting for 'gc buffer busy acquire'
     Cmd: Select
521:waiting for 'gc current request'
     Cmd: Insert
522:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
523:waiting for 'gc buffer busy acquire'
     Cmd: Insert
524:waiting for 'SQL*Net message from client'
525:waiting for 'gc buffer busy acquire'
     Cmd: Insert
526:waiting for 'gc buffer busy acquire'
     Cmd: Insert
527:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
528:waiting for 'SQL*Net message from client'
529:waiting for 'gc buffer busy acquire'
     Cmd: Select
                    Resource Holder State
    Enq TX-0005001E-0022374F   223: waiting for 'gc current request'
    Enq TX-0047001B-002BCEB2   247: waiting for 'gc current request'
    Enq TX-015B001E-000041FF   330: waiting for 'gc current request'
    Enq TX-00010010-002EA7CD   179: waiting for 'gc current request'
    Enq TX-00BA0020-001E3E3C    ??? Blocker
Object Names
~~~~~~~~~~~~
Enq TX-0005001E-0022374F
Enq TX-0047001B-002BCEB2
Enq TX-015B001E-000041FF
Enq TX-00010010-002EA7CD
Enq TX-00BA0020-001E3E3C

通过这里,我们可以明白,节点2的很多事务hang住是因为请求gc current request,而该等待是因为节点1无法归档,有些block无法正常传输到节点2,导致节点2一直hang在这里,然后就出现IPC Send timeout;节点1上的事务阻塞甚至hang住是因为无法归档导致.到此需要定位的问题是为什么节点1不能归档
继续分析节点1 alert日志

Tue Sep 24 15:18:20 2013
opidrv aborting process O000 ospid (7332) as a result of ORA-28
Immediate Kill Session#: 1904, Serial#: 1065
Immediate Kill Session: sess: 0x24a2522a38  OS pid: 7338
Immediate Kill Session#: 3597, Serial#: 11107
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 7320
Tue Sep 24 15:18:23 2013
opidrv aborting process W000 ospid (7980) as a result of ORA-28
Tue Sep 24 15:18:23 2013
opidrv aborting process W001 ospid (8560) as a result of ORA-28
Tue Sep 24 15:18:35 2013
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES
Tue Sep 24 15:18:35 2013
ARC0 started with pid=66, OS id=10793
Tue Sep 24 15:18:35 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db1/trace/q9db1_nsa2_12635.trc:
ORA-00028: your session has been killed
LNS: Failed to archive log 8 thread 1 sequence 47156 (28)
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 47157 (LGWR switch)
ARC0: STARTING ARCH PROCESSES
  Current log# 9 seq# 47157 mem# 0: +DATA/q9db/onlinelog/group_9.1241.818697421
Tue Sep 24 15:18:36 2013
ARC1 started with pid=81, OS id=10805
Tue Sep 24 15:18:36 2013
ARC2 started with pid=84, OS id=10807
Tue Sep 24 15:18:36 2013
ARC3 started with pid=87, OS id=10809
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Error 1031 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'q9adgdg'. Error is 1031.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 259135 added for thread 1 sequence 47156 ID 0x354620c2 dest 1:
Error 1031 received logging on to the standby
FAL[server, ARC3]: Error 1031 creating remote archivelog file 'q9adgdg'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance q9db1 - Archival Error. Archiver continuing.
Tue Sep 24 15:18:46 2013
opidrv aborting process O001 ospid (9605) as a result of ORA-28
Tue Sep 24 15:18:46 2013
opidrv aborting process O000 ospid (10813) as a result of ORA-28
Tue Sep 24 15:18:46 2013
Immediate Kill Session#: 2909, Serial#: 369
Immediate Kill Session: sess: 0x24226c7200  OS pid: 9091
Immediate Kill Session#: 3380, Serial#: 30271
Immediate Kill Session: sess: 0x2422782c58  OS pid: 10265
Immediate Kill Session#: 3597, Serial#: 11109
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 10267
Tue Sep 24 15:20:14 2013
Restarting dead background process DIAG
Tue Sep 24 15:20:14 2013
DIAG started with pid=64, OS id=20568
Restarting dead background process PING
Tue Sep 24 15:20:14 2013
PING started with pid=68, OS id=20570
Restarting dead background process LMHB
Tue Sep 24 15:20:14 2013
LMHB started with pid=70, OS id=20572
Restarting dead background process SMCO
…………
Tue Sep 24 15:23:13 2013
ARC0: Detected ARCH process failure
Tue Sep 24 15:23:13 2013
Thread 1 advanced to log sequence 47158 (LGWR switch)
  Current log# 10 seq# 47158 mem# 0: +DATA/q9db/onlinelog/group_10.1240.818697423
ARC0: STARTING ARCH PROCESSES
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance q9db1 - Archival Error
ORA-00028: your session has been killed

查看ARCn进程

[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   20718 12870  0 22:07 pts/14   00:00:00 grep ora_ar
[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   25998 12870  0 22:07 pts/14   00:00:00 grep ora_ar

这里基本上明白了,因为客户的系统从15:15开始由于中间件程序异常,导致大量会话连接数据库,然后dba为了防止其他业务不受影响,然后开始大量通过alter system kill session,误杀了不少系统进程,包括ARCn(0,1,2,3)进程,在后面ARCn进程因为某种原因无法正常启动,导致redo无法归档,所有的redo组写满系统即hang住,该系统由于大量kill session已经导致了实例本身异常(正常情况ARCn进程kill之后会自动重启),处理方案:先增加redo组配合定时人工归档,等待业务低峰重启节点1,解决问题。温馨提醒:kill进程请小心

ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels

随着去ioe的潮流越来越汹涌,随着内存越来越便宜,使用Liunx大内存跑核心业务的系统越来越多。因为大内存,多会话,使得配置HugePages也成了必然的选择,在以前的文章中介绍过怎么去配置HugePages(Liunx系统中Oracle使用HugePages配置)以及因为没有配置Hugepage导致的故障(因未配置Hugepage会话数添增悲剧案例).但是随着Linux 6(RedHat 6, OEL 6, SLES 11 and UEK2 kernels等)的广泛应用,他们所在HugePages方面所具有的新特性(Transparent HugePages)很多人可能还没有注意到,Oracle强烈建议无论是rac还是单实例都关闭该特性.
检查是否启用Transparent HugePages方法

To check if the Transparent HugePages are enabled in your server execute the following:
Default/Enabled setting is  [always]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never
 Disabled setting is [never]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]
If "enabled" is NOT set to "[never]", the Transparent HugePages are being used.
You can also issue:
# grep AnonHugePages /proc/meminfo
If the output contains a line like "AnonHugepages: xxxx kB", with a value > 0kB the kernel is using Transparent HugePages.

禁用Transparent HugePages方法

Add the following to the kernel boot line in /etc/grub.conf (this is the preferred method) and reboot the server:
transparent_hugepage=never
Once modified the line will read similar to the following example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
OR
Add the following lines in /etc/rc.local and reboot the server:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

具体见文档:ALERT Disable Transparent HugePages on SLES11 RHEL6 OEL6 and UEK2 Kernels(Doc ID 1557478.1)

oerr ora code本质探讨

oerr ora code 命令

[oracle@q9-gg02 ~]$ oerr ora 10047
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:

unix/linux系统下$ORACLE_HOME/rdbms/mesg/oraus.msg包含ora-记录

[oracle@q9-gg02 ~]$ vi $ORACLE_HOME/rdbms/mesg/oraus.msg
…………
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
10048, 00000, "Undo segment shrink"
// *Cause:
// *Action:
10049, 00000, "protect library cache memory heaps"
// *Cause:
// *Action: Use the OS memory protection (if available) to protect library
//          cache memory heaps that are pinned.

oerr ora code本质

[oracle@q9-gg02 ~]$ strace oerr ora 10047
execve("/u01/oracle/app/product/10.2/db_1/bin/oerr", ["oerr", "ora", "10047"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1c239000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3fe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3ff000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib", {st_mode=S_IFDIR|0750, st_size=12288, ...}) = 0
open("tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libtermcap.so.2", O_RDONLY)   = -1 ENOENT (No such file or directory)
open("x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("libtermcap.so.2", O_RDONLY)       = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=154281, ...}) = 0
mmap(NULL, 154281, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db400000
close(3)                                = 0
open("/lib64/libtermcap.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\20 \214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15840, ...}) = 0
mmap(0x3e8c200000, 2108944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c200000
mprotect(0x3e8c203000, 2093056, PROT_NONE) = 0
mmap(0x3e8c402000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c402000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libdl.so.2", O_RDONLY)        = -1 ENOENT (No such file or directory)
open("x86_64/libdl.so.2", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("libdl.so.2", O_RDONLY)            = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0
mmap(0x3e8c600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c600000
mprotect(0x3e8c602000, 2097152, PROT_NONE) = 0
mmap(0x3e8c802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c802000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libc.so.6", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("tls/libc.so.6", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("x86_64/libc.so.6", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("libc.so.6", O_RDONLY)             = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\213>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722328, ...}) = 0
mmap(0x3e8be00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8be00000
mprotect(0x3e8bf4f000, 2093056, PROT_NONE) = 0
mmap(0x3e8c14e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x3e8c14e000
mmap(0x3e8c153000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e8c153000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db426000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db427000
arch_prctl(ARCH_SET_FS, 0x2b60db426fe0) = 0
mprotect(0x3e8c802000, 4096, PROT_READ) = 0
mprotect(0x3e8c14e000, 16384, PROT_READ) = 0
mprotect(0x3e8bc1c000, 4096, PROT_READ) = 0
munmap(0x2b60db400000, 154281)          = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK)     = 3
close(3)                                = 0
brk(0)                                  = 0x1c239000
brk(0x1c25a000)                         = 0x1c25a000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=56458208, ...}) = 0
mmap(NULL, 56458208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db428000
close(3)                                = 0
getuid()                                = 1000
getgid()                                = 1000
geteuid()                               = 1000
getegid()                               = 1000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/proc/meminfo", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60dea00000
read(3, "MemTotal:     132051308 kB\nMemFr"..., 1024) = 781
close(3)                                = 0
munmap(0x2b60dea00000, 4096)            = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
uname({sys="Linux", node="q9-gg02.800best.com", ...}) = 0
stat("/home/oracle", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
getpid()                                = 25928
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25464, ...}) = 0
mmap(NULL, 25464, PROT_READ, MAP_SHARED, 3, 0) = 0x2b60dea00000
close(3)                                = 0
getppid()                               = 25927
getpgrp()                               = 25927
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/u01/oracle/app/product/10.2/db_1/bin/oerr", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7a3cc060) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 80) = 80
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(255, {st_mode=S_IFREG|0755, st_size=2365, ...}) = 0
lseek(255, 0, SEEK_CUR)                 = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 2365) = 2365
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -973, SEEK_CUR)              = 1392
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25929
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "ora:rdbms:*:\n", 128)          = 13
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25929
wait4(-1, 0x7fff7a3cb804, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ $? -ne 0 ]\nthen\n\techo \"oerr"..., 2365) = 973
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -645, SEEK_CUR)              = 1720
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25931
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "Facility=ora\nComponent=rdbms\n", 128) = 29
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25931
wait4(-1, 0x7fff7a3cb904, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ -z \"$Facility\" -o -z \"$Comp"..., 2365) = 645
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
--检查oraus.msg文件是否存在并且有权限访问
stat("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", {st_mode=S_IFREG|0644, st_size=3789794, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
--访问oraus.msg文件
access("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", R_OK) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
--查找awk命令位置
stat("/u01/oracle/app/product/10.2/db_1/jre/1.4.2/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/qt-3.3/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/kerberos/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
--找到awk命令
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
lseek(255, -8, SEEK_CUR)                = 2357
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
--通过awk命令获取oraus.msg中相关命令
wait4(-1, 10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff7a3cbb54, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "\nexit 0\n", 2365)           = 8
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0)                           = ?

通过这里可以知道oerr其实就是通过awk对$ORACLE_HOME/rdbms/mesg/oraus.msg文件进行筛选,显示出来对应ORA-错误号的提示