2014年1月 – Database SOS

关于dul有效期描述

dul是oracle内部工具,用于在数据库不open启动下挖取数据文件内容,从而最小程度减少数据损失.主要可以恢复如下情况:
1.用于异常断电,强制关闭数据库等故障导致数据库使用隐含参数,bbed等各种非常规数据库恢复方法无法正常open的数据库恢复
2.用于恢复无truncate,drop表恢复
3.用于丢失system表空间文件数据库恢复
4.用于大量坏块数据库恢复
5.用于丢失部分非system数据文件恢复
关于dul的相关使用文档请参考:ORACLE DUL汇总,因为dul是oracle内部工具,不需要数据库open就可以获得数据,为了数据安全,dul软件增加了有效期,本文就是对于dul的有效期进行了描述
关于dul的有效期
熟悉dul的人都清楚,dul从10开始就有有效期之说,一般的有效期是1到2个月,极少数情况会到3个月的有效期.通过试验验证

--使用上一期dul,在未修改系统时间的情况下提示过期
e:\dul10>dul
Data UnLoader: 10.2.0.5.25 - Internal Only - on Sat Jan 25 16:17:41 2014
with 64-bit io functions
Copyright (c) 1994 2013 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Cannot start now
You need a more recent DUL version for this os
--修改系统时间,提示加载成功
e:\dul10>dul
Data UnLoader: 10.2.0.5.25 - Internal Only - on Sat Dec 21 16:21:16 2013
with 64-bit io functions
Copyright (c) 1994 2013 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
--配置dul挖取数据文件,dul又提示和以前一样错误
e:\dul10>dul
Data UnLoader: 10.2.0.5.25 - Internal Only - on Sat Dec 21 16:19:49 2013
with 64-bit io functions
Copyright (c) 1994 2013 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Cannot open file now
You need a more recent DUL version for this os
--使用最新版dul,在不修改系统时间和配置挖取文件的情况下均正常使用
e:\dul10>dul
Data UnLoader: 10.2.0.5.26 - Internal Only - on Sat Jan 25 16:22:15 2014
with 64-bit io functions
Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
e:\dul10>dul
Data UnLoader: 10.2.0.5.26 - Internal Only - on Sat Jan 25 16:22:39 2014
with 64-bit io functions
Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Found db_id = 4156511432
Found db_name = ORA10G

这里可以知道dul是通过系统时间和数据文件头时间两重保证该软件安全,不能通过修改系统时间来破解

使用DUL挖数据文件恢复非数据外对象方法

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：使用DUL挖数据文件恢复非数据外对象方法

在dul进行数据库挖掘恢复的时候，我们可以通过unload table/user等方式来恢复表数据，但是对于一些view,index,trigger,source,seq,Dblink等不能直接通过unload来实现,但是可以通过挖基表来实现相关操作,这里提供了一些处理思路,在实际操作中根据需求,分析数据字典灵活应用
一．view
导出对象

USER$
OBJ$
COL$
VIEW$

执行sql语句

Set pages 10000
Set long 1000
Spool d:\create_view.sql
select
  'CREATE OR REPLACE VIEW '||O.NAME||' ('||
   replace(c.cols,',',','||chr(10))||')'||CHR(10)||
  'as'||chr(10), v.text
from
user$ u, obj$ o, view$ v,
( SELECT COL.OBJ#, COL.COLS
  FROM
  (SELECT
    OBJ#, COL#, substr(SYS_CONNECT_BY_PATH(NAME,','),2) COLS
  FROM COL$
  WHERE COL# > 0
  START WITH COL# = 1
  CONNECT BY PRIOR OBJ# = OBJ# AND PRIOR COL# = COL# - 1 ) COL,
  (SELECT OBJ#, COUNT(*) COLCNT FROM COL$
  WHERE COL# > 0 GROUP BY OBJ#) CN
  WHERE COL.OBJ# = CN.OBJ# AND COL.COL# = CN.COLCNT
) C
where u.user#=o.owner# and o.obj# = c.obj#
  and v.obj# = o.obj# and u.name=upper('&username');

说明
1）分布执行，不能放置一个脚本文件中执行
2）每条as后面的select语句可能需要重新格式化
3） Create view 语句最后需要增加”;”

二．source
导出对象

USER$
SOURCE$
OBJ$

执行sql语句

Set pages 10000
Set long 1000
set linse 1000
Spool d:\create_source.sql
SELECT DECODE(S.LINE,1,'CREATE OR REPLACE ','')||SOURCE SOURCE
FROM
  USER$ U, OBJ$  O, SOURCE$ S
WHERE
  U.USER# = O.OWNER# AND
  O.OBJ# = S.OBJ# AND
  U.NAME = UPPER('&username')
  -- AND O.NAME = UPPER('&SOURCE_NAME')
ORDER BY S.OBJ#, S.LINE;

说明
1）注意SOURCE中的用户名,如果导入不是相同用户,需要修改该脚本用户名
2）修改完用户名后,直接执行生成脚本即可

三．Index
导出对象

USER$
OBJ$
COL$
IND$
ICOL$

执行sql

Set pages 10000
Set long 1000
set linse 1000
Spool d:\create_index.sql
SELECT
  'CREATE '||decode(bitand(IDX.property, 1), 1, 'UNIQUE', '')||
  ' INDEX '||I.NAME||' ON '||T.NAME||'('||IDX.PATH||');' INDEX_DDL
FROM
  USER$ U, OBJ$  T, OBJ$ I,
  (
    select I.PROPERTY, I.BO#, I.OBJ#, C.POS#,
            SUBSTR(sys_connect_by_path(CN.NAME,','),2) path
    from IND$ I, ICOL$ C, COL$ CN
    WHERE I.OBJ# = C.OBJ# AND I.BO# = C.BO#
      AND I.BO# = CN.OBJ# AND C.COL# = CN.INTCOL#
    start with C.POS#=1
    connect by PRIOR I.OBJ# = I.OBJ#
            AND prior C.POS# = C.POS# - 1 ) IDX,
  (SELECT I.BO#, I.OBJ#, COUNT(*) COLCNT
    FROM ICOL$ I GROUP BY I.BO#, I.OBJ#) IDXC
WHERE
  U.USER# = T.OWNER# AND
  IDX.BO# = T.OBJ# AND
  IDX.OBJ# = I.OBJ# AND
  IDX.BO# =  IDXC.BO# AND
  IDX.OBJ# = IDXC.OBJ# AND
  IDX.POS# = IDXC.COLCNT AND
  U.NAME = upper('&username')
ORDER BY T.NAME, I.NAME;

说明
1）因为SYS_CONNECT_BY_PATH所以需要10g及其以上版本
2） SQL中没有分区唯一性索引
3）注意检查sql是否因为行长度不够导致异常

四．Sequence
导出对象

USER$
OBJ$
SEQ$

执行sql语句

Set pages 10000
Set long 1000
set linse 1000
Spool d:\create_sequence.sql
SELECT
  'CREATE SEQUENCE '|| SEQ_NAME ||
  ' MINVALUE '||minval ||
  ' MAXVALUE '||MAXVAL ||
  ' START WITH '||LASTVAL ||
  ' ' || CYC || ' ' || ORD ||
  DECODE(SIGN(CACHE), 1,' CACHE '|| CACHE, 'NOCACHE') ||
  ';' SEQ_DDL
from
  (select u.name OWNER, o.name SEQ_NAME,
      s.minvalue MINVAL, s.maxvalue MAXVAL,
      s.increment$ INC,
      decode (s.cycle#, 0, 'NOCYCLE', 1, 'CYCLE ') CYC,
      decode (s.order$, 0, 'NOORDER', 1, 'ORDER') ORD,
      s.cache, s.highwater LASTVAL
  from seq$ s, obj$ o, user$ u
  where u.user# = o.owner#
    and o.obj# = s.obj#
and u.name=upper('&username'));

五．TRIGGER
导出对象

OBJ$
USER$
TRIGGER$

执行sql语句

Set pages 10000
Set long 1000
set linse 1000
Spool d:\create_trigger.sql
select
   'CREATE OR REPLACE TRIGGER '|| trigger_name || chr(10)||
   decode( substr( trigger_type, 1, 1 ),
           'A', 'AFTER ', 'B', 'BEFORE ', 'I',
           'INSTEAD OF ' ) ||
   triggering_event || ' ON ' || table_owner || '.' ||
   table_name || chr(10) || REF_CLAUSE || chr(10) ||
   decode( instr( trigger_type, 'EACH ROW' ), 0, null,
           'FOR EACH ROW' ), trigger_body
from  (
   select trigusr.name owner, trigobj.name trigger_name,
      decode(t.type#, 0, 'BEFORE STATEMENT',
           1, 'BEFORE EACH ROW',   2, 'AFTER STATEMENT',
           3, 'AFTER EACH ROW',    4, 'INSTEAD OF',
           'UNDEFINED') trigger_type,
   decode(t.insert$*100 + t.update$*10 + t.delete$,
           100, 'INSERT', 010, 'UPDATE', 001, 'DELETE',
           110, 'INSERT OR UPDATE', 101, 'INSERT OR DELETE',
           011, 'UPDATE OR DELETE',
           111, 'INSERT OR UPDATE OR DELETE',
           'ERROR') triggering_event,
   tabusr.name table_owner, tabobj.name table_name,
   'REFERENCING NEW AS '||t.refnewname||' OLD AS '||t.refoldname REF_CLAUSE,
   t.whenclause,decode(t.enabled, 0, 'DISABLED', 1, 'ENABLED', 'ERROR') STATUS,
   t.definition , t.action# trigger_body
   from obj$ trigobj, obj$ tabobj, trigger$ t,
        user$ tabusr, user$ trigusr
   where (trigobj.obj#   = t.obj# and
       tabobj.obj#    = t.baseobject and
       tabobj.owner#  = tabusr.user# and
       trigobj.owner# = trigusr.user# and
       bitand(t.property, 63)     < 8 ))
where table_owner=upper('&username')
order by owner, trigger_name;

六. Dblink
导出对象

Sys.link$
sys.user$

执行查询sql

SELECT 'CREATE '||DECODE(U.NAME,'PUBLIC','public ')||'DATABASE LINK '||CHR(10)
||DECODE(U.NAME,'PUBLIC',Null, 'SYS','',U.NAME||'.')|| L.NAME||chr(10)
||'CONNECT TO ' || L.USERID || ' IDENTIFIED BY "'||L.PASSWORD||'" USING
'''||L.HOST||''''
||chr(10)||';' TEXT
FROM SYS.LINK$ L, SYS.USER$ U
WHERE L.OWNER# = U.USER#;

11.2.0.3 adg库因 bug 16427872 导致smon占用大量cpu

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：11.2.0.3 adg库因 bug 16427872 导致smon占用大量cpu

检查数据库发现客户有一套核心的ADG库smon进程负载异常,单进程一直持有cpu 100%

[oracle@q9adg01 trace]$ top -c
top - 14:00:14 up 83 days, 21:39,  4 users,  load average: 10.34, 11.55, 11.25
Tasks: 1162 total,   3 running, 1157 sleeping,   0 stopped,   2 zombie
Cpu(s):  1.7%us,  1.2%sy,  0.0%ni, 86.2%id, 10.7%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  264253752k total, 200445076k used, 63808676k free,   757684k buffers
Swap: 33554424k total,        0k used, 33554424k free,  6529220k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5707 oracle    25   0  150g  20m  16m R 99.9  0.0  14273:00 ora_smon_q9db1
 5285 oracle    16   0 13564 1952  820 R 31.5  0.0   0:02.49 top -c
 5713 oracle    18   0  150g  20m  17m S  5.3  0.0 410:01.33 ora_asmb_q9db1
 5821 oracle    15   0  150g  23m  17m S  5.3  0.0   4883:29 ora_lck0_q9db1
 7596 oracle    15   0  150g  69m  37m S  5.3  0.0   5368:28 ora_pr00_q9db1
[oracle@q9adg02 ~]$ top -c
top - 14:00:03 up 84 days, 19:36,  3 users,  load average: 6.46, 6.96, 6.76
Tasks: 1045 total,   5 running, 1040 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8%us,  1.0%sy,  0.0%ni, 93.4%id,  3.7%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  264253752k total, 196879216k used, 67374536k free,   425320k buffers
Swap: 33554424k total,        0k used, 33554424k free,  4727836k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11615 oracle    25   0  150g  16m  14m R 100.0  0.0  14272:55 ora_smon_q9db2
18173 oracle    16   0  150g  73m  37m D 18.6  0.0  24:33.91 oracleq9db2 (LOCAL=NO)
 6561 oracle    15   0  150g  31m  25m R 12.2  0.0   0:48.50 oracleq9db2 (LOCAL=NO)

数据库版本和patch信息

14:18:05 sys@Q9DB>select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL>  SELECT INST_ID,DATABASE_ROLE,OPEN_MODE FROM GV$DATABASE;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         2 PHYSICAL STANDBY READ ONLY WITH APPLY
         1 PHYSICAL STANDBY READ ONLY WITH APPLY
SQL >select inst_id,STARTUP_TIME from gv$instance;
   INST_ID STARTUP_T
---------- ---------
         2 01-NOV-13
         1 01-NOV-13
[oracle@q9adg01 trace]$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch  lspatches
16056266;Database Patch Set Update : 11.2.0.3.6 (16056266)
16315641;Grid Infrastructure Patch Set Update : 11.2.0.3.6 (16083653)

SYSAUX表空间增加数据文件

SQL> select ts# from v$tablespace where name='SYSAUX';
       TS#
----------
         2
SQL> select file#,name,creation_time from v$datafile where ts#=2;
     FILE# NAME                                               CREATION_
---------- -------------------------------------------------- ---------
         3 +DATA/q9db/datafile/sysaux.1412.818566605          12-MAR-08
       151 +DATA/q9db/datafile/sysaux.1431.818566885          26-MAR-12
       221 +DATA/q9db/datafile/sysaux.828.818547945           16-APR-12
      1744 +DATA/q9db_adg/datafile/sysaux.2050.835459505      29-DEC-13

核对数据库确实在2013年12月29日对SYSAUX表空间增加了数据文件而且未重启数据库,触发Bug 16427872 Standby SMON spins on CPU after add/drop SYSAUX datafile on primary
bug-16427872
Bug 16427872 Standby SMON spins on CPU after add/drop SYSAUX datafile on primary
在12.1.0.1中修复,在未修复前增加/删除sysaux的数据文件后,通过重启实例来解决该问题

multipath实现设备用户组设置

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：multipath实现设备用户组设置

现在的Linux系统中,很多都会使用系统自带的multipath多路径软件,在以前的版本中,我们一般通过multipath+udev或者multipath+rc.local来实现多路径和权限设置,而在redhat 5.3-5.11的版本中multipath就直接可以实现多路径聚合、设备持久化、用户组设置
操作系统版本

[root@rac1 dev]# uname -r
2.6.39-300.26.1.el5uek
[root@rac1 dev]# more /etc/issue
Oracle Linux Server release 5.9
Kernel \r on an \m

fdisk记录

[root@rac1 dev]# fdisk -l
…………
Disk /dev/sdh: 134.2 GB, 134217728000 bytes
255 heads, 63 sectors/track, 16317 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
Disk /dev/sdi: 33.5 GB, 33554432000 bytes
64 heads, 32 sectors/track, 32000 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
   Device Boot      Start         End      Blocks   Id  System

multipath包
检查安装multipath相关包(该版本系统默认安装)

[root@rac1 dev]# rpm -aq|grep mapper
device-mapper-multipath-libs-0.4.9-56.0.3.el5
device-mapper-event-1.02.67-2.el5
device-mapper-1.02.67-2.el5
device-mapper-multipath-0.4.9-56.0.3.el5

获取wwid值

[root@rac1 dev]# /sbin/scsi_id -g -u -s /block/sdh
14f504e46494c45527049754962662d395751372d68356743
[root@rac1 dev]# /sbin/scsi_id -g -u -s /block/sdi
14f504e46494c4552484d486249782d464471382d354f4b58

获取uid和gid

[root@rac1 dev]# id grid
uid=1100(grid) gid=54321(oinstall) groups=54321(oinstall),1020(asmadmin),1021(asmdba)

multipath.conf配置

[root@rac1 dev]# vi /etc/multipath.conf
defaults {
user_friendly_names no
queue_without_daemon no
flush_on_last_del yes
max_fds max
}
blacklist {
devnode "^hd[a-z]"
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^cciss.*"
}
devices {
        device {
                vendor                  "OPNFILER "
                product                 "LUN"
                path_grouping_policy    group_by_prio
                features             "3 queue_if_no_path pg_init_retries 50"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                path_checker            tur
                path_selector           "round-robin 0"
                hardware_handler        "1 alua"
                failback               immediate
                rr_weight              uniform
                rr_min_io             128
        }
}
multipaths {
        multipath {
                wwid                    14f504e46494c45527049754962662d395751372d68356743  #wwid
                alias                   xifenfei128
                uid                     1100                                               #uid
                gid                     1020                                               #gid
        }
        multipath {
                wwid                    14f504e46494c4552484d486249782d464471382d354f4b58   #wwid
                alias                   xifenfei32
                uid                     1100                                                #uid
                gid                     1020                                                #gid
         }
}

启动multipath

[root@rac1 dev]# modprobe dm-multipath
[root@rac1 dev]# modprobe dm-round-robin
[root@rac1 dev]# chkconfig multipathd on
[root@rac1 dev]# service multipathd start
Starting multipathd daemon: [  OK  ]
[root@rac1 dev]# multipath -F
[root@rac1 dev]# multipath -v2
create: xifenfei128 (14f504e46494c45527049754962662d395751372d68356743) undef OPNFILER,VIRTUAL-DISK
size=125G features='0' hwhandler='0' wp=undef
`-+- policy='round-robin 0' prio=1 status=undef
  `- 3:0:0:9  sdh 8:112 undef ready  running
create: xifenfei32 (14f504e46494c4552484d486249782d464471382d354f4b58) undef OPNFILER,VIRTUAL-DISK
size=31G features='0' hwhandler='0' wp=undef
`-+- policy='round-robin 0' prio=1 status=undef
  `- 3:0:0:10 sdi 8:128 undef ready  running

查看生成多路径设备
注意设备名称、组、用户

[root@rac1 dev]# ls -l /dev/mapper/xifenfei*
brw-rw---- 1 grid asmadmin 252, 2 Jan  7 21:21 /dev/mapper/xifenfei128
brw-rw---- 1 grid asmadmin 252, 3 Jan  7 21:21 /dev/mapper/xifenfei32

补充Linux 6.x中udev设置所属组和权限
对于linux 6.x,multipath不能设置磁盘所属组和权限,可以通过udev进行实现,类似配置如下

[root@bxrac03 mapper]#cat 99-diskownership.rules
SUBSYSTEM!="block", GOTO="quickexit"
KERNEL!="dm-*", GOTO="quickexit"
PROGRAM=="/sbin/dmsetup info -c --noheadings -o name -m %m -j %M"
RESULT=="*ocr*", OWNER="grid", GROUP="oinstall", MODE="0660"
RESULT=="*oradata", OWNER="grid", GROUP="oinstall", MODE="0660"
RESULT=="*backup", OWNER="grid", GROUP="oinstall", MODE="0660"
LABEL="quickexit"

其中RESULT和dm的别名向匹配

数据库恢复检查脚本(Oracle Database Recovery Check)

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：数据库恢复检查脚本(Oracle Database Recovery Check)

Oracle Database Recovery Check 介绍
根据多年来的数据库恢复经验,提炼出来数据库恢复关键点信息收集脚本(Oracle Database Recovery Check)，该脚本主要是在数据库mount状态情况下查询数据库的一些基础表信息等信息，不对数据库进行任何写操作(只做读和dump操作)，不会在坏的数据库基础之上带来任何破坏，不影响任何数据库后续的恢复工作。通过该脚本收集信息能够快速定位数据库异常原因，并初步判断数据库恢复疑难程度，减少数据库异常恢复诊断时间，提供恢复效率和准确性。

Oracle Database Recovery Check下载
Oracle Database Recovery Check FOR ORACE 11G及其以前版本下载
Oracle Database Recovery Check For Linux/Uinx
Oracle Database Recovery Check For Windows
Oracle Database Recovery Check For ALL
Oracle Database Recovery Check FOR ORACE 12C下载
Oracle Database Recovery Check For Linux/Uinx
Oracle Database Recovery Check For Windows
Oracle Database Recovery Check For ALL

Oracle Database Recovery Check使用说明
1. 根据系统平台下载对应的版本,相应平台,使用unzip/winrar软件解压成sql文件
2. 上传到服务器之上(oracle用户需要有读取权限)
3. 启动数据库到mount状态
4. 在sqlplus中使用@path/check_recover_db.sql文件
5. 发送xifenfei_db_recover_YYYYMMDD.html到dba@xifenfei.com
6. 联系我:手机(17813235971)或者qq(107644445)

操作演示(所有平台,版本操作相同)
unix/liunx平台使用方法(for 12c以前版本,12c版本操作步骤完全相同)

[oracle@xifenfei ~]$ ls -l /home/oracle/check_recover_db_linux.sql
-rw-r--r-- 1 oracle oinstall 13008 Mar 30 10:36 check_recover_db_linux.sql
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sun Mar 30 10:55:58 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             331353508 bytes
Database Buffers           79691776 bytes
Redo Buffers                6053888 bytes
Database mounted.
SQL> @/home/oracle/check_recover_db_linux.sql
+----------------------------------------------------------------------------+
|                   Oracle Database Recovery Check Result                    |
|----------------------------------------------------------------------------+
|  Copyright (c) 2012-2014 xifenfei. All rights reserved. (www.xifenfei.com) |
+----------------------------------------------------------------------------+
Please start the database to mount state.
Note: Do not modify any inspection results
Please refer to the use of the script:http://www.xifenfei.com/5056.html
To send xifenfei_db_recover_YYYYMMDD.html to dba@xifenfei.com or QQ(107644445)
-----Oracle Database Recovery Check STRAT-----
----Starting Collect Data Dictionary Information----
----Starting Collect PATCH Information----
----Starting Collect ALERT LOG Information----
----Starting DUMP file_hdrs Information----
----Starting DUMP controlf Information----
----Starting DUMP redohdr Information----
-----Oracle Database Recovery Check END-----
********************************************************************************
Please check and upload /home/oracle/xifenfei_db_recover_20140330.html
********************************************************************************
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
生成的html文件为:/home/oracle/xifenfei_db_recover_20140330.html

win平台使用方法(for 12c版本,12c以前版本操作步骤完全相同)

C:\Users\XIFENFEI>dir E:\SkyDrive\ORACLE\tools\db_recover\check_recover_db_win_12c.sql
 驱动器 E 中的卷是 本地磁盘
 卷的序列号是 000C-3B41
 E:\SkyDrive\ORACLE\tools\db_recover 的目录
2014-03-29  23:43            11,101 check_recover_db_win_12c.sql
               1 个文件         11,101 字节
               0 个目录 19,557,310,464 可用字节
C:\Users\XIFENFEI>sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on 星期日 3月 30 11:01:37 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
已连接到空闲例程。
SQL> startup mount
ORACLE 例程已经启动。
Total System Global Area  400846848 bytes
Fixed Size                  2440024 bytes
Variable Size             289408168 bytes
Database Buffers          100663296 bytes
Redo Buffers                8335360 bytes
SQL> @E:\SkyDrive\ORACLE\tools\db_recover\check_recover_db_win_12c.sql
+----------------------------------------------------------------------------+
|                   Oracle Database Recovery Check Result                    |
|----------------------------------------------------------------------------+
|  Copyright (c) 2012-2014 xifenfei. All rights reserved. (www.xifenfei.com) |
+----------------------------------------------------------------------------+
Please start the database to mount state.
Note: Do not modify any inspection results
Please refer to the use of the script:http://www.xifenfei.com/5056.html
To send xifenfei_db_recover_YYYYMMDD.html to dba@xifenfei.com or QQ(107644445)
 驱动器 C 中的卷没有标签。
 卷的序列号是 D053-8FE1
 C:\Users\XIFENFEI 的目录
2014-03-30  11:02            32,796 xifenfei_db_recover_20140330.html
               1 个文件         32,796 字节
               0 个目录  8,444,190,720 可用字节
********************************************************************************
Please check and upload "xifenfei_db_recover_YYYYMMDD.html" in current directory
********************************************************************************
从 Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options 断开
生成的html文件为C:\Users\XIFENFEI\xifenfei_db_recover_20140330.html

因asm sga_target设置不当导致11gr2 rac无法正常启动

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：因asm sga_target设置不当导致11gr2 rac无法正常启动

2014年第一个故障排查和解决：同事反馈给我说solaris 11.2 两节点rac无法启动，让我帮忙看下。通过分析是因为sga_target参数设置不合理导致asm无法正常启动
GI无法正常启动

grid@zwq-rpt1:~$crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
grid@zwq-rpt1:~$crsctl status resource -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       zwq-rpt1
ora.crf
      1        ONLINE  ONLINE       zwq-rpt1
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       zwq-rpt1
ora.cssdmonitor
      1        ONLINE  ONLINE       zwq-rpt1
ora.ctssd
      1        ONLINE  ONLINE       zwq-rpt1                 ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE zwq-rpt1
ora.gipcd
      1        ONLINE  ONLINE       zwq-rpt1
ora.gpnpd
      1        ONLINE  ONLINE       zwq-rpt1
ora.mdnsd
      1        ONLINE  ONLINE       zwq-rpt1

asm未正常启动

GI日志报错

2014-01-01 00:40:47.708
[cssd(1418)]CRS-1605:CSSD voting file is online: /dev/rdsk/emcpower0a; details in /export/home/app/grid/log/zwq-rpt1/cssd/ocssd.log.
2014-01-01 00:40:53.234
[cssd(1418)]CRS-1601:CSSD Reconfiguration complete. Active nodes are zwq-rpt1 zwq-rpt2 .
2014-01-01 00:40:56.659
[ctssd(1483)]CRS-2407:The new Cluster Time Synchronization Service reference node is host zwq-rpt2.
2014-01-01 00:40:56.661
[ctssd(1483)]CRS-2401:The Cluster Time Synchronization Service started on host zwq-rpt1.
2014-01-01 00:41:02.016
[ctssd(1483)]CRS-2408:The clock on host zwq-rpt1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2014-01-01 00:43:23.874
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:45:42.837
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:02.087
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 00:48:18.836
[ohasd(1083)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2014-01-01 00:48:18.837
[ohasd(1083)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2014-01-01 01:05:15.396
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:05:45.101
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-01-01 01:06:15.104
[/export/home/app/grid/bin/oraagent.bin(1348)]CRS-5019:All OCR locations are on ASM disk groups [CRSDG], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/export/home/app/grid/log/zwq-rpt1/agent/ohasd/oraagent_grid/oraagent_grid.log".

这里较为明显的看到，因为asm磁盘组异常导致ocr无法被访问导致crs无法正常启动

ORAAGENT日志

2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection::connectInt (2) Exception OCIException
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] InstConnection:connect:excp OCIException OCI error 604
2014-01-01 00:43:23.870: [ora.asm][9] {0:0:2} [start] DgpAgent::queryDgStatus excp ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")

报了较为清晰的ORA-4031错误,检查asm日志

ASM日志报错

Wed Jan 01 00:47:33 2014
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /export/home/app/oracle
Wed Jan 01 00:47:39 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1728.trc  (incident=291447):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291447/+ASM1_ora_1728_i291447.trc
Wed Jan 01 00:47:48 2014
Dumping diagnostic data in directory=[cdmp_20140101004748], requested by (instance=1, osid=1728), summary=[incident=291447].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:47:53 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1730.trc  (incident=291448):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291448/+ASM1_ora_1730_i291448.trc
Wed Jan 01 00:48:01 2014
Dumping diagnostic data in directory=[cdmp_20140101004801], requested by (instance=1, osid=1730), summary=[incident=291448].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:07 2014
Errors in file /export/home/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1732.trc  (incident=291449):
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^34f764db","kglHeapInitialize:temp")
Incident details in: /export/home/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_291449/+ASM1_ora_1732_i291449.trc
Wed Jan 01 00:48:16 2014
Dumping diagnostic data in directory=[cdmp_20140101004816], requested by (instance=1, osid=1732), summary=[incident=291449].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 01 00:48:16 2014
License high water mark = 1
USER (ospid: 1736): terminating the instance
Instance terminated by USER, pid = 1736

这里可以清晰的看到，因为shared pool不足，导致asm报ora-4031错误，从而使得asm无法正常启动

分析原因

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /export/home/app/grid
System name:	SunOS
Node name:	zwq-rpt1
Release:	5.11
Version:	11.1
Machine:	sun4v
Using parameter settings in server-side spfile +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.823992831
System parameters with non-default values:
  sga_max_size             = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

这里可以看到sga_target被设置为了0,而shared pool又未被配置，这里因为shared pool不足从而出现了ORA-4031,从而导致crs在启动asm的过程失败,从而使得ocr不能被访问,进而使得crs不能正常启动.

处理方法
1.编辑pfile

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$vi /tmp/asm.pfile
  memory_target = 2G
  large_pool_size          = 16M
  instance_type            = "asm"
  sga_target               = 0
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/rdsk/*"
  asm_diskgroups           = "FRADG"
  asm_diskgroups           = "DATADG"
  asm_power_limit          = 1
  diagnostic_dest          = "/export/home/app/oracle"

2.启动asm

grid@zwq-rpt1:/export/home/app/oracle/diag/asm/+asm/+ASM1/trace$sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed Jan 1 01:04:10 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/asm.pfile'
ASM instance started
Total System Global Area 2138521600 bytes
Fixed Size                  2161024 bytes
Variable Size            2102806144 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

3. 创建spfile

SQL> create spfile='+CRSDG' FROM PFILE='/tmp/asm.pfile';
File created.
--asm alert日志
Wed Jan 01 01:08:59 2014
NOTE: updated gpnp profile ASM SPFILE to
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM diskstring: /dev/rdsk/*
NOTE: updated gpnp profile ASM SPFILE to +CRSDG/zwq-rpt-cluster/asmparameterfile/registry.253.835664939

4. 关闭asm

SQL> shutdown immediate
ORA-15097: cannot SHUTDOWN ASM instance with connected client (process 1971)
SQL> shutdown abort
ASM instance shutdown

5. 重启crs

root@zwq-rpt1:~# crsctl stop crs -f
root@zwq-rpt1:~# crsctl start crs

6. 重启其他节点crs

root@zwq-rpt2:~# crsctl stop crs -f
root@zwq-rpt2:~# crsctl start crs

7. 检查结果

root@zwq-rpt1:~# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.DATADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.FRADG.dg
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.LISTENER.lsnr
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.asm
               ONLINE  ONLINE       zwq-rpt1                 Started
               ONLINE  ONLINE       zwq-rpt2                 Started
ora.gsd
               OFFLINE OFFLINE      zwq-rpt1
               OFFLINE OFFLINE      zwq-rpt2
ora.net1.network
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
ora.ons
               ONLINE  ONLINE       zwq-rpt1
               ONLINE  ONLINE       zwq-rpt2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       zwq-rpt1
ora.cvu
      1        ONLINE  ONLINE       zwq-rpt1
ora.oc4j
      1        ONLINE  ONLINE       zwq-rpt1
ora.rptdb.db
      1        ONLINE  ONLINE       zwq-rpt1                 Open
      2        ONLINE  ONLINE       zwq-rpt2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt1.vip
      1        ONLINE  ONLINE       zwq-rpt1
ora.zwq-rpt2.vip
      1        ONLINE  ONLINE       zwq-rpt2

至此恢复正常,2014年第一个故障顺利解决

月度归档：1月 2014