Disable Transparent HugePages

从redhat 6开始引入了Transparent HugePages,但是oracle一直建议disable 它,而使用标准HugePages方式.对于6和7的禁用方式有一些区别.
linux 6
修改/etc/grub.conf之后重启系统生效

vi /etc/grub.conf
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/  transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img

linux 7
修改/etc/default/grub然后执行grub2-mkconfig并重启系统生效

[root@xifenfei u01]# vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="nomodeset vconsole.font=latarcyrheb-sun16 vconsole.keymap=us crashkernel=auto
                    biosdevname=0 transparent_hugepage=never"
GRUB_DISABLE_RECOVERY="true"
~
[root@xifenfei u01]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-514.26.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.26.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-123.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-123.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-cb0b6b4de89a4fe4acfc8774c2f01486
Found initrd image: /boot/initramfs-0-rescue-cb0b6b4de89a4fe4acfc8774c2f01486.img
done

临时禁用
该方法对于linux 6和7均有效,不用重启系统

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

继续上篇的tab$被清空(ORA-600 16703故障解析—tab$表被清空),导致数据库启动异常的case
ORA-600 16703报错
ora-600-16703


数据库日志分析
数据库open成功同时报ORA-7445 kokeicbegin和ORA-600 kzrini:!uprofile错误
ora-600-kzrini-uprofile


再次启动数据库直接报ORA-600 16703错误
ora-600-16703


这里有一个其他现象,就是数据库在open成功的同时(同一秒内),开始报异常.重启之后直接报
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
根据10046分析结果

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_176395/orcl_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

以及以往恢复经验和mos,基本上可以确定是由于tab$和obj$记录不匹配导致该问题.而且是obj#=20的记录在tab$和obj$中不匹配.

分析tab$和obj$记录

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
 
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 
 Strictly Oracle Internal Use Only
 
 
Found db_id = 1334610369
Found db_name = orcl
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

这里可以明确表示tab$无记录,obj$有记录,从而启动的过程出现ORA-600 16703错误可以很好解释.由于数据库启动成功和报错几乎同时进行,以及上面看到的tab$记录不存在的情况,初步怀疑是有startup触发器清空tab$表记录
工具分析触发器表trigger$
startup-trigger


这里果然看到一个after startup on database的触发器,名字为DBMS_SUPPORT_DBMONITOR,而它调用的是DBMS_SUPPORT_DBMONITORP存储.

工具分析pl/sql表source$
DBMS_SUPPORT_DBMONITOR


对wraped加密的程序进行解密
DBMS_SUPPORT_DBMONITOR-unwraped


这里可以明确的看到DBMS_SUPPORT_DBMONITORP存储过程备份tab$表到orachk中然后delete tab$表,现在已经明确是由于DBMS_SUPPORT_DBMONITOR触发器在数据库重启之后开始执行调用DBMS_SUPPORT_DBMONITORP程序,如果判断数据库创建时间大于等于300天,便干掉tab$表,实现数据库破坏.

查找DBMS_SUPPORT_DBMONITOR等程序源头
分析相关程序创建时间,通过obj$表的ctime和name来判断
DBMS_SUPPORT_DBMONITOR-ctime
bootstrap-ctime


这里可以看出来DBMS_SUPPORT_DBMONITOR和DBMS_SUPPORT_DBMONITORP的创建时间基本上和数据库核心对象的创建时间相差无几,我们可以大概排除掉,是由于pl sql等工具连接数据库导致该发问题(类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案),很可能是在dbca创建库的过程中就已经带有了DBMS_SUPPORT_DBMONITOR等程序,如果这样那很可能是由于数据库的安装介质被破坏导致该问题.

分析DBMS_SUPPORT_DBMONITOR来源
prvtsupp
20170711001626


这里已经很清晰,由于prvtsupp.plb被人注入了恶意脚本从而使得数据库被创建了DBMS_SUPPORT_DBMONITOR的触发器和DBMS_SUPPORT_DBMONITORP的存储过程,从而实现了对数据库的而易破坏.

确定破坏文件prvtsupp.plb来源于介质
jar


这里比较明显,文件就是来源database\stage\Components\oracle.rdbms.dbscripts\11.2.0.4.0\1\DataFiles\filegroup2.jar\rdbms\admin\prvtsupp.plb文件被修改导致
md5


通过md5码对比,可以确定是有人对软件的安装介质进行了破坏,从而实现了恶意代码的注入,从而实现了数据库300天之后重启之后无法正常启动而是出现类似ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []的错误.

温馨提示
各位一定要从官方途径下载oracle安装介质,如果是从其他互联网途径下载一定要验证md5,确保文件没有被人恶意篡改,造成无可挽回的损坏.如果真的不幸遇到这类问题,请保护现场联系我们
Tel:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
我们可以通过使用bbed对tab$表数据数据进行恢复实现数据库正常启动,实现数据0丢失,最大限度抢救您的数据和减少业务恢复时间

ORA-600 16703故障解析—tab$表被清空

最近连续遇到两次数据库启动报ORA-600 16703错误,而导致数据库无法正常启动的问题
ORA-600 16703报错
ora-600-16703


Completed: ALTER DATABASE RECOVER  database
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 993752, block 2, scn 14872376881763
Recovery of Online Redo Log: Thread 1 Group 6 Seq 993752 Reading mem 0
  Mem# 0: /oracle/oradata/xifenfei/redo06.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 993752, block 3, scn 14872376901765
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Tue Jul 04 23:13:29 2017
Thread 1 advanced to log sequence 993753 (thread open)
Thread 1 opened at log sequence 993753
  Current log# 7 seq# 993753 mem# 0: /oracle/oradata/xifenfei/redo07.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jul 04 23:13:29 2017
SMON: enabling cache recovery
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc  (incident=163595):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_163595/xifenfei_ora_15886_i163595.trc
Tue Jul 04 23:13:30 2017
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_15886.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 15886): terminating the instance due to error 704
Instance terminated by USER, pid = 15886
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (15886) as a result of ORA-1092

这里报错比较明显ORA-600 16703,而且是在启动时bootstrap$中的对象出现该问题.
10046分析启动过程

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_176395/xifenfei_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

报错信息明显,由于select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1无法正常执行.从而出现了ORA-600 16703的错误,更加直接一点的解释就是obj#=20的对象在tab$中找不到记录,从而出现此类报错.和官方解释ORA-600 TAB$和obj$不匹配一致.
分析system文件
通过dul等工具分析system文件发现tab$表记录为空

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Found db_id = 1334610369
Found db_name = xifenfei
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

发现在obj$中有创建表ORACHKBEC66CBE055000000000001(ORACHK+16进制24位)的一个表名字
orachk


该表用途通过分析数据库日志发现
create table ORACHKBEC66CBE055000000000001 tablespace system as select * from sys.tab$;
也就是说,这个orachk的表是用来备份tab$的,然后进一步发现有delete from tab$.至此基本上分析清楚,tab$表备份到ORACHK表中,然后delete tab$表数据.实现数据库破坏以及难以恢复的效果.有点类似plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案的破坏案例
我们可以通过使用bbed对tab$表数据数据进行恢复实现数据库正常启动,实现数据0丢失,最大限度抢救您的数据和减少业务恢复时间
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
源头已经分析清楚,是有人篡改了oracle安装介质,导致数据库运行300天之后出现该问题,具体见:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

Oracle 12c active dataguard switchover

从12.1开始adg的切换发生了一些改变,直接使用alter database switchover to [target standby db_unique_name] verify; alter database switchover to [target standby db_unique_name]; 即可完成切换,以下是一次生产环境的具体操作步骤
主库操作

SQL> select database_role from v$database;
DATABASE_ROLE
----------------
PRIMARY
SQL> alter database switchover to xifenfei verify;
数据库已更改。
--alert日志
Sun Jun 25 09:07:08 2017
diag_adl:SWITCHOVER VERIFY: Send VERIFY request to switchover target xifenfei
diag_adl:SWITCHOVER VERIFY COMPLETE
SQL> alter database switchover to xifenfei;
数据库已更改。
--alert日志
Sun Jun 25 09:07:46 2017
diag_adl:Starting switchover [Process ID: 37024]
Sun Jun 25 09:07:46 2017
diag_adl:ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY [Process Id: 37024] (xifenfei)
diag_adl:Waiting for target standby to receive all redo
Sun Jun 25 09:07:46 2017
diag_adl:Waiting for all non-current ORLs to be archived...
Sun Jun 25 09:07:46 2017
diag_adl:All non-current ORLs have been archived.
Sun Jun 25 09:07:46 2017
diag_adl:Waiting for all FAL entries to be archived...
Sun Jun 25 09:07:46 2017
diag_adl:All FAL entries have been archived.
Sun Jun 25 09:07:46 2017
diag_adl:Waiting for dest_id 2 to become synchronized...
Sun Jun 25 09:07:47 2017
diag_adl:Active, synchronized Physical Standby switchover target has been identified
diag_adl:Preventing updates and queries at the Primary
diag_adl:Generating and shipping final logs to target standby
diag_adl:Switchover End-Of-Redo Log thread 1 sequence 96534 has been fixed
diag_adl:Switchover: Primary highest seen SCN set to 0x3.0x109d7502
diag_adl:ARCH: Noswitch archival of thread 1, sequence 96534
diag_adl:ARCH: End-Of-Redo Branch archival of thread 1 sequence 96534
diag_adl:ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
diag_adl:ARCH: Standby redo logfile selected for thread 1 sequence 96534 for destination LOG_ARCHIVE_DEST_2
diag_adl:ARCH: Archiving is disabled due to current logfile archival
diag_adl:Primary will check for some target standby to have received all redo
diag_adl:Waiting for target standby to apply all redo
diag_adl:Backup controlfile written to trace file
            /u01/app/oracle/diag/rdbms/xifenfeildg/xifenfei/trace/xifenfei_ora_37024.trc
diag_adl:Converting the primary database to a new standby database
diag_adl:Clearing standby activation ID 612004791 (0x247a73b7)
diag_adl:The primary database controlfile was created using the
diag_adl:'MAXLOGFILES 16' clause.
diag_adl:There is space for up to 11 standby redo logfiles
diag_adl:Use the following SQL commands on the standby database to create
diag_adl:standby redo logfiles that match the primary database:
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 1073741824;
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 1073741824;
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 1073741824;
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 1073741824;
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 1073741824;
diag_adl:ALTER DATABASE ADD STANDBY LOGFILE 'srl6.f' SIZE 1073741824;
diag_adl:Archivelog for thread 1 sequence 96534 required for standby recovery
diag_adl:Switchover: Primary controlfile converted to standby controlfile succesfully.
diag_adl:Switchover: Complete - Database shutdown required
diag_adl:Sending request(convert to primary database) to switchover target xifenfei
Sun Jun 25 09:07:58 2017
diag_adl:Switchover complete. Database shutdown required
diag_adl:USER (ospid: 37024): terminating the instance
Sun Jun 25 09:07:59 2017
diag_adl:Instance terminated by USER, pid = 37024
diag_adl:Shutting down instance (abort)
diag_adl:License high water mark = 527
Sun Jun 25 09:07:59 2017
Instance shutdown complete

备库alert日志

Sun Jun 25 09:05:54 2017
SWITCHOVER VERIFY BEGIN
SWITCHOVER VERIFY COMPLETE
Sun Jun 25 09:06:35 2017
RFS[107]: Assigned to RFS process (PID:7330)
RFS[107]: Selected log 12 for thread 1 sequence 96534 dbid 588725663 branch 916962073
Sat Jun 24 20:06:35 2017
Archived Log entry 100576 added for thread 1 sequence 96534 ID 0x247a73b7 dest 1:
Sat Jun 24 20:06:35 2017
Resetting standby activation ID 612004791 (0x247a73b7)
Sat Jun 24 20:06:35 2017
Media Recovery End-Of-Redo indicator encountered
Sat Jun 24 20:06:35 2017
Media Recovery Continuing
Media Recovery Waiting for thread 1 sequence 96535
Sun Jun 25 09:06:36 2017
SWITCHOVER: received request 'ALTER DTABASE COMMIT TO SWITCHOVER  TO PRIMARY' from primary database.
Sun Jun 25 09:06:36 2017
ALTER DATABASE SWITCHOVER TO PRIMARY (xifenfei)
Maximum wait for role transition is 15 minutes.
Switchover: Media recovery is still active
Role Change: Canceling MRP - no more redo to apply
Sat Jun 24 20:06:36 2017
MRP0: Background Media Recovery cancelled with status 16037
Sat Jun 24 20:06:36 2017
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_4590.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Sat Jun 24 20:06:36 2017
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pr00_4590.trc:
ORA-16037: user requested cancel of managed recovery operation
Sat Jun 24 20:06:37 2017
MRP0: Background Media Recovery process shutdown (xifenfei)
Sun Jun 25 09:06:38 2017
Role Change: Canceled MRP
Killing 2 processes(PIDS:7328,4704)(all RFS)
      in order to disallow current and future RFS connections.Requested by OS process 7334
Stopping Emon pool
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
Active process 5428 user 'oracle' program 'oracle@kage7.hk0620.com (TNS V1-V3)'
Active process 5161 user 'oracle' program 'oracle@kage7.hk0620.com'
…………
Active process 5428 user 'oracle' program 'oracle@kage7.hk0620.com (TNS V1-V3)'
Active process 5161 user 'oracle' program 'oracle@kage7.hk0620.com'
Active process 5178 user 'oracle' program 'oracle@kage7.hk0620.com'
CLOSE: all sessions shutdown successfully.
Stopping Emon pool
Sat Jun 24 20:06:43 2017
SMON: disabling cache recovery
Sat Jun 24 20:06:44 2017
Buffer Cache Full DB Caching mode changing from FULL CACHING DISABLED to FULL CACHING ENABLED
Sun Jun 25 09:06:44 2017
Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_rmi_7334.trc
SwitchOver after complete recovery through change 13163656450
Online logfile pre-clearing operation disabled by switchover
Online log /u01/app/oracle/oradata/xifenfei/redo01n.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/fast_recovery_area/xifenfei/redo01n.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/oradata/xifenfei/redo02n.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/fast_recovery_area/xifenfei/redo02n.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/oradata/xifenfei/redo03n.log: Thread 1 Group 3 was previously cleared
Online log /u01/app/oracle/fast_recovery_area/xifenfei/redo03n.log: Thread 1 Group 3 was previously cleared
Online log /u01/app/oracle/oradata/xifenfei/redo04n.log: Thread 1 Group 4 was previously cleared
Online log /u01/app/oracle/fast_recovery_area/xifenfei/redo04n.log: Thread 1 Group 4 was previously cleared
Online log /u01/app/oracle/oradata/xifenfei/redo05n.log: Thread 1 Group 5 was previously cleared
Online log /u01/app/oracle/fast_recovery_area/xifenfei/redo05n.log: Thread 1 Group 5 was previously cleared
Standby became primary SCN: 13163656448
Switchover: Complete - Database mounted as primary
SWITCHOVER: completed request from primary database.
Sat Jun 24 20:07:12 2017
ARC0: Becoming the 'no SRL' ARCH

原备库(现主库)操作

SQL> conn / as sysdba
Connected.
SQL>  select database_role,open_mode from v$database;
DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PRIMARY          MOUNTED
SQL> alter database open;
Database altered.

原主库(现备库)操作

[oracle@localhost scripts]$ ss
SQL*Plus: Release 12.1.0.2.0 Production on 星期日 6月 25 09:09:40 2017
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
已连接到空闲例程。
SQL> startup
ORACLE 例程已经启动。
Total System Global Area 8.5899E+10 bytes
Fixed Size                  7654304 bytes
Variable Size            1.2616E+10 bytes
Database Buffers         7.3014E+10 bytes
Redo Buffers              260780032 bytes
数据库装载完毕。
数据库已经打开。
SQL> alter database recover managed standby database disconnect;
数据库已更改。

oracle rac 12.2 执行root.sh报CLSRSC-400

在redhat 7.3版本中安装oracle rac 12.2的过程中,执行root.sh脚本的第14步的时候报如下错误,导致无法继续
CLSRSC-400: A system reboot is required to continue installing.
The command ‘/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl ‘ execution failed
os版本信息

[grid@xifenfei01 ~]$ more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
[grid@xifenfei01 ~]$ uname -a
Linux xifenfei01 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

root.sh报错

[root@xifenfei01 ~]# /u01/app/grid/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/grid/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/grid/oraInventory to oinstall.
The execution of the script is complete.
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_09-52-55AM.log
2017/06/11 09:53:00 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 09:53:00 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 09:53:27 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 09:53:30 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 09:53:30 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 09:53:31 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 09:53:32 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 09:53:37 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 09:53:38 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 09:53:51 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 09:53:56 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 09:54:00 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 09:54:15 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 09:54:44 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 09:54:48 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command '/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib
-I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl'execution failed

主要报错信息:
2017/06/11 09:55:15 CLSRSC-400: A system reboot is required to continue installing.
The command ‘/u01/app/grid/product/12.2.0/grid/perl/bin/perl -I/u01/app/grid/product/12.2.0/grid/perl/lib -I/u01/app/grid/product/12.2.0/grid/crs/install /u01/app/grid/product/12.2.0/grid/crs/install/rootcrs.pl ‘ execution failed
查询mos发下:ACFS Drivers Install reports CLSRSC-400: A system reboot is required to continue installing (Doc ID 2025056.1),主要是由于12c gi开始,acfs默认是安装的,由于acfs在redhat 7.3中不支持导致上述的错误信息.

[grid@xifenfei01 ~]$ acfsdriverstate -orahome $ORACLE_HOME supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: '3.10.0-514.el7.x86_64'
ACFS-9201: Not Supported

处理方法
停掉crs,kill 进程(如果有不能停掉的,通过kill处理),执行root.sh

[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        OFFLINE OFFLINE                               STABLE
ora.crf
      1        OFFLINE OFFLINE                               STABLE
ora.crsd
      1        OFFLINE OFFLINE                               STABLE
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.cssdmonitor
      1        OFFLINE OFFLINE                               STABLE
ora.ctssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        OFFLINE OFFLINE                               STABLE
ora.gipcd
      1        OFFLINE OFFLINE                               STABLE
ora.gpnpd
      1        OFFLINE OFFLINE                               STABLE
ora.mdnsd
      1        OFFLINE OFFLINE                               STABLE
ora.storage
      1        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei02' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenfei02 ~]# ps -ef|grep d.bin
root      29155  11754  0 10:46 pts/0    00:00:00 grep --color=auto d.bin
[root@xifenfei01 ~]# /u01/app/grid/product/12.2.0/grid/root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/12.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/grid/product/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/grid_bash/crsdata/xifenfei01/crsconfig/rootcrs_xifenfei01_2017-06-11_10-33-57AM.log
2017/06/11 10:33:59 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2017/06/11 10:33:59 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/06/11 10:34:00 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2017/06/11 10:34:01 CLSRSC-363: User ignored prerequisites during installation
2017/06/11 10:34:01 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2017/06/11 10:34:02 CLSRSC-594: Executing installation step 5 of 19: 'SaveParamFile'.
2017/06/11 10:34:03 CLSRSC-594: Executing installation step 6 of 19: 'SetupOSD'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 7 of 19: 'CheckCRSConfig'.
2017/06/11 10:34:04 CLSRSC-594: Executing installation step 8 of 19: 'SetupLocalGPNP'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2017/06/11 10:34:06 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2017/06/11 10:34:53 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2017/06/11 10:34:54 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2017/06/11 10:35:09 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
2017/06/11 10:35:31 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2017/06/11 10:35:33 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:35:57 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2017/06/11 10:36:01 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
Disk groups created successfully. Check /u01/app/grid/grid_bash/cfgtoollogs/asmca/asmca-170611AM103637.log for details.
2017/06/11 10:37:40 CLSRSC-482: Running command: '/u01/app/grid/product/12.2.0/grid/bin/ocrconfig -upgrade grid oinstall'
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-4256: Updating the profile
Successful addition of voting disk 49af246c7d2e4f5dbf0d9ea09cc047d5.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   49af246c7d2e4f5dbf0d9ea09cc047d5 (/dev/mapper/data1) [DATA]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crsd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenfei01'
CRS-2677: Stop of 'ora.crf' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.storage' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'xifenfei01'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.asm' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'xifenfei01'
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenfei01'
CRS-2677: Stop of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenfei01'
CRS-2677: Stop of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenfei01'
CRS-2677: Stop of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenfei01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2017/06/11 10:38:40 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.evmd' on 'xifenfei01'
CRS-2676: Start of 'ora.mdnsd' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.evmd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xifenfei01'
CRS-2676: Start of 'ora.gpnpd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'xifenfei01'
CRS-2676: Start of 'ora.gipcd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xifenfei01'
CRS-2676: Start of 'ora.cssdmonitor' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.diskmon' on 'xifenfei01'
CRS-2676: Start of 'ora.diskmon' on 'xifenfei01' succeeded
CRS-2676: Start of 'ora.cssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'xifenfei01'
CRS-2672: Attempting to start 'ora.ctssd' on 'xifenfei01'
CRS-2676: Start of 'ora.ctssd' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'xifenfei01'
CRS-2674: Start of 'ora.drivers.acfs' on 'xifenfei01' failed
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'xifenfei01'
CRS-2676: Start of 'ora.storage' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'xifenfei01'
CRS-2676: Start of 'ora.crf' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'xifenfei01'
CRS-2676: Start of 'ora.crsd' on 'xifenfei01' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: xifenfei01
CRS-6016: Resource auto-start has completed for server xifenfei01
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2017/06/11 10:40:23 CLSRSC-343: Successfully started Oracle Clusterware stack
2017/06/11 10:40:23 CLSRSC-594: Executing installation step 18 of 19: 'ConfigNode'.
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01'
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'xifenfei01'
CRS-2676: Start of 'ora.asm' on 'xifenfei01' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'xifenfei01'
CRS-2676: Start of 'ora.DATA.dg' on 'xifenfei01' succeeded
2017/06/11 10:42:19 CLSRSC-594: Executing installation step 19 of 19: 'PostConfig'.
2017/06/11 10:43:16 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

其他剩余节点也是类似处理,最终跳过acfs安装成功

[grid@xifenfei01 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.DATA.dg
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.chad
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.net1.network
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
ora.ons
               ONLINE  ONLINE       xifenfei01               STABLE
               ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       xifenfei01               169.254.20.214 192.1
                                                             68.1.20 192.168.2.20
                                                             ,STABLE
ora.asm
      1        ONLINE  ONLINE       xifenfei01               Started,STABLE
      2        ONLINE  ONLINE       xifenfei02               Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       xifenfei01               Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei01.vip
      1        ONLINE  ONLINE       xifenfei01               STABLE
ora.xifenfei02.vip
      1        ONLINE  ONLINE       xifenfei02               STABLE
--------------------------------------------------------------------------------

最新官方处理方案:CLSRSC-400: A system reboot is required to continue installing.

_optimizer_null_aware_antijoin和not in效率

准备两个测试表

SQL> conn chf/oracle
Connected.
SQL> select * from v$version;
BANNER
----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
SQL> create table t_xifenfei
  2  as select * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     86259
SQL> create table t_xifenfei1
  2  as select * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei1;
  COUNT(*)
----------
     86260
--删除部分记录,用来做not in的内部表
SQL> delete from t_xifenfei where object_id>86200;
918 rows deleted.
SQL> commit;
Commit complete.

查询_optimizer_null_aware_antijoin隐含参数默认值

SQL> conn / as sysdba
Connected.
SQL> col name for a52
SQL> col value for a24
SQL> col description for a50
set linesize 150
select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  from x$ksppi a,x$ksppcv b
 where a.inst_id = USERENV ('Instance')
   and b.inst_id = USERENV ('Instance')
   and a.indx = b.indx
   and upper(a.ksppinm) LIKE upper('%&param%')
order by name
SQL> SQL>   2    3    4    5    6    7    8  /
Enter value for param: _optimizer_null_aware_antijoin
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_optimizer_null_aware_antijoin%')
NAME                                                 VALUE                    DESCRIPTION
---------------------------------------------------- ------------------------ -----------------------------
_optimizer_null_aware_antijoin                       TRUE                     null-aware antijoin parameter

_optimizer_null_aware_antijoin从11.1.0.6开始引进,默认为true

_optimizer_null_aware_antijoin为true,执行not in

SQL> conn chf/oracle
Connected.
SQL> set autot trace
SQL> set timing on
SQL> set lines 150
SQL> set pages 1000
SQL>  select count(*) from t_xifenfei1 where object_id not in(select object_id from t_xifenfei);
Elapsed: 00:00:00.09
Execution Plan
----------------------------------------------------------
Plan hash value: 4048525918
----------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             |     1 |    10 |   688   (1)| 00:00:09 |
|   1 |  SORT AGGREGATE          |             |     1 |    10 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI NA|             |  1137 | 11370 |   688   (1)| 00:00:09 |
|   3 |    TABLE ACCESS FULL     | T_XIFENFEI  | 85341 |   416K|   344   (1)| 00:00:05 |
|   4 |    TABLE ACCESS FULL     | T_XIFENFEI1 | 86260 |   421K|   344   (1)| 00:00:05 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2472  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

_optimizer_null_aware_antijoin为false,执行not in

SQL> alter session set "_optimizer_null_aware_antijoin"=false;
Session altered.
Elapsed: 00:00:00.00
SQL> select count(*) from t_xifenfei1 where object_id not in(select object_id from t_xifenfei);
Elapsed: 00:02:29.64
Execution Plan
----------------------------------------------------------
Plan hash value: 2503880249
-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |     1 |     5 |    25M  (1)| 86:20:57 |
|   1 |  SORT AGGREGATE     |             |     1 |     5 |            |          |
|*  2 |   FILTER            |             |       |       |            |          |
|   3 |    TABLE ACCESS FULL| T_XIFENFEI1 | 86260 |   421K|   344   (1)| 00:00:05 |
|*  4 |    TABLE ACCESS FULL| T_XIFENFEI  |     1 |     5 |   344   (1)| 00:00:05 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT 0 FROM "T_XIFENFEI" "T_XIFENFEI" WHERE
              LNNVL("OBJECT_ID"<>:B1)))
   4 - filter(LNNVL("OBJECT_ID"<>:B1))
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
   52982891  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这里很明显,当 _optimizer_null_aware_antijoin为false的时候not in效率非常低(当in里面记录多,使用FILTER效率肯定低下).

_optimizer_null_aware_antijoin为false,执行not exists

SQL> alter session set "_optimizer_null_aware_antijoin"=false;
Session altered.
SQL>select count(*) from t_xifenfei1 b where not exists
 2  (select 1 from t_xifenfei a where a.object_id=b.object_id);
Elapsed: 00:00:00.07
Execution Plan
----------------------------------------------------------
Plan hash value: 2976307246
-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    10 |   688   (1)| 00:00:09 |
|   1 |  SORT AGGREGATE       |             |     1 |    10 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|             |  1137 | 11370 |   688   (1)| 00:00:09 |
|   3 |    TABLE ACCESS FULL  | T_XIFENFEI  | 85341 |   416K|   344   (1)| 00:00:05 |
|   4 |    TABLE ACCESS FULL  | T_XIFENFEI1 | 86260 |   421K|   344   (1)| 00:00:05 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2472  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

对于11g的版本可以通过_optimizer_null_aware_antijoin参数开启NULL-aware Anti join特性来提高not in的效率,对于11g以下版本可以通过not exists来提高效率

file$ 删除记录恢复(delete file$ recovery)

最近遭遇几次有人因为对oracle不太理解,由于各种情况下,删除了file$中的部分记录,从而使得该文件之后的文件都丢失,使得数据库出现各种异常情况。这里演示了可以重启数据库的情况下两种常见的删除file$中记录的恢复

创建表空间和表

SQL> startup
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> create tablespace tbs_delete_file datafile '/home/oracle/oradata/xifenfei/file01.dbf' size 128M;
Tablespace created.
SQL> alter tablespace tbs_delete_file add datafile '/home/oracle/oradata/xifenfei/file02.dbf' size 128M;
Tablespace altered.
SQL> alter tablespace tbs_delete_file add datafile '/home/oracle/oradata/xifenfei/file03.dbf' size 128M;
Tablespace altered.
SQL> create table t_xifenfei tablespace tbs_delete_file
  2  as  select * from dba_objects;
Table created.
SQL> insert into t_xifenfei select * from dba_objects;
71895 rows created.
SQL> /
71895 rows created.
SQL> /
71895 rows created.
SQL> /
71895 rows created.
SQL> commit;
Commit complete.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

删除file$中记录

SQL> select ts#,file# from file$;
       TS#      FILE#
---------- ----------
         0          1
         1          2
         2          3
         4          4
         6          5
         6          6
         6          7
7 rows selected.
[oracle@localhost ~]$ ss
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jun 2 23:30:57 2017
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> delete from file$ where file#=5;
1 row deleted.
SQL> commit;
Commit complete.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

不重建控制文件,重启数据库

[oracle@localhost tmp]$ ss
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jun 2 23:46:33 2017
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         6
         7
6 rows selected.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

数据库启动正常,而且文件也未从控制文件中删除,而且记录查询正常,考虑通过逻辑方式迁移数据。

测试重建控制文件

SQL> startup nomount
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TEST" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
  4    5    6    7  LOGFILE
  8    GROUP 1 '/home/oracle/oradata/xifenfei/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/oradata/xifenfei/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/home/oracle/oradata/xifenfei/redo03.log'  SIZE 50M BLOCKSIZE 512
  9   10   11  DATAFILE
 12    '/home/oracle/oradata/xifenfei/system01.dbf',
  '/home/oracle/oradata/xifenfei/sysaux01.dbf',
  '/home/oracle/oradata/xifenfei/undotbs01.dbf',
 13   14   15    '/home/oracle/oradata/xifenfei/users01.dbf',
  '/home/oracle/oradata/xifenfei/file01.dbf',
 16   17    '/home/oracle/oradata/xifenfei/file02.dbf',
  '/home/oracle/oradata/xifenfei/file03.dbf'
CHARACTER SET AL32UTF8
 18   19   20  ;
Control file created.
SQL> alter database open;
Database altered.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         6
         7
6 rows selected.
SQL> select count(*) from t_xifenfei;
select count(*) from t_xifenfei
                     *
ERROR at line 1:
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [],[], [], [], [], []

删除file$中记录,然后重启库之后,大于删除的file#之后的数据文件全部丢失.

alert日志报错

Fri Jun 02 23:49:42 2017
alter database open
Fri Jun 02 23:49:42 2017
Thread 1 advanced to log sequence 9 (thread open)
Thread 1 opened at log sequence 9
  Current log# 3 seq# 9 mem# 0: /home/oracle/oradata/xifenfei/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 02 23:49:42 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #5 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 5: '/home/oracle/oradata/xifenfei/file01.dbf'
File #6 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 6: '/home/oracle/oradata/xifenfei/file02.dbf'
File #7 in the controlfile not found in data dictionary.
Removing file from controlfile.
data file 7: '/home/oracle/oradata/xifenfei/file03.dbf'
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 02 23:49:43 2017
QMNC started with pid=20, OS id=11886
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open
Fri Jun 02 23:49:44 2017
db_recovery_file_dest_size of 3882 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Jun 02 23:49:44 2017
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_m001_11890.trc  (incident=84344):
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/test/test/incident/incdir_84344/test_m001_11890_i84344.trc
Errors in file /opt/oracle/diag/rdbms/test/test/trace/test_m001_11890.trc:
ORA-00600: internal error code, arguments: [25029], [6], [], [], [], [], [], [], [], [], [], []
Fri Jun 02 23:49:45 2017
Trace dumping is performing id=[cdmp_20170602234945]
Fri Jun 02 23:49:46 2017
Starting background process CJQ0
Fri Jun 02 23:49:46 2017
CJQ0 started with pid=21, OS id=11902

这里报错明显,由于file$.file#=5被删除,重建控制文件之后,file#在5之后的数据文件全部从控制文件中删除,类似提示File #5 in the controlfile not found in data dictionary. Removing file from controlfile.

插入file$记录恢复

--通过dump文件头和相关block,找出来对应记录
插入被file$删除记录
SQL> insert into file$ values(5,2,16384,6,5,0,0,0,993135,null,20971522,null,null,null);
1 row created.
SQL> commit;
Commit complete.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
--重启数据库,创建控制文件
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TEST" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
  4    5    6      MAXLOGHISTORY 292
  7  LOGFILE
  GROUP 1 '/home/oracle/oradata/xifenfei/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/home/oracle/oradata/xifenfei/redo02.log'  SIZE 50M BLOCKSIZE 512,
  8    9   10    GROUP 3 '/home/oracle/oradata/xifenfei/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
 11   12    '/home/oracle/oradata/xifenfei/system01.dbf',
  '/home/oracle/oradata/xifenfei/sysaux01.dbf',
 13   14    '/home/oracle/oradata/xifenfei/undotbs01.dbf',
 15    '/home/oracle/oradata/xifenfei/users01.dbf',
  '/home/oracle/oradata/xifenfei/file01.dbf',
  '/home/oracle/oradata/xifenfei/file02.dbf',
 16   17   18    '/home/oracle/oradata/xifenfei/file03.dbf'
 19  CHARACTER SET AL32UTF8
; 20
Control file created.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/home/oracle/oradata/xifenfei/file01.dbf'
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
SQL> select file# from file$;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select file# from v$datafile;
     FILE#
----------
         1
         2
         3
         4
         5
         6
         7
7 rows selected.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
    359475

alert日志正常,未提示删除控制文件中数据文件

alter database open
Fri Jun 02 23:56:52 2017
Thread 1 advanced to log sequence 10 (thread open)
Thread 1 opened at log sequence 10
  Current log# 1 seq# 10 mem# 0: /home/oracle/oradata/xifenfei/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jun 02 23:56:52 2017
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Jun 02 23:56:53 2017
QMNC started with pid=20, OS id=12127
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open

通过插入删除记录,重建控制文件,数据库恢复正常,而且相关记录也可以查询。

结论总结
1. 如果删除file$中的记录,而且控制文件未重建,数据库可以正常启动,而且可以查询数据
2. 如果删除file$中的记录,而且控制文件被重建,在数据库启动过程中,从被删除文件之后的所有文件记录从控制文件中删除(类似:File N in the controlfile not found in data dictionary. Removing file from controlfile.).可以在数据库open之后,插入被删除的记录,重建控制文件,数据会被恢复回来.如果数据库无法启动,需要通过其他方式进行恢复被删除记录

.wncry比特币勒索病毒恢复

以前也关注过各种比特币勒索,对于oracle数据库主要集中在pl/sql dev文件加密勒索,以前不管哪种勒索都没有这次的波及的范围只广,影响之大.就连天朝的公安网都严重感染,很多部门业务无法正常情况.
感染之后现象
btb
wncry


这里可以发现这次的比特币加密是有选择性的加密,不是所有文件都加密,而是根据文件后缀名来判断,然后给予加密勒索.
查看加密后文件
1
2


这次的故障和以往的加密勒索不一样,这次是整个文件全部加密,和以往的加密区别挺大的,因为是全文加密也给恢复带来了非常大的难度.

收到比特币
https://btc.com/12t9YDPgwueZ9NyMgw519p7AA8isjr6SMw
这个链表你可以发现.利索人收到大量比特币,一般不建议交比特币:1)助长了这种气焰,2)交钱也不一定可以解密(身边有失败的例子)
3


比较侥幸,我们虽然无法解密该加密文件,但是根据加密原理,在硬盘上运行过oracle(存放过oracle数据文件),那就在硬盘上面有痕迹,只要这个痕迹没有覆盖掉,我们就可以通过底层扫描block恢复出来其中的数据(类似:asm disk header 彻底损坏恢复).通过这个原理,我们今天顺利恢复了一个客户的数据库.如果这个方面无法自行恢复的,可以联系我们给予技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
由于技术功底有限,目前我们只能对于数据库被加密勒索比特币的进行恢复,其他文件无法恢复.对于数据库,我们也需要评估现场之后才能确定是否可以恢复.

ora-600 2037 ORA-7445 kcbs_dump_adv_state解决

有客户系统断电,导致数据库无法启动,让我们帮忙解决,通过分析主要是ORA-600 2037和ORA-7445 _kcbs_dump_adv_state等错误,通过人工recover解决.
数据库报ORA-03113,无法启动成功

C:\Documents and Settings\Administrator>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:50:36 2017
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
已连接到空闲例程。
SQL> startup
ORACLE 例程已经启动。
Total System Global Area 1258291200 bytes
Fixed Size                  1250548 bytes
Variable Size             218106636 bytes
Database Buffers         1031798784 bytes
Redo Buffers                7135232 bytes
数据库装载完毕。
ORA-03113: 通信通道的文件结束

分析alert日志

Fri May 12 09:50:43 2017
ALTER DATABASE OPEN
Fri May 12 09:50:43 2017
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Fri May 12 09:50:43 2017
Started redo scan
Fri May 12 09:50:43 2017
Completed redo scan
 1240 redo blocks read, 277 data blocks need recovery
Fri May 12 09:50:44 2017
Started redo application at
 Thread 1: logseq 5881, block 41179
Fri May 12 09:50:44 2017
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5881 Reading mem 0
  Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\REDO01.LOG
Fri May 12 09:50:44 2017
Completed redo application
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []
Fri May 12 09:50:44 2017
Hex dump of (file 3, block 14004) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during crash/instance recovery
Data in bad block:
 type: 255 format: 7 rdba: 0x06010601
 last change scn: 0xa206.a2060601 seq: 0xb4 flg: 0x36
 spare1: 0x1 spare2: 0x6 spare3: 0x673
 consistency value in tail: 0x1b0a0708
 check value in block header: 0x36b4
 computed block checksum: 0xe4f5
Fri May 12 09:50:44 2017
Hex dump of (file 9, block 65507) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p003_6056.trc
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during crash/instance recovery
Data in bad block:
 type: 3 format: 6 rdba: 0x06020601
 last change scn: 0xa206.a2060602 seq: 0xe3 flg: 0xff
 spare1: 0x1 spare2: 0x6 spare3: 0x6dc
 consistency value in tail: 0xc1028001
 check value in block header: 0xffe3
 computed block checksum: 0xff01
Fri May 12 09:50:44 2017
Reread of rdba: 0x00c036b4 (file 3, block 14004) found different data
Fri May 12 09:50:44 2017
Reread of rdba: 0x0240ffe3 (file 9, block 65507) found different data
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p005_6060.trc:
ORA-00600: internal error code,arguments:[2037],[17442602],[2718302723],[255],[9],[203],[657105414],[2147549568]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc:
ORA-07445:exception encountered:core dump[ACCESS_VIOLATION][_kclcomplete+79][PC:0x72B0C7][ADDR:0x220][UNABLE_TO_READ][]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-07445: exception encountered:core dump[ACCESS_VIOLATION][_kcbzdh+2496][PC:0x4A4928][ADDR:0xB][UNABLE_TO_READ][]
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p012_6128.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kcbs_dump_adv_state+723]
                                 [PC:0x5975A3] [ADDR:0xCBC0CBB2] [UNABLE_TO_READ] []
ORA-00600:internal error code,arguments:[2037],[17430318],[2718303745],[128],[1],[203],[4147028486],[2147549568]

错误比较明显由于坏块导致应用日志恢复异常,主要错误集中在ORA-600 2037,ORA-7445 _kcbs_dump_adv_state,ORA-7445_kcbzdh,ORA-7445 _kclcomplete等

dbv检查数据文件

E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF
DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:57:39 2017
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF
页 13353 标记为损坏
Corrupt block relative dba: 0x00c03429 (file 3, block 13353)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x3429a206
 last change scn: 0x066f.066f3429 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x8c96
 consistency value in tail: 0x06018001
 check value in block header: 0x0
 block checksum disabled
页 14004 标记为损坏
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x36b4a206
 last change scn: 0x0673.067336b4 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xfb97
 consistency value in tail: 0x06010210
 check value in block header: 0x0
 block checksum disabled
页 15261 标记为损坏
Corrupt block relative dba: 0x00c03b9d (file 3, block 15261)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0x3b9da206
 last change scn: 0x0673.06733b9d seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x0
 consistency value in tail: 0x06018001
 check value in block header: 0x5549
 block checksum disabled
DBVERIFY - 验证完成
检查的页总数: 58880
处理的页总数 (数据): 19318
失败的页总数 (数据): 0
处理的页总数 (索引): 18610
失败的页总数 (索引): 0
处理的页总数 (其它): 13747
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 7202
标记为损坏的总页数: 3
流入的页总数: 0
最高块 SCN            : 178325323 (0.178325323)
E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02
DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:10:24 2017
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02
页 65507 标记为损坏
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0xffe3a206
 last change scn: 0x06dc.06dcffe3 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xb32
 consistency value in tail: 0x060102ff
 check value in block header: 0x0
 block checksum disabled
DBVERIFY - 验证完成
检查的页总数: 1310720
处理的页总数 (数据): 34102
失败的页总数 (数据): 0
处理的页总数 (索引): 30270
失败的页总数 (索引): 0
处理的页总数 (其它): 10850
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1235497
标记为损坏的总页数: 1
流入的页总数: 0
最高块 SCN            : 178325221 (0.178325221)

确实如alert日志报错,file 3和9 都出现坏块导致实例恢复无法进行。根据错误ORA-600 2037和ORA-7445 _kcbs_dump_adv_state,初步判断和During Startup (Open Database) Alert Log Shows ORA-600[2037] and ORA-7445[kcbs_dump_adv_state] (Doc ID 551993.1)文章描述相符(而且版本也相符)

尝试recover datafile部分file

E:\>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:16:00 2017
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> recover datafile 1;
完成介质恢复。
SQL> recover datafile 2;
完成介质恢复。
SQL> recover datafile 3;
完成介质恢复。
SQL> recover datafile 4;
完成介质恢复。
SQL> recover datafile 9;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr1_lastbwr], [], [], [], [], [], [], []

ORA-00600 kcratr1_lastbwr错误比较明显,见ORA-00600:[Kcratr1_lastbwr] During Database Startup after a Crash (Doc ID 393984.1)

通过recover database处理

SQL> recover database;
完成介质恢复。
SQL> alter database open;
数据库已更改。

然后通过查询dba_extents 处理坏块对象

补充ORA-600 2037错误

Format: ORA-600 [2037] [a] [b]  [d] [e] [f] [g]
VERSIONS:
  versions 8.0 and above
DESCRIPTION:
  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.
  The block has failed this check and this exception is raised.
ARGUMENTS:
  Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
  Arg [b] The Block format
  Arg {c} RDBA in the block itself
  Arg [d] The block type
  Arg [e] The sequence number
  Arg [f] Flags, if set
  Arg [g] The return value from the block head/tail checker.

asm磁盘分区丢失恢复

有朋友反馈,他们做了xx存储的双活之后,重启主机发现gi无法正常启动,分析发现所有该存储的磁盘分区信息丢失,导致asmlib无法发现磁盘(使用分区做asm disk)
类似如下错误(磁盘分区丢失)

--fdisk -l 显示部分结果
Disk /dev/mapper/datahds1: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
--ls -l /dev/mapper/   显示结果无分区信息
lrwxrwxrwx 1 root root      7 May  6 03:44 datahds1 -> ../dm-1
lrwxrwxrwx 1 root root      7 May  6 03:26 datahds2 -> ../dm-3
lrwxrwxrwx 1 root root      7 May  6 03:26 datahds3 -> ../dm-8
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds1 -> ../dm-0
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds2 -> ../dm-2
lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds3 -> ../dm-4

asm日志显示

SUCCESS: diskgroup DATADG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCRHDS was mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

分析系统日志

May  6 02:23:27 db2 kernel: sdb: unknown partition table
May  6 02:23:27 db2 kernel: sde: unknown partition table
May  6 02:23:27 db2 kernel: sdc: unknown partition table
May  6 02:23:27 db2 kernel: sdf: unknown partition table
May  6 02:23:27 db2 kernel: sdd: unknown partition table
May  6 02:23:27 db2 kernel: sdj:Dev sdj: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdi: sdi1
May  6 02:23:27 db2 kernel: sdk: sdk1
May  6 02:23:27 db2 kernel: sdg: unknown partition table
May  6 02:23:27 db2 kernel: sdl: sdl1
May  6 02:23:27 db2 kernel: sdm:Dev sdm: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdo:Dev sdo: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdn:Dev sdn: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdp:Dev sdp: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sds:Dev sds: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdh:
May  6 02:23:27 db2 kernel: sdt: sdt1
May  6 02:23:27 db2 kernel: sdv:Dev sdv: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdq:Dev sdq: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sd 1:0:1:9: [sdr] Very big device. Trying to use READ CAPACITY(16).
May  6 02:23:27 db2 kernel: sdr:Dev sdr: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sd 2:0:0:9: [sdab] Very big device. Trying to use READ CAPACITY(16).
May  6 02:23:27 db2 kernel: sdab: unknown partition table
May  6 02:23:27 db2 kernel: sdac: unknown partition table
May  6 02:23:27 db2 kernel: sdw: sdw1
May  6 02:23:27 db2 kernel: sdu:Dev sdu: unable to read RDB block 0
May  6 02:23:27 db2 kernel: unable to read partition table
May  6 02:23:27 db2 kernel: sdx: sdx1
May  6 02:23:27 db2 kernel: sdy: sdy1
May  6 02:23:27 db2 kernel: sdaa: sdaa1
May  6 02:23:27 db2 kernel: sdz: sdz1
May  6 02:23:27 db2 kernel: sdae: unknown partition table
May  6 02:23:27 db2 kernel: sdaf: unknown partition table
May  6 02:23:27 db2 kernel: sdag: unknown partition table
May  6 02:23:27 db2 kernel: sdai:
May  6 02:23:27 db2 kernel: sdah: unknown partition table
May  6 02:23:27 db2 kernel: sdad: unknown partition table
May  6 02:23:28 db2 mcelog: failed to prefill DIMM database from DMI data

这里错误比较明显unknown partition table,磁盘的分区信息损坏.使用fdisk无法发现分区

partprobe也无效

[root@db2 oracle]# partprobe /dev/mapper/ocrhds3
[root@db2 oracle]#
[root@db2 oracle]# ls -l /dev/mapper/ocrhds3*
lrwxrwxrwx 1 root root 7 May  6 07:30 /dev/mapper/ocrhds3 -> ../dm-4

从尚需信息看,磁盘的分区表信息应该已经损坏,现在能够做的,就是希望运气好,磁盘的分区的实际数据没有损坏

分析磁盘实际分区数据

[root@db2 ~]$ dd if=/dev/mapper/datahds1 of=/tmp/datahds1.dd bs=1024k count=50
[root@db2 ~]$ dd if=/tmp/datahds1.dd of=/tmp/xff01.dd  bs=3225 skip=1
[grid@db2 ~]$ kfed read /tmp/xff01.dd |more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33050696 ; 0x0a8: HOUR=0x8 DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.crestmp.lo:           3813740544 ; 0x0ac: USEC=0x0 MSEC=0x44 SECS=0x35 MINS=0x38
kfdhdb.mntstmp.hi:             33050701 ; 0x0b0: HOUR=0xd DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.mntstmp.lo:            411385856 ; 0x0b4: USEC=0x0 MSEC=0x150 SECS=0x8 MINS=0x6

通过上述分析,我们可以初步判断,分区磁盘的信息很可能是好的(因为asm disk header是好的,根据一般的规则从前往后覆盖,既然header是好的,后面的block被覆盖的概率非常小)

通过准备新磁盘直接把磁盘分区dd到新设备上

dd if=/dev/mapper/ocrhds1 of=/dev/mapper/ocrhdsnew1 skip=1 bs=3225
dd if=/dev/mapper/ocrhds2 of=/dev/mapper/ocrhdsnew2 skip=1 bs=3225
dd if=/dev/mapper/ocrhds3 of=/dev/mapper/ocrhdsnew3 skip=1 bs=3225
dd if=/dev/mapper/datahds1 of=/dev/mapper/datahdsnew1 skip=1 bs=3225
dd if=/dev/mapper/datahds2 of=/dev/mapper/datahdsnew2 skip=1 bs=3225
dd if=/dev/mapper/datahds3 of=/dev/mapper/datahdsnew3 skip=1 bs=3225

asmlib重新扫描磁盘

[root@db1 disks]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "HDSOCR3"
Instantiating disk "HDSDATA2"
Instantiating disk "HDSDATA1"
Instantiating disk "HDSDATA3"
Instantiating disk "HDSOCR1"
Instantiating disk "HDSOCR2"
[root@db1 disks]# ls -ltr
total 0
brw-rw---- 1 grid asmadmin  8, 160 May  6 13:49 HDSOCR3
brw-rw---- 1 grid asmadmin  8, 192 May  6 13:49 HDSDATA2
brw-rw---- 1 grid asmadmin  8, 176 May  6 13:49 HDSDATA1
brw-rw---- 1 grid asmadmin  8, 208 May  6 13:49 HDSDATA3
brw-rw---- 1 grid asmadmin  8, 128 May  6 13:49 HDSOCR1
brw-rw---- 1 grid asmadmin  8, 144 May  6 13:49 HDSOCR2

kfed验证拷贝的分区

[root@db2 tmp]# /oracle/app/11.2.0/grid_1/bin/kfed read /dev/oracleasm/disks/HDSDATA1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0

asm和数据库启动正常

[grid@db2 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3145710  2378034                0         2378034              0             N  DATADG/
MOUNTED  NORMAL  N         512   4096  1048576     15342    14416             5114            4651              0             Y  OCRHDS/
ASMCMD>
[oracle@db2 ~]$ sqlplus  / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat May 6 13:54:21 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 3.6077E+10 bytes
Fixed Size                  2260648 bytes
Variable Size            7247757656 bytes
Database Buffers         2.8723E+10 bytes
Redo Buffers              104382464 bytes
Database mounted.
Database opened.
SQL>

asm-disk-partition-lost-recovery


通过上述恢复,实现asm磁盘分区丢失数据0丢失
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com