2018年5月 – Database SOS

ORA-600 17182导致oracle异常

正常运行的数据库突然爆ORA-00600 17182,然后直接crash,以前遇到过类似的case:分享一例由于主库逻辑坏块导致dataguard容灾失效,这又是一例数据库正常crash之后无法启动成功的case

Tue May 22 08:32:12 2018
Archived Log entry 84344 added for thread 1 sequence 90196 ID 0x103430df dest 1:
Tue May 22 09:05:42 2018
Thread 1 cannot allocate new log, sequence 90198
Private strand flush not complete
  Current log# 4 seq# 90197 mem# 0: +DATA/xifenfei/onlinelog/group_4.279.887464919
Thread 1 advanced to log sequence 90198 (LGWR switch)
  Current log# 2 seq# 90198 mem# 0: +DATA/xifenfei/onlinelog/group_2.263.887465041
Tue May 22 09:05:46 2018
Archived Log entry 84345 added for thread 1 sequence 90197 ID 0x103430df dest 1:
Tue May 22 09:07:42 2018
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_16297.trc  (incident=592822):
ORA-00600: 内部错误代码, 参数: [17182], [0x7FE274EADBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_592822/xifenfei_ora_16297_i592822.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:07:45 2018
Dumping diagnostic data in directory=[cdmp_20180522090745], requested by (instance=1, osid=16297), summary=[incident=592822].
Tue May 22 09:07:46 2018
Sweep [inc][592822]: completed
Sweep [inc2][592822]: completed
Tue May 22 09:08:29 2018
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_16297.trc  (incident=592824):
ORA-07445: 出现异常错误: 核心转储 [kghrcdepth()+168] [SIGSEGV] [ADDR:0x7FE2766ADD04] [PC:0x2C2B886] [Invalid permissions for mapped object] []
ORA-00600: 内部错误代码, 参数: [kghrcdepth:ds], [0x7FE274EADBE8], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [17182], [0x7FE274EADBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_592824/xifenfei_ora_16297_i592824.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:08:31 2018
Dumping diagnostic data in directory=[cdmp_20180522090831], requested by (instance=1, osid=16297), summary=[incident=592823].
Tue May 22 09:08:44 2018
Block recovery from logseq 90198, block 37639 to scn 161030804187
Recovery of Online Redo Log: Thread 1 Group 2 Seq 90198 Reading mem 0
  Mem# 0: +DATA/xifenfei/onlinelog/group_2.263.887465041
Block recovery completed at rba 90198.97219.16, scn 37.2117014236
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pmon_7690.trc  (incident=592118):
ORA-00600: internal error code, arguments: [17182], [0x7F96BDA2AA70], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_592118/xifenfei_pmon_7690_i592118.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E6B5C, kghpmfal()+216] [flags: 0x0, count: 1]
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_pmon_7690.trc  (incident=592119):
ORA-07445: exception encountered: core dump [kghpmfal()+216] [SIGSEGV] [ADDR:0x0] [PC:0x97E6B5C] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x7F96BDA2AA70], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_592119/xifenfei_pmon_7690_i592119.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:08:45 2018
Dumping diagnostic data in directory=[cdmp_20180522090845], requested by (instance=1, osid=7690 (PMON)), summary=[incident=592118].
Tue May 22 09:08:47 2018
Sweep [inc][592824]: completed
Sweep [inc][592823]: completed
Sweep [inc][592119]: completed
Sweep [inc][592118]: completed
Sweep [inc2][592824]: completed
Sweep [inc2][592119]: completed
Sweep [inc2][592118]: completed
Tue May 22 09:08:47 2018
ARC2 (ospid: 7834): terminating the instance due to error 472
Instance terminated by ARC2, pid = 7834

无法正常open

Completed: ALTER DATABASE   MOUNT
Tue May 22 09:26:44 2018
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Completed redo scan
 read 12232 KB redo, 4787 data blocks need recovery
Started redo application at
 Thread 1: logseq 90199, block 233846
Recovery of Online Redo Log: Thread 1 Group 3 Seq 90199 Reading mem 0
  Mem# 0: +DATA/xifenfei/onlinelog/group_3.262.887465049
Completed redo application of 10.34MB
Completed crash recovery at
 Thread 1: logseq 90199, block 258311, scn 161030851622
 4787 data blocks read, 4787 data blocks written, 12232 redo k-bytes read
LGWR: STARTING ARCH PROCESSES
Tue May 22 09:26:45 2018
ARC0 started with pid=48, OS id=18632
Tue May 22 09:26:46 2018
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 90200 (thread open)
Tue May 22 09:26:46 2018
ARC1 started with pid=49, OS id=18636
Tue May 22 09:26:46 2018
ARC2 started with pid=50, OS id=18640
Tue May 22 09:26:46 2018
ARC3 started with pid=51, OS id=18644
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 90200
  Current log# 5 seq# 90200 mem# 0: +DATA/xifenfei/onlinelog/group_5.280.887465135
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue May 22 09:26:46 2018
SMON: enabling cache recovery
[18512] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:2704839736 end:2704839986 diff:250 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Archived Log entry 84347 added for thread 1 sequence 90199 ID 0x103430df dest 1:
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Tue May 22 09:26:47 2018
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p019_18664.trc  (incident=624628):
ORA-00600: internal error code, arguments: [17182], [0x7F7A4A50DBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_624628/xifenfei_p019_18664_i624628.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Starting background process QMNC
Tue May 22 09:26:48 2018
QMNC started with pid=71, OS id=18737
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97ECF6C, kghalo()+570] [flags: 0x0, count: 1]
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p019_18664.trc  (incident=624629):
ORA-00600: internal error code, arguments: [17182], [0x7F7A4A50DBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_624629/xifenfei_p019_18664_i624629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:26:49 2018
Tue May 22 09:26:50 2018
Starting background process EMNC
Tue May 22 09:26:50 2018
EMNC started with pid=76, OS id=18814
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97ECF6C, kghalo()+570] [flags: 0x0, count: 2]
Completed: ALTER DATABASE OPEN
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_p019_18664.trc  (incident=624630):
ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0] [PC:0x97ECF6C] [SI_KERNEL(general_protection)] []
ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0] [PC:0x97ECF6C] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x7F7A4A50DBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_624630/xifenfei_p019_18664_i624630.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:27:01 2018
Block recovery from logseq 90200, block 59 to scn 161030851961
Recovery of Online Redo Log: Thread 1 Group 5 Seq 90200 Reading mem 0
  Mem# 0: +DATA/xifenfei/onlinelog/group_5.280.887465135
Block recovery stopped at EOT rba 90200.935.16
Block recovery completed at rba 90200.935.16, scn 37.2117062009
Starting background process CJQ0
Tue May 22 09:27:01 2018
SMON: slave died unexpectedly, downgrading to serial recovery
Tue May 22 09:27:01 2018
CJQ0 started with pid=56, OS id=18922
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x9823AA3, kghalo()+567] [flags: 0x0, count: 1]
Errors in file /u01/app/diag/rdbms/posdg/posdg/trace/posdg_p019_11656.trc  (incident=1136658):
ORA-07445: exception encountered: core dump [kghalo()+567] [SIGSEGV] [ADDR:0x0] [PC:0x9823AA3] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x7F813F61DBF8], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/posdg/posdg/incident/incdir_1136658/posdg_p019_11656_i1136658.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/diag/rdbms/posdg/posdg/trace/posdg_ora_10925.trc:
ORA-00600: internal error code, arguments: [2252], [49410], [2147581953], [3726], [1009467392], [], [], [], [], [], [], []
Errors in file /u01/app/diag/rdbms/posdg/posdg/trace/posdg_ora_10925.trc:
ORA-00600: internal error code, arguments: [2252], [49410], [2147581953], [3726], [1009467392], [], [], [], [], [], [], []
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_18434.trc  (incident=624292):
ORA-00600: internal error code, arguments: [17182], [0x7F7488BDD7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_624292/xifenfei_smon_18434_i624292.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E64D7, kghalf()+537] [flags: 0x0, count: 1]
Errors in file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_smon_18434.trc  (incident=624293):
ORA-07445: exception encountered: core dump [kghalf()+537] [SIGSEGV] [ADDR:0x0] [PC:0x97E64D7] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x7F7488BDD7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/diag/rdbms/xifenfei/xifenfei/incident/incdir_624293/xifenfei_smon_18434_i624293.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 22 09:27:03 2018
Dumping diagnostic data in directory=[cdmp_20180522092703], requested by (instance=1, osid=18434 (SMON)), summary=[incident=624292].
PMON (ospid: 18383): terminating the instance due to error 474
Tue May 22 09:27:05 2018
opiodr aborting process unknown ospid (18839) as a result of ORA-1092
System state dump requested by (instance=1, osid=18383 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_diag_18402_20180522092705.trc
Tue May 22 09:27:05 2018
ORA-1092 : opitsk aborting process
Instance terminated by PMON, pid = 18383

通过对于启动过程的观察,比较明显,由于数据库无法正常回滚,导致smon进程异常,从而使得数据库无法启动成功.恢复方法比较简单,就是对异常事务进行提交或者跳过即可

truncate IDL_UB1$恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：truncate IDL_UB1$恢复

世界之大无奇不有,已经记不清这是第几个客户咨询IDL_UB1$ 被truncate之后导致数据库无法启动的case了.idl_ub1$表是用来存储PL/SQL的代码单元的，包括DIANA等，IDL在这里代表Interface Definition Language. 在数据库的启动过程中通过10046跟踪可以知道,有类似:select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#的查询语句,由于该表被truncate之后,导致数据库启动无法绕过该sql,而hang住无法完全open成功.下午闲着没事通过模拟,可以对该故障实现正常open,并且导出数据
模拟业务数据

create user xff identified by oracle;
grant dba to xff;
conn xff/oracle
create table t_xifenfei as select * from dba_objects;
create index i_xifenfei on t_xifenfei(object_id);
create view v_xifenfei as select * from t_xifenfei;
create or replace procedure proc1(
para1 varchar2,
para2 out varchar2,
para3 in out varchar2
) as
v_name varchar2(20);
begin
 v_name :='xifenfei';
 para3 := v_name;
dbms_output.put_line('para3:'||para3);
end;
/
alter system checkpoint;

创建xff账户,并且创建有代表性的表,索引,存储过程,视图等.

模拟truncate IDL_UB1$表

SQL> conn / as sysdba
Connected.
SQL> truncate table IDL_UB1$;
truncate table IDL_UB1$
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 31325
Session ID: 177 Serial number: 7

重启数据库

SQL> conn / as sysdba
Connected.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL>
SQL>
SQL> startup
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.

数据库在mount之后,一直处于hang住状态,查看alert日志

Sun May 20 17:02:34 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_31325.trc  (incident=21781):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21781/test_ora_31325_i21781.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:37 2018
Sweep [inc][21781]: completed
Sweep [inc2][21781]: completed
Sun May 20 17:02:37 2018
Dumping diagnostic data in directory=[cdmp_20180520170237], requested by (instance=1, osid=31325), summary=[incident=21781].
Sun May 20 17:02:55 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x98D94A7, hshuid()+273] [flags: 0x0, count: 1]
Errors in file /home/u01/app/oracle/diag/rdbms/test/test/trace/test_m000_31373.trc  (incident=21821):
ORA-07445: exception encountered: core dump [hshuid()+273] [SIGSEGV] [ADDR:0x18] [PC:0x98D94A7] [Address not mapped to object] []
Incident details in: /home/u01/app/oracle/diag/rdbms/test/test/incident/incdir_21821/test_m000_31373_i21821.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun May 20 17:02:56 2018
Dumping diagnostic data in directory=[cdmp_20180520170256], requested by (instance=1, osid=31373 (M000)), summary=[incident=21821].

这类问题比较明显,正常方法无法打开,通过工具分析system文件,发现虽然truncate IDL_UB1$操作报错了,但是IDL_UB1$和对应的index I_IDL_UB11 obj#,dataobj#均已经改变,而且相关对象的segment header也变化为新dataobj#(truncate之后的),也就是说truncate在数据库中的主要更改操作已经完成.现在在缺少记录的情况下,数据库执行如下sql无法获取记录,从而无法open

PARSING IN CURSOR #140342551421712 len=132 dep=2 uid=0 oct=3 lid=0 tim=1526843464635335 hv=4260389146 ad='21af73218' sqlid='cvn54b7yz0s8u'
select /*+ index(idl_ub1$ i_idl_ub11) +*/ piece#,length,piece from idl_ub1$ where obj#=:1 and part=:2 and version=:3 order by piece#
END OF STMT
PARSE #140342551421712:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635335
BINDS #140342551421712:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d80  bln=22  avl=03  flg=05
  value=1310
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d50  bln=24  avl=01  flg=05
  value=0
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fa40bec7d20  bln=24  avl=06  flg=05
  value=184549376
EXEC #140342551421712:c=0,e=76,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635449
FETCH #140342551421712:c=0,e=4,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=3246118364,tim=1526843464635462
STAT #140342551421712 id=1 cnt=0 pid=0 pos=1 obj=225 op='TABLE ACCESS BY INDEX ROWID IDL_UB1$ (cr=1 pr=0 pw=0 time=4 us cost=3 size=44 card=2)'
STAT #140342551421712 id=2 cnt=0 pid=1 pos=1 obj=236 op='INDEX RANGE SCAN I_IDL_UB11 (cr=1 pr=0 pw=0 time=4 us cost=2 size=0 card=2)'
CLOSE #140342551421712:c=0,e=2,dep=2,type=0,tim=1526843464635496

通过对数据库采用技术欺骗手段,让数据库启动相关sql能够获取到记录(和正常查询的相同),从而实现数据库正常open

SQL> startup mount
ORACLE instance started.
Total System Global Area 7499329536 bytes
Fixed Size                  2267832 bytes
Variable Size            1409287496 bytes
Database Buffers         6073352192 bytes
Redo Buffers               14422016 bytes
Database mounted.
SQL> alter database open;
Database altered.

open成功之后,后台报大量的ORA-08103: object no longer exists,通过分析是由于truncate IDL_UB1$没有完全成功,导致出现该错误.解决方法就是对Oracle数据字典进行人工更新,把没有完成的truncate操作在数据库中给予完成.
导出数据
exp导出数据成功

但是expdp无法执行

[oracle@bogon oradata]$ expdp '"/ as sysdba"' schemas=xff file=1.dmp
Export: Release 11.2.0.4.0 - Production on Sun May 20 17:10:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
UDE-31623: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3326
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4551
ORA-06512: at line 1

暂时未去研究对这个表进行重建,使用exp导出然后再imp导入是比较理想的办法

恶意删除bootstrap$导致数据库无法正常启动

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：恶意删除bootstrap$导致数据库无法正常启动

有客户10.2.0.5的数据库关闭之后,无法正常启动报ORA-00704 ORA-00702错误.

Fri May 18 22:42:26  2018
ALTER DATABASE OPEN
Fri May 18 22:42:27  2018
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Fri May 18 22:42:27  2018
Started redo scan
Fri May 18 22:42:27  2018
Completed redo scan
 1 redo blocks read, 0 data blocks need recovery
Fri May 18 22:42:27  2018
Started redo application at
 Thread 1: logseq 2, block 2, scn 8448162573
Fri May 18 22:42:27  2018
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\DATABASE\xifenfei\REDO02.LOG
Fri May 18 22:42:27  2018
Completed redo application
Fri May 18 22:42:27  2018
Completed crash recovery at
 Thread 1: logseq 2, block 3, scn 8448182575
 0 data blocks read, 0 data blocks written, 1 redo blocks read
Fri May 18 22:42:28  2018
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=3188
ARC1 started with pid=24, OS id=3168
ARC2 started with pid=25, OS id=996
ARC3 started with pid=26, OS id=432
ARC4 started with pid=27, OS id=3728
Fri May 18 22:42:28  2018
ARC0: Archival started
ARC1: Archival started
ARC5 started with pid=28, OS id=2876
Fri May 18 22:42:28  2018
ARC2: Archival started
ARC3: Archival started
ARC4: Archival started
ARC5: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Fri May 18 22:42:28  2018
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: D:\DATABASE\xifenfei\REDO03.LOG
Successful open of redo thread 1
Fri May 18 22:42:28  2018
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 18 22:42:28  2018
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri May 18 22:42:28  2018
ARC2: Becoming the heartbeat ARCH
Fri May 18 22:42:28  2018
SMON: enabling cache recovery
Fri May 18 22:42:28  2018
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_3148.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
Fri May 18 22:42:28  2018
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 3148
ORA-1092 signalled during: ALTER DATABASE OPEN...

根据以前恢复经验ORA-00702: bootstrap verison ” inconsistent with version ’8.0.0.0.0′,很可能是由于bootstrap$表异常了.
通过dbv检查system文件确认没有坏块

通过bbed分析,确认记录被删除
把数据文件拷贝到本地,通过bbed进行分析,确认记录丢失

BBED> map
 File: d:/system01.dbf (0)
 Block: 379                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[24]                               @86
 ub1 freespace[1158]                        @134
 ub1 rowdata[6896]                          @1292
 ub4 tailchk                                @8188
BBED> p *kdbr[0]
rowdata[6875]
-------------
ub1 rowdata[6875]                           @8167     0x3c
BBED> x /rnnc
rowdata[6875]                               @8167
-------------
flag@8167: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@8168: 0x01
cols@8169:    0

故障原因跟踪
有人在数据库中注入了恶意脚本，导致数据库删除了bootstrap$中数据,关闭之后无法正常启动

处理方法
通过oracle bbed 修复数据字典,正常启动数据库

使用_unnest_subquery优化sql

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：使用_unnest_subquery优化sql

一个复杂的sql查询,使用了大量EXISTS和NOT EXISTS 关联导致sql执行效率低下,这里挑选出来最核心的部分进行演示

SQL> explain plan for   select
  2   a.aab034, a.aac001
  3    from si_dp.ac01_ac02 a
  4   where exists (select 1
  5            from ic40
  6           where aac001 = a.aac001
  7             and aae045 <= '201803'
  8             and aae120 = '0')
  9     and not exists (select 1
 10            from ic15
 11           where aac001 = a.aac001
 12             and aae002 <= '201803')
 13     and not EXISTS (select aab001
 14            from ab01
 15           where aab019 in ('91', '93')
 16             AND aab001 = a.aab001)
 17    and exists (select 1
 18            from ac13
 19           where aac001 = a.aac001
 20             and aae140 = '11'
 21             and aae114 in ('0', '1')
 22             and aae002 <= '201803')
 23     AND EXISTS (SELECT 1
 24            FROM AC13
 25           WHERE AAC001 = A.AAC001
 26             and aae140 = '11'
 27             AND AAE143 = '02'
 28             AND AAE003 < '201707'
 29             AND AAE002 BETWEEN '201801' AND '201803'
 30             and aae114 = '1')
 31     AND not EXISTS (SELECT 1
 32         FROM AC13
 33           WHERE AAC001 = A.AAC001
 34             and aae140 = '11'
 35          AND AAE002 < '201801')
 36     AND not EXISTS (SELECT 1
 37            FROM ac02
 38           WHERE AAC001 = A.AAC001
 39             and aae140 = '11'
 40             AND AAE036 < date '2018-1-1');
Explained.
Elapsed: 00:00:00.36
SQL> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                    |     1 |   202 |       | 11172   (2)|
|   1 |  NESTED LOOPS SEMI                |                    |     1 |   202 |       | 11172   (2)|
|   2 |   NESTED LOOPS ANTI               |                    |     1 |   175 |       | 11168   (2)|
|   3 |    NESTED LOOPS SEMI              |                    |     1 |   150 |       | 11164   (2)|
|   4 |     NESTED LOOPS ANTI             |                    |     1 |   126 |       | 11160   (2)|
|   5 |      NESTED LOOPS SEMI            |                    |     1 |   104 |       | 11158   (2)|
|   6 |       NESTED LOOPS ANTI           |                    |     1 |    67 |       | 11145   (2)|
|   7 |        HASH JOIN ANTI             |                    |     1 |    50 |  8640K| 11143   (2)|
|   8 |         TABLE ACCESS FULL         | AC01_AC02          |   245K|  5755K|       |   356   (2)|
|   9 |         TABLE ACCESS FULL         | AC02               |   559K|    13M|       |  9346   (2)|
|  10 |        TABLE ACCESS BY INDEX ROWID| AB01               |     2 |    34 |       |     2   (0)|
|  11 |         INDEX UNIQUE SCAN         | PK_AB01            |     1 |       |       |     1   (0)|
|  12 |       TABLE ACCESS BY INDEX ROWID | AC13               |   325K|    11M|       |    13   (0)|
|  13 |        INDEX RANGE SCAN           | I_AC13_AAE143      |   446 |       |       |     4   (0)|
|  14 |      INDEX RANGE SCAN             | PK_IC15            |  1771K|    37M|       |     2   (0)|
|  15 |     TABLE ACCESS BY INDEX ROWID   | IC40               |    17M|   395M|       |     4   (0)|
|  16 |      INDEX RANGE SCAN             | PK_IC40            |     1 |       |       |     3   (0)|
|  17 |    TABLE ACCESS BY INDEX ROWID    | AC13               |    51M|  1236M|       |     4   (0)|
|  18 |     INDEX RANGE SCAN              | RELATION_233112_FK |     3 |       |       |     3   (0)|
|  19 |   TABLE ACCESS BY INDEX ROWID     | AC13               |    52M|  1350M|       |     4   (0)|
|  20 |    INDEX RANGE SCAN               | RELATION_233112_FK |     3 |       |       |     3   (0)|
-----------------------------------------------------------------------------------------------------

这条sql,在一个10.2.0.3的系统中执行了十几个小时无法出结果,开发商反馈,该大部分客户的11.2的环境中,大概十几分钟出结果.从来没有遇到此类情况.让我们给他优化sql.看到这个sql,第一反应就是很可能大量的NESTED LOOPS效率低下,怀疑统计信息错误,结果收集完统计信息之后,执行计划依旧,我就在思考怎么调整sql,让其不这样大量嵌套执行.想起来的_unnest_subquery是控制子查询嵌套转换的,从9i开始默认为true,尝试设置为false测试.

SQL> alter session set "_unnest_subquery"=false;
Session altered.
Elapsed: 00:00:00.00
SQL> explain plan for   select
  2   a.aab034, a.aac001
  3    from si_dp.ac01_ac02 a
  4   where exists (select 1
  5            from ic40
  6           where aac001 = a.aac001
  7             and aae045 <= '201803'
  8             and aae120 = '0')
  9     and not exists (select 1
 10            from ic15
 11           where aac001 = a.aac001
 12             and aae002 <= '201803')
 13     and not EXISTS (select aab001
 14            from ab01
 15           where aab019 in ('91', '93')
 16             AND aab001 = a.aab001)
 17    and exists (select 1
 18            from ac13
 19          where aac001 = a.aac001
 20             and aae140 = '11'
 21             and aae114 in ('0', '1')
 22             and aae002 <= '201803')
 23     AND EXISTS (SELECT 1
 24            FROM AC13
 25           WHERE AAC001 = A.AAC001
 26             and aae140 = '11'
 27             AND AAE143 = '02'
 28             AND AAE003 < '201707'
 29             AND AAE002 BETWEEN '201801' AND '201803'
 30             and aae114 = '1')
 31     AND not EXISTS (SELECT 1
 32            FROM AC13
 33           WHERE AAC001 = A.AAC001
 34             and aae140 = '11'
 35             AND AAE002 < '201801')
 36     AND not EXISTS (SELECT 1
 37            FROM ac02
 38           WHERE AAC001 = A.AAC001
 39             and aae140 = '11'
 40             AND AAE036 < date '2018-1-1');
Explained.
Elapsed: 00:00:00.07
SQL> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |   185K|    19M|       |  2991K  (2)|
|   1 |  FILTER                       |                  |       |       |       |            |
|   2 |   HASH JOIN RIGHT SEMI        |                  |   185K|    19M|    16M|   758K  (3)|
|   3 |    TABLE ACCESS BY INDEX ROWID| AC13             |   353K|    12M|       |  4556   (1)|
|   4 |     INDEX SKIP SCAN           | I_AC13_AAB001    | 23608 |       |       |  2287   (1)|
|   5 |    HASH JOIN SEMI             |                  |   201K|    14M|    11M|   751K  (3)|
|   6 |     HASH JOIN SEMI            |                  |   201K|  9452K|  8640K|   123K  (3)|
|   7 |      TABLE ACCESS FULL        | AC01_AC02        |   245K|  5755K|       |   357   (2)|
|   8 |      TABLE ACCESS FULL        | IC40             |    21M|   481M|       | 86122   (3)|
|   9 |     TABLE ACCESS FULL         | AC13             |    52M|  1350M|       |   530K  (3)|
|  10 |   INDEX RANGE SCAN            | PK_IC15          |     2 |    44 |       |     3   (0)|
|  11 |   VIEW                        | index$_join$_009 |     1 |    17 |       |     3  (34)|
|  12 |    HASH JOIN                  |                  |       |       |       |            |
|  13 |     INDEX RANGE SCAN          | PK_AB01          |     1 |    17 |       |     2   (0)|
|  14 |     INLIST ITERATOR           |                  |       |       |       |            |
|  15 |      INDEX RANGE SCAN         | IDX_AB01_AAB019  |     1 |    17 |       |     8   (0)|
|  16 |   TABLE ACCESS BY INDEX ROWID | AC13             |     2 |    50 |       |     5   (0)|
|  17 |    INDEX RANGE SCAN           | I_AC13_SEARCH    |   152 |       |       |     4   (0)|
|  18 |   TABLE ACCESS BY INDEX ROWID | AC02             |     1 |    26 |       |     4   (0)|
|  19 |    INDEX RANGE SCAN           | PK_AC02          |     1 |       |       |     3   (0)|
-----------------------------------------------------------------------------------------------

让开发设置该参数,然后执行sql,结果3分钟不到出结果,非常圆满完成任务.该sql还有进一步优化空间,但是考虑到已经满足要求,不再折腾.

ORA-19821故障分析

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-19821故障分析

数据库报错
数据库启动报ORA-00283和ORA-19821错

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found

Thu May 03 12:06:51 2018
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: D:\APP\SOHTDB\ORADATA\xifenfei\REDO01.LOG
Media Recovery failed with error 19821
Errors in file d:\app\sohtdb\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_pr00_660.trc:
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
Slave exiting with ORA-283 exception
Errors in file d:\app\sohtdb\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_pr00_660.trc:
ORA-00283: recovery session canceled due to errors
ORA-19821: an intentionally corrupt log file was found
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

ORA-19821报错原因
这个错误相对比较少见，查询mos，由于设置了_disable_logging = TRUE导致该问题

检查alert日志,_disable_logging参数确实被认为设置为true了.

解决方法
根据官方的描述,这样的情况无法常规恢复,但是我们知道设置这个参数是为了不产生日志,因此出现这种情况,只能通过隐含参数,禁止数据库进行实例恢复,强制打开数据库.在这样的过程中非常容易遭遇类似ORA-600 2662的错误.

设置这个参数是为了不产生日志，当数据库非干净关闭(主机断电，数据库crash，shutdown abort等）,就非常可能导致数据库无法正常启动.***千不可万不能在生产环境中设置_disable_logging = TRUE***
参考文档:Ora-19821 during the recovery (Doc ID 1217143.1)
Init.ora Parameter “_DISABLE_LOGGING” [Hidden] Reference Note (Doc ID 29552.1)

Oracle 数据文件大小为0kb或者文件丢失恢复

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：Oracle 数据文件大小为0kb或者文件丢失恢复

接到一个朋友恢复请求,由于rose频繁切换导致文件系统部分数据文件变化为0kb和文件丢失.
故障现象
部分数据文件变化为0kb和文件丢失.

这里比较明显,数据库的users03变为了0kb和users04丢失.数据库alert日志报错信息如下:

Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-27047: ??????????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 8 - ??? DBWR ????
ORA-01110: ???? 8: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS04.DBF'
ORA-27041: ??????
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_12040.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-1157 signalled during: alter database open...
Fri May 04 09:35:10 2018
Checker run found 2 new persistent data failures

alert日志的报错也比较明显,users03是文件超过了大小(大小为0kb,读取之后肯定超过大小),users04提示无法打开文件(文件在文件系统层面已经丢失).现在问题比较明显由于文件系统故障导致文件大小为0和丢失

碎片扫描恢复
常规的方法肯定无法恢复,比较好的方法只能是底层碎片扫描重组,结合多种扫描工具,最后发现一个做底层恢复的朋友的工具效果不错,扫描结果如下

通过工具分析坏块情况

C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.7_10.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.7_10.ora
………………
页 382565 标记为损坏
Corrupt block relative dba: 0x01c5d665 (file 7, block 382565)
Completely zero block found during dbv:
页 382566 标记为损坏
Corrupt block relative dba: 0x01c5d666 (file 7, block 382566)
Completely zero block found during dbv:
页 382567 标记为损坏
Corrupt block relative dba: 0x01c5d667 (file 7, block 382567)
Completely zero block found during dbv:
DBVERIFY - 验证完成
检查的页总数: 1374720
处理的页总数 (数据): 27582
失败的页总数 (数据): 0
处理的页总数 (索引): 20114
失败的页总数 (索引): 0
处理的页总数 (其他): 1319752
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 7271
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271996 (0.228271996)
C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.8_8.ora
DBVERIFY - 验证完成
检查的页总数: 1136896
处理的页总数 (数据): 36639
失败的页总数 (数据): 0
处理的页总数 (索引): 57038
失败的页总数 (索引): 0
处理的页总数 (其他): 1043218
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271997 (0.228271997)
C:\Users\Administrator>

这里通过分析恢复的两个文件总的block数量2511618，其中连续损坏7271个block损坏,由于出现问题之后,数据库被offline这两个文件继续启动运行了几个小时,导致少量block被覆盖,恢复软件直接置空.后续的恢复比较顺利,正常open数据库,然后处理坏块对象(正好不是业务核心表的lob字段,所有部分丢失影响不是非常大).

温馨提醒:
1. 数据文件和备份不要放在同一个阵列上,更不能是同一个分区（卷）上
2. 出现此类问题之后,应当理解停止对该分区的任何写操作,方式丢失或者大小为0KB的文件被覆盖.
如果需要专业ORACLE数据库恢复技术支持，请联系我们
Phone:17813235971 Q Q:107644445 E-Mail:dba@xifenfei.com

月度归档：5月 2018