MON_MODS$表ORA-600 13013报错处理

有朋友反馈数据库启动运行一点时间之后,然后就自动crash,让我们帮忙找原因,通过分析是由于smon进程触发ORA-600 13013导致数据库异常
alert日志报错信息

Thu Aug  4 18:39:44 2016
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=33, OS id=22935
Thu Aug  4 18:39:44 2016
Completed: ALTER DATABASE OPEN
Thu Aug  4 18:39:44 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 18:48:41 2016
Thread 1 advanced to log sequence 86746
  Current log# 3 seq# 86746 mem# 0: /opt/ora10/oradata/ora10g/redo03.log
Thu Aug  4 18:58:13 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:58:56 2016
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Thu Aug  4 18:59:06 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:59:08 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_pmon_22413.trc:
ORA-00474: SMON process terminated with error
Thu Aug  4 18:59:08 2016
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 22413

通过trace文件大概可以发现是由于ORA-600 13013错误导致数据库crash,而且这里有类似”SMON was doing flushing of monitored table stats”错误提示,根据经验,很可能是smon把表的dml操作收集信息相关.

ORA-600 [13013] 含义

ORA-600 [13013] [a] [b] {c} [d] [e] [f]
This format relates to Oracle Server 8.0.3 to 10.1
Arg [a] Passcount
Arg [b] Data Object number
Arg {c} Tablespace Relative DBA of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Relative DBA of block being updated (should be same as )
Arg [f] Code

根据这个错误信息,以及How to resolve ORA-00600 [13013], [5001] [ID 816784.1]中的描述

ORA-600 13013 对应对象

SQL> select object_name from dba_objects where object_id=482
OBJECT_NAME
--------------------------------------------------------------------------------
MON_MODS$

该对象正是和监控dml变化相关的表,smon会对其进行相关操作,以前写过一篇:MON_MODS$和MON_MODS_ALL$统计DML操作次数的文章
对于MON_MODS$表ORA-600 13013处理

SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> select index_name from dba_indexes where table_name='MON_MODS$';
INDEX_NAME
------------------------------
I_MON_MODS$_OBJ
SQL> ALTER INDEX I_MON_MODS$_OBJ REBUILD;
Index altered.
SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> CREATE TABLE MON_MODS_BAK AS SELECT * FROM MON_MODS$;
Table created.
SQL> SELECT COUNT(*) FROM MON_MODS$;
  COUNT(*)
----------
      1247
SQL> C/MON_MODS$/MON_MODS_BAK;
  1* SELECT COUNT(*) FROM MON_MODS_BAK
SQL> /
  COUNT(*)
----------
      1247
SQL> TRUNCATE TABLE MON_MODS$;
Table truncated.
SQL> INSERT INTO MON_MODS$ SELECT * fROM MON_MODS_BAK;
1247 rows created.
SQL> COMMIT;
Commit complete.
SQL>  analyze table mon_mods$ validate structure cascade;
Table analyzed.

自此关于MON_MODS$表相关的ORA-600 13013异常处理完全,当然也可以通过重建I_MON_MODS$_OBJ索引来解决,但是不能通过rebuild index解决.数据库也就不会因此而crash了.

ORA-600 4042 故障恢复

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check) 检查结果
wrong_scn
通过上图可以知道file 2未能正常恢复(需要看日志分析原因),file 3以前就被offline,需要历史归档(非归档状态,所以这个先放着,后续再处理)

分析file 2 不成功原因

Wed Aug  3 15:21:11 2016
ALTER DATABASE RECOVER  datafile 2
Wed Aug  3 15:21:11 2016
Media Recovery Start
 parallel recovery started with 2 processes
Wed Aug  3 15:21:11 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Wed Aug  3 15:21:11 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:13 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:18 2016
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

通过日志可以知道由于ORA-600 3020导致file 2不能正常的恢复.
处理file 2

SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.
Thu Aug  4 01:58:35 2016
ALTER DATABASE RECOVER  datafile 2 allow 1 corruption
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Thu Aug  4 01:58:35 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:58:35 2016
Media Recovery Complete (oracle)
Completed: ALTER DATABASE RECOVER  datafile 2 allow 1 corruption

尝试open数据库

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
Thu Aug  4 01:59:20 2016
alter database open
Thu Aug  4 01:59:21 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Thu Aug  4 01:59:21 2016
Started redo scan
Thu Aug  4 01:59:21 2016
Completed redo scan
 1619 redo blocks read, 0 data blocks need recovery
Thu Aug  4 01:59:21 2016
Started redo application at
 Thread 1: logseq 1916, block 12724
Thu Aug  4 01:59:21 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:59:21 2016
Completed redo application
Thu Aug  4 01:59:21 2016
Completed crash recovery at
 Thread 1: logseq 1916, block 14343, scn 3303614971196
 0 data blocks read, 0 data blocks written, 1619 redo blocks read
Thu Aug  4 01:59:21 2016
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=5542
Thu Aug  4 01:59:21 2016
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=19, OS id=5544
Thu Aug  4 01:59:21 2016
Thread 1 advanced to log sequence 1917
Thread 1 opened at log sequence 1917
  Current log# 2 seq# 1917 mem# 0: /home/oracle/orabase/oradata/ORACLE/redo02.log
Successful open of redo thread 1
Thu Aug  4 01:59:21 2016
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug  4 01:59:21 2016
ARC1: STARTING ARCH PROCESSES
Thu Aug  4 01:59:21 2016
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Aug  4 01:59:21 2016
SMON: enabling cache recovery
Thu Aug  4 01:59:21 2016
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=20, OS id=5546
Thu Aug  4 01:59:21 2016
db_recovery_file_dest_size of 2048 MB is 1.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 01:59:22 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 5505
ORA-1092 signalled during: alter database open ...

由于ORA-600 4042错误导致数据库无法正常open.
分析ORA-600 4042

PARSING IN CURSOR #4 len=142 dep=1 uid=0 oct=3 lid=0 tim=1435788503594313 hv=361892850 ad='a7ab2db8'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #4:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594311
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aae75802218  bln=22  avl=02  flg=05
  value=3
EXEC #4:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594393
FETCH #4:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,tim=1435788503594412
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=8 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=3 us)'
WAIT #1: nam='db file sequential read' ela= 10 file#=2 block#=41 blocks=1 obj#=-1 tim=1435788503594468
Dump of buffer cache at level 4 for tsn=1, rdba=8388649
BH (0x95ff3c58) file#: 2 rdba: 0x00800029 (2/41) class: 21 ba: 0x95ef0000
  set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [a8b77880,a8b77880] lru: [95ff3dd0,a8e70338]
  ckptq: [NULL] fileq: [NULL] objq: [a43da110,a43da110]
  use: [a8e6e658,a8e6e658] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800029 (2/41)
  scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
  frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
Dump of memory from 0x0000000095EF0000 to 0x0000000095EF2000
095EF0000 0000A200 00800029 00000000 01010000  [....)...........]
095EF0010 00000000 00000000 00000000 00000000  [................]
        Repeat 509 times
095EF1FF0 00000000 00000000 00000000 00000001  [................]
Dump of memory from 0x0000000095EF0014 to 0x0000000095EF1FFC
095EF0010          00000000 00000000 00000000      [............]
095EF0020 00000000 00000000 00000000 00000000  [................]

这里可以发现,file 2 block 41的type为unknown,注意观察ORA-600 3020的错误,我们发现当时报的坏块也正好是该block.基本上可以确定由于前面的allow 1 corruption操作导致了后面的ORA-600 4042的错误.官方关于ORA-600[4042]解释
ORA-600-4042


通过修改undo$中的回滚段状态(参考:bbed修改undo$(回滚段)状态)
正常open数据库,修改file 3的scn并online数据文件

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1224736768 bytes
Fixed Size                  2020384 bytes
Variable Size             318770144 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
SQL>   SELECT thread#,
  2           a.sequence#,
  3           a.group#,
  4           TO_CHAR (first_change#, '9999999999999999') "SCN",
  5           a.status,
  6           MEMBER
  7      FROM v$log a, v$logfile b
  8     WHERE a.group# = B.GROUP#
  9  ORDER BY a.sequence# DESC;
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1919          1     3303615011212
CURRENT
/home/oracle/orabase/oradata/ORACLE/redo01.log
         1       1918          3     3303614991206
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo03.log
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1917          2     3303614971197
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo02.log
SQL> recover database using backup controlfile;
ORA-00279: change 3303615011452 generated at 08/04/2016 02:06:52 needed for
thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORACLE/archivelog/2016_08_04/o1_mf_1_1919_%u
_.arc
ORA-00280: change 3303615011452 for thread 1 is in sequence #1919
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/orabase/oradata/ORACLE/redo01.log
Log applied.
Media recovery complete.
SQL> alter database datafile 3 online;
Database altered.
SQL> alter database open resetlogs;
Database altered.
SQL>

至此该数据库基本上恢复完成,强烈建议使用逻辑方式导出导入重建库.

ORA-600 4194/ORA-600 4193/ORA-600 4137故障解决

对于常见的undo异常错误,ORA-600 4193,ORA-600 4194,ORA-600 4137等错误的处理一般步骤.
适用版本

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.

报错现象

The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.
ARGUMENTS:
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block
Since we are adding a new undo record to our undo block, we would expect that the new record number
 is equal to the maximum record number in the undo block plus one. Before Oracle can add
a new undo record to the undo block it validates that this is correct. If this validation fails,
 then an ORA-600 [4194] will be triggered.

报错原因

This also can be cause by the following defect
Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK
Details:
Undo corruption may be caused after a shrink and the same undo block may be used
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback

处理步骤

Best practice to create a new undo tablespace.
This method includes segment check.
Create pfile from spfile to edit
>create pfile from spfile;
1. Shutdown the instance
2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'
3. >startup restrict pfile=<initsid.ora>
4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
This is critical - we are looking for all undo segments to be offline - System will always be online.
If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.
If all offline then continue to the next step
5. Create new undo tablespace - example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;
7. >shutdown immediate;
8 >startup nomount;  --> Using your Original spfile
9 modify the spfile with the new undo tablespace name
  Alter system set undo_tablespace = '<new tablespace created in step 5>' scope=spfile;
10. >shutdown immediate;
11. >startup;  --> Using spfile
The reason we create a new undo tablespace first is to use new undo segment numbers
 that are higher then the current segments being used.
This way when a transaction goes to do block clean-out
the reference to that undo segment does not exist and continues with the block clean-out.

参考:tep by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)

ORA-600 3020

ORA-600 3020官方解释说明

ERROR:
  Format: ORA-600 [3020] [a] [b] {c} [d] [e]
VERSIONS:
  version 6.0 and above
DESCRIPTION:
  This is called a 'STUCK RECOVERY'.
  There is an inconsistency between the information stored in the redo
  and the information stored in a database block being recovered.
ARGUMENTS:
For Oracle 9.2 and earlier:
  Arg [a] Block DBA
  Arg [b] Redo Thread
  Arg  Redo RBA Seq
  Arg [d] Redo RBA Block No
  Arg [e] Redo RBA Offset.
For Oracle 10.1
  Arg [a] Absolute file number of the datafile.
  Arg [b] Block number
  Arg  Block DBA
FUNCTIONALITY:
  kernel cache recovery parallel
IMPACT:
  INSTANCE FAILURE during recovery.
SUGGESTIONS:
  There have been cases of receiving this error when RECOVER has
  been issued, but either some datafiles were not restored to disk,
  or the restoration has not finished.
  Therefore, ensure that the entire backup has been restored and that
  the restore has finished PRIOR to issuing a RECOVER database command.
  If problems continue, consider restoring from a backup and doing a
  point-in-time recovery to a time PRIOR to the one implied by
  the ORA-600[3020] error.
  Example:
     SQL> recover database until time 'YYYY-MON-DD:HH:MI:SS';
  This error can also be caused by a lost update.
  During normal operations, block updates/writes are being performed to
  a number of files including database datafiles, redo log files,
  archived redo log files etc.
  This error can be reported if any of these updates are lost for some
  reason.
  Therefore, thoroughly check your operating system and disk hardware.
  In the case of a lost update, restore an old copy of the datafile and
  attempt to recover and roll forward again.

相关bug信息

NB

Prob

Bug

Fixed

Description

III

16384983

11.2.0.4.6, 11.2.0.4.BP15, 12.1.0.2, 12.2.0.0

RMAN bad backup – causes recovery to fail with ORA-600 [3020]

III

16057129

11.2.0.3.BP22, 11.2.0.4, 12.1.0.1.2, 12.1.0.2

Exadata cell optimized incremental backup can skip some blocks to backup

I

22302666

12.2.0.0

ORA-753 ORA-756 or ORA-600 [3020] with KCOX_FUTURE after RMAN Restore / PITR with BCT after Open Resetlogs in 12c

II

21629064

12.1.0.2.160719, 12.2.0.0

ORA-600 [3020] KCOX_FUTURE by RECOVERY for KTU UNDO BLOCK SEQ:254 sometime after RMAN Restore of UNDO datafile in Source Database

III

20509482

11.2.0.4.BP20, 12.1.0.2.160119, 12.1.0.2.DBBP09, 12.2.0.0

ORA-600 [3020] / ORA-752 Wrong Results after Parallel Direct Load or RMAN ORA-600 [krcrfr_nohist] in RAC (caused by fix for bug 9962369)

II

19445860

12.2.0.0

ORA-1172 or ORA-600 [3020] Stuck recovery in RAC after attempted block rebuild

III

18284763

11.2.0.4.BP13, 12.1.0.2, 12.2.0.0

ORA-600 [3020] on ASSM blocks in Standby Database after CONVERT TO PHYSICAL or ORA-8103 ORA-600 [4552] in non-standby after FLASHBACK

III

18268201

12.1.0.2, 12.2.0.0

ORA-600 [3020] ORA-10567 and ORA-10560: block type ‘0’ / ORA-600 [kdBlkCheckError] ORA-600 [ktfbbset-2] after flashback which reversed a datafile extend – superseded

I

16891789

11.2.0.4, 12.1.0.2, 12.2.0.0

ORA-600 [3020] or ORA-752 if db_lost_write_protect is enabled. Bystander standby recovers wrong redo log after switchover or failover.

+

III

17761775

11.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4.2, 11.2.0.4.BP03, 12.1.0.1.3, 12.1.0.2

ORA-600 [kclchkblkdma_3] ORA-600 [3020] or ORA-600 [kcbchg1_16] Join of temp and permanent table in RAC might lead to corruption – superseded

I

16844448

11.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4, 12.1.0.2

ORA-600 [3020] after flashback database in a RAC

III

15999982

11.2.0.3.BP24, 11.2.0.4, 12.1.0.2

ORA-600 [kcl_sanity_check_cr_1] ORA-600 [kclchkblkdma_3] in RAC or ORA-752 ORA-600 [3020] during recovery

II

21425496

ORA-752 or ORA-600 [3020] on recovery of Block Cleanout Operation OP:4.6

I

9847338

Session hang after applying the patch for Bug 9587912 which causes ORA-600 [3020]

+

III

13467683

11.2.0.2.9, 11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP04, 11.2.0.4, 12.1.0.1

Join of temp and permanent tables in RAC might cause corruption of permanent table. Regression by bug 10352368

E

12831782

11.2.0.2.BP11, 11.2.0.3.BP01, 11.2.0.4, 12.1.0.1

ORA-600 [3020] / ORA-333 Recovery of datafile or async transport do not read mirror if there is a stale block

II

12821418

11.2.0.3.8, 11.2.0.3.BP18, 11.2.0.4, 12.1.0.1

Direct NFS appears to be sending zero length windows to storage device. It may also cause Lost Writes

I

12582839

11.2.0.3, 12.1.0.1

ORA-8103/ORA-600 [3020] on RMAN recovered locally managed tablespace

P

I

12330911

12.1.0.1

EXADATA LSI firmware for lost writes

III

11689702

11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1

ORA-600 [3020] during recovery after datafile RESIZE (to smaller size)

+

II

10425010

11.2.0.3, 12.1.0.1

Stale data blocks may be returned by Exadata FlashCache

10329146

11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1

Lost write in ASM with multiple DBWs and a disk is offlined and then onlined

II

10218814

11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1

ORA-600 [3020] during recovery / on standby

+

II

10209232

11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1

ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM

*

III

10205230

11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.1

ORA-600 / corruption possible during shutdown in RAC

II

10094823

11.2.0.2.4, 11.2.0.2.BP09, 11.2.0.3, 12.1.0.1

Block change tracking on physical standby can cause data loss

10071193

11.2.0.2.BP02, 11.2.0.3, 12.1.0.1

Lost write / ORA-600 [kclchkblk_3] / ORA-600 [3020] in RAC – superceded

9587912

11.2.0.2, 12.1.0.1

ORA-600 [3020] in datafile that went offline/online in a RAC instance

8774868

11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1

OERI[3020] reinstating primary

+

III

8769473

11.2.0.2, 12.1.0.1

ORA-600 [kcbzib_5] on multi block read in RAC. Invalid lock in RAC. ORA-600 [3020] in Recovery

P

II

8635179

10.2.0.5, 11.2.0.2, 12.1.0.1

Solaris: directio may be disabled for RAC file access. Corruption / Lost Write

+

II

8597106

11.2.0.1.BP06, 11.2.0.2, 12.1.0.1

Lost Write in ASM when normal redundancy is used

+E

II

17752121

11.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4.2, 11.2.0.4.BP03

ORA-600 [kclchkblkdma_3] ORA-600 [3020] RAC diagnostic/fix to avoid a block being modified in Shared Mode and prevent corruption – Superseded

III

8826708

10.2.0.5, 11.2.0.2

ORA-600 [3020] for block type 0x3a (58) during recovery for block restored by RMAN backup

I

16579042

11.2.0.4

ORA-600 [kjbmpocr:alh] ORA-600 [kclchkblkdma_3] by LMS in RAC which may lead to corruption

11684626

11.2.0.1

ORA-600 [3020] on standby involving “BRR” redo when db_lost_write_protect is enabled

8230457

10.2.0.4.1, 10.2.0.5, 11.1.0.7.1, 11.2.0.1

Physical standby media recovery gets OERI[krr_media_12]

+

II

7680907

10.2.0.5, 11.1.0.7.1, 11.2.0.1

ORA-600 [kclexpandlock_2] in LMS / instance crash. Incorrect locks in RAC. ORA-600 [3020] in recovery

II

4637668

10.2.0.3, 11.1.0.6

IMU transactions can produce out-of-order redo (OERI [3020] on recovery)

4594917

9.2.0.8, 10.2.0.2, 11.1.0.6

Write IO error can cause incorrect file header checkpoint information

4453449

10.2.0.2, 11.1.0.6

OERI:3020 / corruption errors from multiple FLASHBACK DATABASE

III

7197445

10.2.0.4.1, 10.2.0.5

Standby Recovery session cancelled due to ORA-600 [3020] “CHANGE IN FUTURE OF BLOCK”

II

5610267

10.2.0.5

MRP terminated by ORA-600[krr_media_12] / OERI:3020 after flashback

3762714

9.2.0.7, 10.1.0.4, 10.2.0.1

ALTER DATABASE RECOVER MANAGED STANDBY fails with OERI[3020]

I

3560209

10.2.0.1

OERI[3020] stuck recovery under RAC

3397181

9.2.0.5, 10.1.0.3, 10.2.0.1

ALTER SYSTEM KILL SESSION of recovery slave causes stuck recovery

*

3381950

10.2.0.1

Backups from RAC DB before Data Guard Failover cannot be used

3535712

9.2.0.6, 10.1.0.4

OERI[3020] / ORA-10567 from RAC with standby in max performance mode

4594912

9.2.0.8, 10.1.0.2

Incorrect checkpoint possible in datafile headers

3635331

9.2.0.6, 10.1.0.4

Stuck recovery (OERI:3020) / ORA-1172 on startup after a crash

II

2322620

9.2.0.1

OERI:3020 possible on recovery of LOB DATA

P+

656370

7.3.3.4, 7.3.4.0, 8.0.3.0

AlphaNT only: Corrupt Redo (zeroed byte) OERI:3020

ORA-600 kdsgrp1

在硬件恢复,断电,redo异常等恢复case中ORA-600 [kdsgrp1]是一个比较常见的错误,这里该出来官方关于该错误的解释说明和处理方法

RROR:
  Format: ORA-600 [kdsgrp1]
VERSIONS:
  versions 10.1 and above
DESCRIPTION:
 This error was introduced in 10g with the fix to Bug 2442351, it provides
 for an extra health check on a block, we detected a null row header,
 see Note:2442351.9 for more information.
 Error may be caused by:
 Case 1. A row referenced in an index that does not exist in the table.
 Case 2. An non-existent rowid pointed to by a chained row.
 Trace Examples:
 Case 1. Mismatch between table and index:
====================================================
 Trace file has:
 row 02433566.13 continuation at
 file# 9 block# 210278 slot 20 not found
 The file=9 block=210278 is rdba=0x02433566 which was taken from an index:
 row#3[7549] flag: ------, lock: 0, len=85, data:(6):  02 43 35 66 00 14
 But the slot 20 does not exist in the table block:
 tab 0, row 1, @0x1e62
 tl: 2 fb: --HDFL-- lb: 0x3
 tab 0, row 12, @0x191a
 tl: 2 fb: --HDFL-- lb: 0x1
 tab 0, row 17, @0x1675
 tl: 2 fb: --HDFL-- lb: 0x2
 tab 0, row 21, @0x1459
 tl: 2 fb: --HDFL-- lb: 0x4
 ORA-1499 may be produced by analyze:
 analyze table <table name> validate structure cascade;
 Case 2. A row points to another rowid which does not exist (Chained row does not exist).
============================================================================================
 Trace file has:
 row 1186b11a.ffffffff continuation at
 file# 70 block# 441621 slot 1 not found
 It means that row with rdba 0x1186b11a continues in file# 70 block# 441621 slot 1.
 But the information in file# 70 block# 441621 slot 1 does not exist.  It is:
 tab 0, row 16, @0xd7f    ---> This is the slot with the problem.
 tl: 29 fb: -------- lb: 0x0  cc: 11
 nrid:  0x1186bd15.1      ---> It points to rdba=0x1186bd15 slot 1
(file# 70 block# 441621 slot 1) but that row does not exist in that block.
 For this case ANALYZE TABLE .. VALIDATE STRUCTURE is not detecting this logical corruption
Referece Bug 6858313
Run an export (exp) or Full Table Scan to identify if there is a permanent invalid chained row.
 Workaround for Case 2:
 The row producing the ORA-600 [kdsgrp1] can be skipped by setting the Event 10231
 Note that a testcase has concluded that event 10231 does not skip rows in an Index Organized Table (IOT)
 when there is an invalid nrid as explained in Case 2.  It only works for regular tables.
 Event 43810 skip corrupt block in IOT?s (10.2.0.4)
nor  parameter _index_scan_check_skip_corrupt (11g) work for this case 2 on IOTs either.
FUNCTIONALITY:
  Kernel Data layer Seek/Scan
IMPACT:
  PROCESS FAILURE
  POSSIBLE PHYSICAL CORRUPTION

ORA-600 2663

在大家熟悉的ORA-600 2662错误中还有一个类似的ORA-600 2663的错误,以下是对该2663的解释说明

ERROR:
  Format: ORA-600 [2663] [a] [b] {c} [d]
VERSIONS:
  versions 10.1 to 11.1
DESCRIPTION:
  A data block SCN is ahead of the current SCN.
  The ORA-600 [2663] occurs when an SCN is compared to the dependent SCN
  stored in a UGA variable.
  If the SCN is less than the dependent SCN then we signal the ORA-600 [2663]
  internal error.
ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE
FUNCTIONALITY:
  Kernel Cache Redo File Redo Generation
IMPACT:
  INSTANCE FAILURE
  POSSIBLE PHYSICAL CORRUPTION
SUGGESTIONS:
  There are different situations where ORA-600 [2663] can be raised.
  It can be raised on startup or duing database operation.
  If not using RAC, Real Application Clusters,, check that 2 instances
  have not mounted the same database.
  Check for SMON traces and have the alert.log and trace files ready
  to send to support.
  Check the SCN difference [argument d]-[argument b].
  If the SCNs in the error are very close, then try to shutdown and startup
  the instance several times.
  In some situations, the SCN increment during startup may permit the
  database to open. Keep track of the number of times you attempted a
  startup.

ORA-600 2662

在数据库恢复中,ORA-600 2662我想是很多人都非常熟悉的错误,下文是对于该错误的一些解释
ORA-600 2662解释说明

ERROR:
  Format: ORA-600 [2662] [a] [b] {c} [d] [e]
VERSIONS:
  versions 6.0 to 12.1
DESCRIPTION:
  A data block SCN is ahead of the current SCN.
  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN
  stored in a UGA variable.
  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.
ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

出现ORA-600 2662可能的原因

  (1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled
  (2) a hardware problem, like a faulty controller, resulting in a failed
      write to the control file or the redo logs
  (3) restoring parts of the database from backup and not doing the
      appropriate recovery
  (4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP
      CONTROLFILE
  (5) having _DISABLE_LOGGING set during crash recovery
  (6) problems with the DLM in a parallel server environment
  (7) a bug

ORA-600 2662解决方法

   (1) if the SCNs in the error are very close, attempting a startup several
       times will bump up the dscn every time we open the database even if
       open fails. The database will open when dscn=scn.
   (2)You can bump the SCN either on open or while the database is open
      using Event:ADJUST_SCN
      Be aware that you should rebuild the database if you use this
      option.

ORA-600 999 异常恢复

有网友数据库启动报ORA-600 999错误,无法正常open,让我们介入分析,帮忙恢复其中部分数据
数据库启动报ORA-600 999

Sun Jul 31 23:09:36 2016
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc  (incident=179779):
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_179779\orcl_smon_3356_i179779.trc
No Resource Manager plan active
Starting background process QMNC
Sun Jul 31 23:09:37 2016
QMNC started with pid=20, OS id=5068
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (7, 1).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc:
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Completed: alter database open
Sun Jul 31 23:09:38 2016
db_recovery_file_dest_size of 8680 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Trace dumping is performing id=[cdmp_20160731230939]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc  (incident=179785):
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (7, 1).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc:
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Sun Jul 31 23:09:41 2016
Starting background process CJQ0
Sun Jul 31 23:09:41 2016
CJQ0 started with pid=25, OS id=2572
Process debug not enabled via parameter _debug_enable
Trace dumping is performing id=[cdmp_20160731230942]
PMON (ospid: 3948): terminating the instance due to error 474
Sun Jul 31 23:09:48 2016
opiodr aborting process unknown ospid (2592) as a result of ORA-1092
Sun Jul 31 23:09:48 2016
ORA-1092 : opitsk aborting process
Sun Jul 31 23:09:52 2016
Instance terminated by PMON, pid = 3948

设置_offline_rollback_segments数据库启动正常

Sun Jul 31 23:18:13 2016
ALTER DATABASE OPEN
Thread 1 opened at log sequence 16
  Current log# 1 seq# 16 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 5.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc  (incident=182188):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_182188\orcl_smon_4372_i182188.trc
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 16, block 2945 to scn 15431544
Recovery of Online Redo Log: Thread 1 Group 1 Seq 16 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Trace dumping is performing id=[cdmp_20160731231815]
Block recovery stopped at EOT rba 16.2952.16
Block recovery completed at rba 16.2952.16, scn 0.15431543
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Sun Jul 31 23:18:19 2016
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc  (incident=182189):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_182189\orcl_smon_4372_i182189.trc
Starting background process QMNC
Sun Jul 31 23:18:19 2016
QMNC started with pid=20, OS id=4920
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 16, block 2945 to scn 15431544
Recovery of Online Redo Log: Thread 1 Group 1 Seq 16 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Block recovery completed at rba 16.2952.16, scn 0.15431545
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Starting background process SMCO
Sun Jul 31 23:18:19 2016
SMCO started with pid=21, OS id=3176
Sun Jul 31 23:18:20 2016
Trace dumping is performing id=[cdmp_20160731231820]
Completed: ALTER DATABASE OPEN

尝试删除异常回滚段

Sun Jul 31 23:15:07 2016
drop rollback segment "_SYSSMU7_1101470402$"
Sun Jul 31 23:15:07 2016
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 224, RDBA = 12583136
         OBJN = -1, OBJD = -1, OBJECT = , SUBOBJECT =
         SEGMENT OWNER = , SEGMENT TYPE =
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_5300.trc  (incident=181035):
ORA-00600: 内部错误代码, 参数: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_181035\orcl_ora_5300_i181035.trc
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 14, block 8682 to scn 15397854
Recovery of Online Redo Log: Thread 1 Group 2 Seq 14 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Block recovery completed at rba 14.8688.16, scn 0.15397855
ORA-607 signalled during: drop rollback segment "_SYSSMU7_1101470402$"...
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 224, RDBA = 12583136
         OBJN = -1, OBJD = -1, OBJECT = , SUBOBJECT =
         SEGMENT OWNER = , SEGMENT TYPE =

从这里看,我们可以确定file 3 block 224异常,导致删除回滚段异常.和mos官方给出来的案例类似,由于undo坏块导致数据库报ORA-600 999错误

mos中ORA-600 999报错信息
官方的益处ORA-600[999]报错,也是由于undo坏块引起和本文的报错基本上一致
ORA-600-999


因为只要部分数据,直接屏蔽回滚段,数据库不再crash,导出需要对象即可

_ALLOW_RESETLOGS_CORRUPTION

我相信_ALLOW_RESETLOGS_CORRUPTION 这个参数一定很多人都熟悉,是redo异常恢复的杀手锏之一,以下文章是来自官方的解释

DB_Parameter _ALLOW_RESETLOGS_CORRUPTION
========================================
This documentation has been prepared avoiding the mention of the complex
structures from the code and to simply give an insight to the 'damage it could
cause'.  The usage of this parameter leads to an in-consistent Database with no
other alternative but to rebuild the complete Database.  This parameter could
be used when we realize that there are no stardard options available and are
convinced that the customer understands the implications of using the Oracle's
secret parameter.  The factors to be considered are ;--
1. Customer does not have a good backup.
2. A lot of time and money has been invested after the last good backup and
   there is no possibility for reproduction of the lost data.
3. The customer has to be ready to export the full database and import it
   back after creating a new one.
4. There is no 100% guarantee that by using this parameter the database would
   come up.
5. Oracle does not support the database after using this parameter for
   recovery.
6. ALL OPTIONS including the ones mentioned in the action part of the error
   message have been tried.
By setting _ALLOW_RESETLOGS_CORRUPTION=TRUE, certain consistency checks are
SKIPPED during database open stage.  This basically means it does not check
the datafile headers as to what the status was before the shutdown and how it
was shutdown.  The following cases mention few of the checks that were skipped.
Case-I
------
Verification that the datafile present has not been restored from a BACKUP
taken before the database was opened successfully by using RESETLOGS.
ORA-01190: control file or data file %s is from before the last RESETLOGS
    Cause: Attempting to use a data file when the log reset information in
           the file does not match the control file.  Either the data file or
           the control file is a backup that was made before the most recent
           ALTER DATABASE OPEN RESETLOGS.
   Action: Restore file from a more recent backup.
Case-II
-------
Verification that the status bit of the datafile is not in a FUZZY state.
The datafile could be in this state due to the database going down when the
 - Datafile was on-line and open
 - Datafile was not closed cleanly (maybe due to OS).
ORA-01194: file %s needs more recovery to be consistent
    Cause: An incomplete recover session was started, but an insufficient
           number of logs were applied to make the file consistent.  The
           reported file was not closed cleanly when it was last opened by
           the database.  It must be recovered to a time when it was not
           being updated.  The most likely cause of this error is forgetting
           to restore the file from a backup before doing incomplete
           recovery.
   Action: Either apply more logs until the file is consistent or restore the
           file from an older backup and repeat recovery.
Case-III
--------
Verification that the COMPLETE recover strategies have been applied for
recovering the datafile and not any of the INCOMPLETE recovery options.
Basically because the complete recovery is one in which we even apply the
ON-LINE redo log files and open the DB without reseting the logs.
ORA-01113: file '%s' needs media recovery starting at log sequence # %s
    Cause: An attempt was made to open a database file that is in need of
           media recovery.
   Action: First apply media recovery to the file.
Case-IV
-------
Verification that the datafile has been recovered through an END BACKUP if the
control file indicates that it was in backup mode.
This is useful when the DB has crashed while in hot backup mode and we lost
all log files in DB version's less than V7.2.
ORA-01195: on-line backup of file %s needs more recovery to be consistent"
    Cause: An incomplete recovery session was started, but an insufficient
           number of logs were applied to make the file consistent.  The
           reported file is an on-line backup which must be recovered to the
           time the backup ended.
   Action: Either apply more logs until the file is consistent or resotre
          the database files from an older backup and repeat recovery.
In version 7.2, we could simply issue the ALTER DATABASE DATAFILE xxxx END
BACKUP statement and proceed with the recovery.  But again to issue this
statement, we need to have the ON-LINE redo logs or else we still are forced to
use this parameter.
Case-V
------
Verification that the data file status is not still in (0x10) MEDIA recovery
FUZZY.
When recovery is started, a flag is set in the datafile header status flag to
indicate that the file is presently in media recovery.  This is reset when
recovery is completed and at times when it has not been reset we are forced to
use this paramter.
ORA-01196: file %s is inconsistent due to a failed media recovery session
    Cause: The file was being recovered but the recovery did not terminate
           normally.  This left the file in an inconsistent state.  No more
           recovery was successfully completed on this file.
   Action: Either apply more logs until the file is consistent or restore the
           backup again and repeat recovery.
Case-VI
-------
Verification that the datafile has been restored form a proper backup to
correspond with the log files.  This situation could happen when we have
decided that the data file is invalid since its SCN is ahead of the last
applied logs SCN but it has not failed on one of the ABOVE CHECKS.
ORA-01152: file '%s' was not restored from a sufficientluy old backup"
    Cause: A manual recovery session was started, but an insufficient number
           of logs were applied to make the database consistent.  This file is
           still in the future of the last log applied.  Note that this
           mistake can not always be caught.
   Action: Either apply more logs until the database is consistent or
           restore the database file from an older backup and repeat
           recovery.

使用_ALLOW_RESETLOGS_CORRUPTION 参数需谨慎,因为该参数可能导致数据库逻辑不一致,甚至可能把本来很简单的一个恢复弄的非常复杂甚至不可恢复的后果,建议在oracle support支持下使用.另外使用该参数resetlogs库之后,强烈建议通过逻辑方式重建库

数据库恢复的敏感性—重建控制文件使用不合适数据文件

有客户数据库由于某种原因无法open,请求我们技术支持.通过检查alert日志发现数据库是由于ORA-600 kccpb_sanity_check_2错误.并且他们已经重建控制文件,通过我们的Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)
datafile 6 异常
wrong-file1
wrong-file
wrong-file-redo


通过这里我们发现datafile 6 数据文件头是干净的,而且对应的redo seq为5270,而其他文件头都是fuzzy,而且对应的redo为295902613.这里怀疑datafile 6 可能是错误的.当然对于这样scn相距比较大的情况,我们可以通过隐含参数,修改scn等方法强制让该库起来(或者该文件online),但是从现在看到的情况,文件很可能异常,这样强制恢复,可能没有实际意义.
分析alert日志
alert-log


这个里面可以发现是先删除了sde表空间,然后创建了同一个表空间,只是数据文件路径不一样了.而且正好在seq为5270的地方操作的.现在出现datafile 6异常的原因已经清楚,就是创建数据控制文件的时候,使用了错误的数据文件导致.

完美恢复数据库

D:\>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on 星期六 7月 30 16:31:01 2016
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
SQL> shutdown immediate;
ORA-01109: 数据库未打开
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT
ORACLE 例程已经启动。
Total System Global Area 5133828096 bytes
Fixed Size                  2011360 bytes
Variable Size            2382368544 bytes
Database Buffers         2734686208 bytes
Redo Buffers               14761984 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "LANDDB" NORESETLOGS  ARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 'F:\ORADATA\LANDDB\ONLINELOG\O1_MF_1_4JCM05KL_.LOG'  SIZE 50M,
  9    GROUP 2 'F:\ORADATA\LANDDB\ONLINELOG\O1_MF_2_4JCM064D_.LOG'  SIZE 50M,
 10    GROUP 3 'F:\ORADATA\LANDDB\ONLINELOG\O1_MF_3_4JCM06PG_.LOG'  SIZE 50M
 11  DATAFILE
 12    'F:\ORADATA\LANDDB\DATAFILE\O1_MF_SYSTEM_4JCLYY6T_.DBF',
 13    'F:\ORADATA\LANDDB\DATAFILE\O1_MF_UNDOTBS1_4JCLYY8S_.DBF',
 14    'F:\ORADATA\LANDDB\DATAFILE\O1_MF_SYSAUX_4JCLYY7B_.DBF',
 15    'F:\ORADATA\LANDDB\DATAFILE\O1_MF_USERS_4JCLYY98_.DBF',
 16    'F:\ORADATA\LANDDB\DATAFILE\FUJIAN',
 17    'D:\data\sde.dbf'
 18  CHARACTER SET ZHS16GBK
 19  ;
控制文件已创建。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
数据库已更改。
SQL>

这个库比较幸运,客户发现异常之后,里面停止了有风险性的操作(比如使用_allow_resetlogs_corruption参数,resetlogs库等),使得数据完美恢复0丢失.如果条件允许最好使用老的控制文件来重建新控制文件,而不要通过人工去系统中找数据文件来实现恢复,这样很可能有遗落或者使用错误的数据文件