ORA-600 4042 故障恢复

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check) 检查结果
wrong_scn
通过上图可以知道file 2未能正常恢复(需要看日志分析原因),file 3以前就被offline,需要历史归档(非归档状态,所以这个先放着,后续再处理)

分析file 2 不成功原因

Wed Aug  3 15:21:11 2016
ALTER DATABASE RECOVER  datafile 2
Wed Aug  3 15:21:11 2016
Media Recovery Start
 parallel recovery started with 2 processes
Wed Aug  3 15:21:11 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Wed Aug  3 15:21:11 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:13 2016
Errors in file /u01/app/oracle/admin/oracle/bdump/oracle_p001_22017.trc:
ORA-00600: internal error code, arguments: [3020], [2], [41], [8388649], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 41)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 2: '/home/oracle/orabase/oradata/ORACLE/undotbs01.dbf'
ORA-10560: block type '0'
Wed Aug  3 15:21:18 2016
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

通过日志可以知道由于ORA-600 3020导致file 2不能正常的恢复.
处理file 2

SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.

Thu Aug  4 01:58:35 2016
ALTER DATABASE RECOVER  datafile 2 allow 1 corruption
Media Recovery Start
 ALLOW CORRUPTION option must use serial recovery
Thu Aug  4 01:58:35 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:58:35 2016
Media Recovery Complete (oracle)
Completed: ALTER DATABASE RECOVER  datafile 2 allow 1 corruption

尝试open数据库

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

Thu Aug  4 01:59:20 2016
alter database open
Thu Aug  4 01:59:21 2016
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Thu Aug  4 01:59:21 2016
Started redo scan
Thu Aug  4 01:59:21 2016
Completed redo scan
 1619 redo blocks read, 0 data blocks need recovery
Thu Aug  4 01:59:21 2016
Started redo application at
 Thread 1: logseq 1916, block 12724
Thu Aug  4 01:59:21 2016
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1916 Reading mem 0
  Mem# 0 errs 0: /home/oracle/orabase/oradata/ORACLE/redo01.log
Thu Aug  4 01:59:21 2016
Completed redo application
Thu Aug  4 01:59:21 2016
Completed crash recovery at
 Thread 1: logseq 1916, block 14343, scn 3303614971196
 0 data blocks read, 0 data blocks written, 1619 redo blocks read
Thu Aug  4 01:59:21 2016
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=5542
Thu Aug  4 01:59:21 2016
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=19, OS id=5544
Thu Aug  4 01:59:21 2016
Thread 1 advanced to log sequence 1917
Thread 1 opened at log sequence 1917
  Current log# 2 seq# 1917 mem# 0: /home/oracle/orabase/oradata/ORACLE/redo02.log
Successful open of redo thread 1
Thu Aug  4 01:59:21 2016
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Aug  4 01:59:21 2016
ARC1: STARTING ARCH PROCESSES
Thu Aug  4 01:59:21 2016
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Thu Aug  4 01:59:21 2016
SMON: enabling cache recovery
Thu Aug  4 01:59:21 2016
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=20, OS id=5546
Thu Aug  4 01:59:21 2016
db_recovery_file_dest_size of 2048 MB is 1.05% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 01:59:22 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Errors in file /u01/app/oracle/admin/oracle/udump/oracle_ora_5505.trc:
ORA-00600: internal error code, arguments: [4042], [0], [], [], [], [], [], []
Thu Aug  4 01:59:23 2016
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 5505
ORA-1092 signalled during: alter database open ...

由于ORA-600 4042错误导致数据库无法正常open.
分析ORA-600 4042

PARSING IN CURSOR #4 len=142 dep=1 uid=0 oct=3 lid=0 tim=1435788503594313 hv=361892850 ad='a7ab2db8'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #4:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594311
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aae75802218  bln=22  avl=02  flg=05
  value=3
EXEC #4:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,tim=1435788503594393
FETCH #4:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=3,tim=1435788503594412
STAT #4 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=0 pw=0 time=8 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=0 pw=0 time=3 us)'
WAIT #1: nam='db file sequential read' ela= 10 file#=2 block#=41 blocks=1 obj#=-1 tim=1435788503594468
Dump of buffer cache at level 4 for tsn=1, rdba=8388649
BH (0x95ff3c58) file#: 2 rdba: 0x00800029 (2/41) class: 21 ba: 0x95ef0000
  set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [a8b77880,a8b77880] lru: [95ff3dd0,a8e70338]
  ckptq: [NULL] fileq: [NULL] objq: [a43da110,a43da110]
  use: [a8e6e658,a8e6e658] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800029 (2/41)
  scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
  frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
Dump of memory from 0x0000000095EF0000 to 0x0000000095EF2000
095EF0000 0000A200 00800029 00000000 01010000  [....)...........]
095EF0010 00000000 00000000 00000000 00000000  [................]
        Repeat 509 times
095EF1FF0 00000000 00000000 00000000 00000001  [................]
Dump of memory from 0x0000000095EF0014 to 0x0000000095EF1FFC
095EF0010          00000000 00000000 00000000      [............]
095EF0020 00000000 00000000 00000000 00000000  [................]

这里可以发现,file 2 block 41的type为unknown,注意观察ORA-600 3020的错误,我们发现当时报的坏块也正好是该block.基本上可以确定由于前面的allow 1 corruption操作导致了后面的ORA-600 4042的错误.官方关于ORA-600[4042]解释
ORA-600-4042

通过修改undo$中的回滚段状态(参考:bbed修改undo$(回滚段)状态)
正常open数据库,修改file 3的scn并online数据文件

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1224736768 bytes
Fixed Size                  2020384 bytes
Variable Size             318770144 bytes
Database Buffers          889192448 bytes
Redo Buffers               14753792 bytes
Database mounted.
SQL>   SELECT thread#,
  2           a.sequence#,
  3           a.group#,
  4           TO_CHAR (first_change#, '9999999999999999') "SCN",
  5           a.status,
  6           MEMBER
  7      FROM v$log a, v$logfile b
  8     WHERE a.group# = B.GROUP#
  9  ORDER BY a.sequence# DESC;
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1919          1     3303615011212
CURRENT
/home/oracle/orabase/oradata/ORACLE/redo01.log
         1       1918          3     3303614991206
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo03.log
   THREAD#  SEQUENCE#     GROUP# SCN
---------- ---------- ---------- ----------------------------------
STATUS
--------------------------------
MEMBER
--------------------------------------------------------------------------------
         1       1917          2     3303614971197
INACTIVE
/home/oracle/orabase/oradata/ORACLE/redo02.log
SQL> recover database using backup controlfile;
ORA-00279: change 3303615011452 generated at 08/04/2016 02:06:52 needed for
thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORACLE/archivelog/2016_08_04/o1_mf_1_1919_%u
_.arc
ORA-00280: change 3303615011452 for thread 1 is in sequence #1919
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/orabase/oradata/ORACLE/redo01.log
Log applied.
Media recovery complete.
SQL> alter database datafile 3 online;
Database altered.
SQL> alter database open resetlogs;
Database altered.
SQL>

至此该数据库基本上恢复完成,强烈建议使用逻辑方式导出导入重建库.

ORA-600 4042 故障恢复

发表评论 取消回复

发表评论取消回复