记录一次ORA-600 4000数据库故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次ORA-600 4000数据库故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

ORA-600[4000]错误
一朋友数据库因为当前redo丢失,在恢复的过程中启动报ORA-600[4000]错误

SMON: enabling cache recovery
Thu May 30 16:24:17 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1458370
ORA-1092 signalled during: alter database open resetlogs...

分析trace文件

*** 2013-05-30 16:24:17.979
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
--确定是obj$对象异常,通过某种手段找到obj$的objid和dataobjid均为16,对应16进制为12
Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0xc1e.a329e76f  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0053.02a.000598bd  0x0d407e46.4f52.2f  --U-    1  fsc 0x0000.a329e772

这里比较明显obj$对象在rdba为0040007a的block上,scn为0c1e.a329e76f(13325725984623)且未提交的事务,这样的现象就决定了处理的特殊性(不是因为块延迟清理导致访问undo现象,该现象直接推进scn解决,而该情况不行)

数据文件头scn

SQL> SELECT DISTINCT CHECKPOINT# FROM V$DATAFILE_HEADER;
    CHECKPOINT_CHANGE#
-------------------------
           13324676536960

bbed查看文件头scn

   struct kcvfhckp, 160 bytes               @484
      struct kcvcpscn, 8 bytes              @484
         ub4 kscnbas                        @484      0x649c9a80
         ub2 kscnwrp                        @488      0x0c1e

这里看到的文件头scn也是为13324676536960(0c1e.649c9a80)和sql查询结果一致,也就是说数据库中的obj$的某个对象含有事务,且scn大于文件头scn(因为当前redo丢失,无法前滚,所以出现该情况),当数据库访问obj$的时候,为了事务的一致性,就需要访问undo(这里提示为83 回滚段),而undo异常,所以smon进程回滚失败,数据库无法正常启动

使用bbed提交事务

BBED> map
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122                                   Dba:0x0040007a
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[108]                              @86
 ub1 freespace[802]                         @302
 ub1 rowdata[7084]                          @1104
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0xa329e76f
      ub2 kscnwrp                           @32       0x0c1e
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0053
         ub2 kxidslt                        @46       0x002a
         ub4 kxidsqn                        @48       0x000598bd
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x0d407e46
         ub2 kubaseq                        @56       0x4f52
         ub1 kubarec                        @58       0x2f
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)<--需要提交
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0xa329e772
BBED> set count 32
        COUNT           32
BBED> set offset 60
        OFFSET          60
BBED> d
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 20010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb
 <32 bytes per line>
BBED> m /x 8001
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 80010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb
 <32 bytes per line>
BBED> sum apply
Check value for File 1, Block 122:
current = 0xafd6, required = 0xafd6

尝试open数据库ORA-600[2662]解决

Thu May 30 21:16:00 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532996],[3102],[2745973074],[4194397],[],[]
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:01 2013
Trace dumping is performing id=[cdmp_20130530211601]
Thu May 30 21:16:02 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:03 2013
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:05 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:08 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_pmon_958764.trc:
ORA-00474: SMON process terminated with error
Thu May 30 21:16:08 2013
PMON: terminating instance due to error 474

数据库在open过程中遇到大量ORA-00600[2662],这个是因为数据库中文件头的scn小于访问的数据块scn导致该问题,解决方法推荐scn,如果数据库的scn本身就很大(和时间理论scn较接近),推进过程中可能遇到如下错误,这个时候就需要选择合适的方法/合适的值来推进scn

SQL> startup pfile=/home/oracle/pfile force
ORACLE instance started.
Total System Global Area 5.5835E+10 bytes
Fixed Size                  2177056 bytes
Variable Size            3.2867E+10 bytes
Database Buffers         2.2951E+10 bytes
Redo Buffers               14598144 bytes
Database mounted.
ORA-01052: required destination LOG_ARCHIVE_DUPLEX_DEST is not specified

后面的工作因为没有redo前滚,而且该库故障时有大量事务在跑,现在无法前滚,导致大量的undo回滚段异常,index和data不一致等故障,需要做的就是屏蔽undo seg,重建undo,重建库

重现ORA-600 4000异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:重现ORA-600 4000异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于数据库恢复感兴趣的人,一定对于ORA-600[4000]这个著名的错误记忆犹新,这里通过试验重现ORA-600[4000]
查询数据库obj$.con$记录

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Fri Nov 4 06:32:36 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1219160 bytes
Variable Size              92276136 bytes
Database Buffers          218103808 bytes
Redo Buffers                7168000 bytes
Database mounted.
Database opened.
SQL> select OBJ#,dbms_rowid.rowid_relative_fno(rowid) rel_fno,
  2  dbms_rowid.rowid_block_number(rowid) block_num from obj$
  3  where name='CON$';
      OBJ#    REL_FNO  BLOCK_NUM
---------- ---------- ----------
        28          1        122
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

bbed修改数据块
根据催华的研究,当OBJ$中部分数据块中的csc和itl的scn都大于header scn的时候,数据库启动就会报ORA-600[4000]

[oracle@xifenfei ~]$ bbed listfile=list mode=edit password=blockedit
BBED: Release 2.0.0.0.0 - Limited Production on Fri Nov 4 06:47:09 2011
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /u01/oracle/oradata/XFF/system01.dbf                                 0
     2  /u01/oracle/oradata/XFF/undotbs01.dbf                                0
     3  /u01/oracle/oradata/XFF/sysaux01.dbf                                 0
     4  /u01/oracle/oradata/XFF/users01.dbf                                  0
     5  /u01/oracle/oradata/XFF/datfttuser.dbf                               0
BBED> set block 1
        BLOCK#          1
BBED> set file 1 block 1
        FILE#           1
        BLOCK#          1
BBED>  p kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes                    @484
   ub4 kscnbas                              @484      0x00210f97
   ub2 kscnwrp                              @488      0x0000
BBED> set block 122
        BLOCK#          122
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28         <==csc(SCN of the last block cleanout)
      ub4 kscnbas                           @28       0x0020770d
      ub2 kscnwrp                           @32       0x0000
   sb2 ktbbhict                             @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0005  <==回滚段序号
         ub2 kxidslt                        @46       0x0029
         ub4 kxidsqn                        @48       0x0000029a
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00802381
         ub2 kubaseq                        @56       0x01f9
         ub1 kubarec                        @58       0x03
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         sb2 _ktbitfsc                      @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x0020770e <==itl commit scn
BBED> set count 16
        COUNT           16
BBED> m /x 0d772010 offset 28
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/XFF/system01.dbf (1)
 Block: 122              Offsets:   28 to  60           Dba:0x0040007a
------------------------------------------------------------------------
 0d772010 00000000 01000200 00000000
 <32 bytes per line>
BBED> m /x 0e772010 offset 64
 File: /u01/oracle/oradata/XFF/system01.dbf (1)
 Block: 122              Offsets:   64 to  96           Dba:0x0040007a
------------------------------------------------------------------------
 0e772010 00016c00 ffffea00 53046903
 <32 bytes per line>
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x1020770d
      ub2 kscnwrp                           @32       0x0000
   sb2 ktbbhict                             @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0005
         ub2 kxidslt                        @46       0x0029
         ub4 kxidsqn                        @48       0x0000029a
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00802381
         ub2 kubaseq                        @56       0x01f9
         ub1 kubarec                        @58       0x03
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         sb2 _ktbitfsc                      @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x1020770e
BBED> sum apply
Check value for File 1, Block 122:
current = 0xc902, required = 0xc902

启动数据库

SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1219160 bytes
Variable Size              92276136 bytes
Database Buffers          218103808 bytes
Redo Buffers                7168000 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

查看日志

Fri Nov  4 06:50:38 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Fri Nov  4 06:50:38 2011
ALTER DATABASE OPEN
Fri Nov  4 06:50:38 2011
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=7048
Fri Nov  4 06:50:38 2011
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=7050
ARC1: STARTING ARCH PROCESSES
Fri Nov  4 06:50:38 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Nov  4 06:50:38 2011
Thread 1 opened at log sequence 38
  Current log# 3 seq# 38 mem# 0: /u01/oracle/oradata/XFF/redo03.log
Successful open of redo thread 1
Fri Nov  4 06:50:38 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov  4 06:50:38 2011
SMON: enabling cache recovery
Fri Nov  4 06:50:38 2011
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=18, OS id=7052
Fri Nov  4 06:50:38 2011
Errors in file /u01/oracle/admin/XFF/udump/xff_ora_7046.trc:
ORA-00600: internal error code, arguments: [4000], [5], [], [], [], [], [], []
Fri Nov  4 06:50:40 2011
Errors in file /u01/oracle/admin/XFF/udump/xff_ora_7046.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [5], [], [], [], [], [], []
Fri Nov  4 06:50:40 2011
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 7046
ORA-1092 signalled during: ALTER DATABASE OPEN...

在ORACL 11.2的版本中,同样的方法无法重新该错误,数据库依然能够正常打开,所以如果要测试的朋友请选择11G以下版本进行.