ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

open数据库alert日志报ORA-00600[kcratr1_lostwrt]错误

Mon May 14 14:57:28 2012
ALTER DATABASE OPEN
Mon May 14 14:57:29 2012
Beginning crash recovery of 1 threads
Mon May 14 14:57:29 2012
Started redo scan
Mon May 14 14:57:29 2012
Errors in file d:\oracle\admin\cqgasold\udump\cqgasold_ora_504.trc:
ORA-00600: 内部错误代码,参数: [kcratr1_lostwrt], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...

查询相关SCN
同一个查询中SCN相同,省略

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;
未选定行
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN              STOP_SCN
---------- ---------------- ----------------
         1       7842987188
         2       7842987188
         3       7842987188
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;
     FILE# SCN               RESETLOGS SCN
---------- ----------------- -----------------
         1        7842991811                 1
         2        7842991811                 1
         3        7842991811                 1

这里看到奇怪现象datafile scn小于datafile_header scn,数据库异常断电一般来说也不会出现这样的情况,个人猜测是错误的恢复或者使用历史控制文件导致,对于这样的现状,我先尝试着使用using backup controlfile方式恢复,结果失败.估计控制文件有异常,本着先拉起库原则,重建控制文件.

进行完全恢复

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388617], [1], [23403], [25], [112],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'

尝试跳过坏块继续恢复

SQL> recover database allow 1 corruption;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388610], [1], [23403], [2264], [16],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 2)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTFB Bitmapped File Space Header'

使用dbv检查坏块数量

C:\>dbv file='d:\oracle\oradata\cqgasold\undo_1.dbf' blocksize=8192
DBVERIFY: Release 9.2.0.5.0 - Production on 星期二 5月 15 19:43:42 2012
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
DBVERIFY - 验证正在开始 : FILE = d:\oracle\oradata\cqgasold\undo_1.dbf
DBV-00200: 块, dba 8388617, 已经标记为崩溃
汇入的页369 - 可能是介质损坏
***
Corrupt block relative dba: 0x00800171 (file 2, block 369)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x00800171
 last change scn: 0x0001.d37c57db seq: 0x1 flg: 0x04
 consistency value in tail: 0x4e930260
 check value in block header: 0x8202, computed block checksum: 0x4e5f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
汇入的页417 - 可能是介质损坏
***
Corrupt block relative dba: 0x008001a1 (file 2, block 417)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x008001a1
 last change scn: 0x0001.d37c53d4 seq: 0x2 flg: 0x04
 consistency value in tail: 0x4b6b0201
 check value in block header: 0x6ae7, computed block checksum: 0x5abc
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
…………
--类此记录很多,我放弃了跳过坏块修复的方法

恢复过程中提示坏块数据库文件离线恢复

SQL> alter database datafile 'd:\oracle\oradata\cqgasold\undo_1.dbf' offline;
数据库已更改。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00376: 此时无法读取文件 2
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'

到了这一步,根据经验,数据库被open的可能性很多了,很有可能是open以后因为smon回滚导致数据库down

查看日志,屏蔽回滚段,完成恢复

Tue May 15 19:59:52 2012
alter database open
Tue May 15 19:59:52 2012
Beginning crash recovery of 1 threads
Tue May 15 19:59:52 2012
Started redo scan
Tue May 15 19:59:52 2012
Completed redo scan
 323 redo blocks read, 82 data blocks need recovery
Tue May 15 19:59:52 2012
Started recovery at
 Thread 1: logseq 23404, block 3, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 23404 Reading mem 0
  Mem# 0 errs 0: F:\ORACLE\ORADATA\LOGCQGASOLD4.ORA
Tue May 15 19:59:52 2012
Completed redo application
Tue May 15 19:59:52 2012
Ended recovery at
 Thread 1: logseq 23404, block 326, scn 1.3548264979
 82 data blocks read, 82 data blocks written, 323 redo blocks read
Crash recovery completed successfully
Tue May 15 19:59:53 2012
Thread 1 advanced to log sequence 23405
Thread 1 opened at log sequence 23405
  Current log# 2 seq# 23405 mem# 0: D:\ORACLE\ORADATA\CQGASOLD\REDO02.LOG
Successful open of redo thread 1
Tue May 15 19:59:53 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 15 19:59:54 2012
Database Characterset is ZHS16GBK
Tue May 15 19:59:55 2012
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-604 signalled during: alter database open...
Tue May 15 19:59:56 2012
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 8
SMON: mark undo segment 8 as needs recovery
SMON: about to recover undo segment 9
SMON: mark undo segment 9 as needs recovery
SMON: about to recover undo segment 10
SMON: mark undo segment 10 as needs recovery
Tue May 15 20:00:37 2012
Shutting down instance (abort)

看到这里,可以大概确定是因为undo文件离线,导致回滚段异常.
这个问题,基本上可以确定通过隐含参数屏蔽回滚段,然后open数据库,重建undo删除异常undo,数据库恢复完成。

One thought on “ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

  1. ORA-600 [3020] “Stuck Recovery” [ID 30866.1]

    Applies to:
    Oracle Server - Enterprise Edition - Version 9.2.0.1 to 11.2.0.2 [Release 9.2 to 11.2]
    Information in this document applies to any platform.
    Symptoms
    Recovery session fails:
    SQL> recover database;
    ORA-00283: recovery session canceled due to errors
    ORA-00600: internal error code, arguments: [3020], [13204236], [1], [1],
    [33082], [236], [], []
    ORA-10567: Redo is inconsistent with data block (file# 3, block# 621324)
    ORA-10564: tablespace DATA
    ORA-01110: data file 3: '<dir>/PROD_Data01.dbf'
    ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 25535
    Cause
    This is stuck recovery - redo cannot be applied to a block because the scn of the block, when
    read from disk is NOT the scn that we expect it to be.
    Solution
    If the affected file belongs to SYSTEM or UNDO tablespace you should restore  the database
    from backup and do point in time recovery to just before the problem log is applied.
    If you do not have a backup available, than the options are VERY limited.
    Please open a Service Request with Oracle Support Services for assistance on this.
    If the affected file belongs to SYSAUX, this is the ONLY file affected and it is offline
    then a better option than point in time recovery of the whole database would be to create a NEW database
    and then use Transportable Tablespace feature to plug in all other tablespaces.  Another point worth noting:
    if SYSAUX is ONLINE but contains corrupt blocks, then before considering restore and recovery, use the SQL
    in point 3 below to identify the affected object  and raise a call with Oracle Support Services to ask if
    it is possible to drop and recreate the object  (unpublished Note 333665.1).
    For Data Guard environments, refer to Note:1265884.1
    Otherwise:
    1. Use Trial Recovery to determine the extent of the problem:
    SQL> recover database test;
    This will tell you how many blocks (n) would be left corrupted after recovery - check the alert
    log for details of the blocks affected.  If there are a large number of corruptions reported
    you may decide to restore from backup and issue point in time recovery.  However,
    if only a few blocks are reported as corrupt you could proceed with recovery :
    2. Skip the corrupted block(s)
    SQL> recover database allow 1 corruption;
    Do this <n> times,  <n> being the number of blocks reported as corrupt in step 1 above.
    This will allow recovery to continue,  'skipping' the blocks that cannot be recovered and leaving
    them marked as 'corrupt' after which the database can be opened.
    3. Take the corrupt blocks reported in the alert log and for each,
    identify the object that the block belongs to:
    SQL> SELECT tablespace_name, segment_type, owner, segment_name
         FROM dba_extents
         WHERE file_id = <file#>
           and <block> between block_id AND block_id + blocks - 1;
    Replacing <file#> and <block> with the file# and block ids reported in the alert log.
    For each object identified - take steps to resolve the corruption:
    - if it belongs to an index then simply drop and recreate the index
    - if it belong to a user object then consider recreating the object or extracting
    what you can from the object; if necessary raise a Service Request with Oracle and
    request assistance with extracting data from a corrupt object.
    Additional related documents :
    Note 1265884.1 Resolving ORA-752 or ORA-600 [3020] During Standby Recovery
    Note 283269.1 Stuck recovery of database ORA-00600[3020]
    
  2. VERSIONS:
      version 6.0 and above
    DESCRIPTION:
      This is called a 'STUCK RECOVERY'.
      There is an inconsistency between the information stored in the redo
      and the information stored in a database block being recovered.
    ARGUMENTS:
    For Oracle 9.2 and earlier:
      Arg [a] Block DBA
      Arg [b] Redo Thread
      Arg [c] Redo RBA Seq
      Arg [d] Redo RBA Block No
      Arg [e] Redo RBA Offset.
    For Oracle 10.1
      Arg [a] Absolute file number of the datafile.
      Arg [b] Block number
      Arg [c] Block DBA
    FUNCTIONALITY:
      kernel cache recovery parallel
    IMPACT:
      INSTANCE FAILURE during recovery.
    
  3. Applies to:
    Oracle Server - Enterprise Edition - Version: 9.0.1.4 to 9.2.0.8
    This problem can occur on any platform.
    Symptoms
    False ORA-600[kcratr1_lostwrt] (on 9.2) or ORA-397 (on 9.0) can happen
    during crash recovery or thread recovery in RAC system.
    If problem happen during crash recovery at starting up database, Oracle
    can't startup database. If problem happen during thread recovery, all RAC
    node might crash.
    You will see following log at terminal if problem happen at crash recovery.
    SQL> connect / as sysdba
    Connected to an idle instance.
    SQL> startup
    ORACLE instance started.
    Total System Global Area 72831496 bytes
    Fixed Size 455176 bytes
    Variable Size 62914560 bytes
    Database Buffers 8388608 bytes
    Redo Buffers 1073152 bytes
    Database mounted.
    ORA-00600: internal error code, arguments: [kcratr1_lostwrt], [], [], [], [],[], [], []
    If problem happen at thread recovery, you will see SMON error in alert.log
    Tue Oct  2 16:41:26 2007
    Instance recovery: looking for dead threads
    Tue Oct  2 16:41:26 2007
    Beginning instance recovery of 1 threads
    Tue Oct  2 16:41:26 2007
    Started redo scan
    Tue Oct  2 16:41:26 2007
    Errors in file /app/oracle/product/9.2.0/admin/ora920/bdump/ora920_smon_12345.trc:
    ORA-00600: internal error code, arguments: [kcratr1_lostwrt], [], [], [], [], [], [], []
    Cause
    Oracle signals this error during first pass of two pass recovery if it finds
    lost write. Two pass recovery is a new feature in Oracle 9i, and gives
    performance improvement on crash and thread recovery.
    In first pass of two pass recovery, Oracle read redo stream and construct
    list of blocks which need to be recovered. For this feature, DBWR writes
    BWR (Block Written Record) to redo stream to tell it has written a change
    to disk, and that change does not need to be applied during crash recovery.
    At the end of first pass, Oracle compares SCN (System Change Number)
    in the last BWR and SCN on the disk to verify changes in the last BWR is
    actually written to disk.
    But Oracle may signal false lost-write error on temporary block.
    This problem can happen if someone manages temporary object including
    temporary table, sort block, etc around the time database aborted.
    As an example, we may encounter problem when we shutdown abort database
    after using dbms_stats.gather_schema_stats (which use temporary table and
    sort operation to gather statistics)
    Solution
    First determine if this error is signaled on temporary block or not.
    Trace file written by this ORA-600 or ORA-397 has following entry at
    the beginning part of trace file before showing ORA- error message.
    Last BWR afn: 201 rdba: 0x400123(blk 291) ver: 0x0000.12345678.00 flg: 0x0c
    Disk version: 0x0000.00000000.01 flag: 0x0c
    Check values at flg: and flag: and change it to binary, if either or both have
    0x08 = 00001000 bit, you hit this false error problem.
    In above example, both flg: and flag: = 0x0c = 00001100, so it hit problem.
    If you hit this false error, add following as initialization parameter and then
    shutdown and restart database.
    _two_pass = FALSE
    With this parameter, Oracle doesn't use two pass recovery feature and we can
    complete crash recovery.
    Once database startup successfully, remove _two_pass parameter, then shutdown
    normal and startup database. You should configure this parameter only when you
    need to resolve this issue.
    
  4. 600[3020]恢复过程alert日志记录

    Mon Apr 22 21:53:41 2013
    alter database recover if needed
     datafile 1055 allow 1 corruption
    Media Recovery Start
     ALLOW CORRUPTION option must use serial recovery
    Mon Apr 22 21:53:41 2013
    Recovery of Online Redo Log: Thread 2 Group 8 Seq 54321 Reading mem 0
      Mem# 0: +DATA/q9dbdg/onlinelog/redo08.log
    Mon Apr 22 21:53:41 2013
    Recovery of Online Redo Log: Thread 1 Group 2 Seq 53587 Reading mem 0
      Mem# 0: +DATA/q9dbdg/onlinelog/redo02.log
    CORRUPTING BLOCK 456338 OF FILE 1055 AND CONTINUING RECOVERY
    Mon Apr 22 21:53:42 2013
    Errors in file /u01/oracle/app/admin/q9db/udump/q9db1_ora_12223.trc:
    ORA-10567: Redo is inconsistent with data block (file# 1055, block# 456338)
    ORA-10564: tablespace PART_DATA
    ORA-01110: data file 1055: '+DATA03/q9dbrac/datafile/part_data583.dbf'
    ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 121354
    Mon Apr 22 21:53:42 2013
    Recovery of Online Redo Log: Thread 1 Group 3 Seq 53588 Reading mem 0
      Mem# 0: +DATA/q9dbdg/onlinelog/redo03.log
    Mon Apr 22 21:53:42 2013
    Media Recovery Complete (q9db1)
    Completed: alter database recover if needed
     datafile 1055 allow 1 corruption
    Mon Apr 22 21:53:56 2013
    alter database datafile 1055 online
    Mon Apr 22 21:53:56 2013
    Completed: alter database datafile 1055 online
    

发表评论

邮箱地址不会被公开。 必填项已用*标注

3 × 3 =