Recovering from a corrupt block in an undo segment header



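Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to take legal action.]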

The alert log reports ORA-00600 [4137] and ORA-00600 [4198]
The database raised the errors below and, after running for a while, went down on its own:

Fri Jul  6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17).
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri Jul  6 18:05:53 2012
SMON: Restarting fast_start parallel rollback
Fri Jul  6 18:05:54 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
…………
Wed Jul  6 18:50:38 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc:
ORA-00474: SMON process terminated with error
Wed Jul  6 18:50:38 2012
PMON: terminating instance due to error 474
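
When an alert log is long, it can help to tally which ORA-00600 codes appear and which trace file each one points to. The following is a minimal Python sketch of that triage step, not part of the original recovery. The function name `summarize_ora600` is hypothetical, and the excerpt is copied from the log above with the timestamps left out.

```python
import re
from collections import Counter

# Excerpt copied from the alert log above (timestamp lines omitted).
ALERT_EXCERPT = """\
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
"""

TRACE = re.compile(r"Errors in file (\S+):")
ORA600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")

def summarize_ora600(text):
    """Count (first ORA-00600 argument, trace file) pairs in alert log text."""
    counts = Counter()
    current_trace = None
    for line in text.splitlines():
        m = TRACE.search(line)
        if m:
            # Remember the trace file named on the preceding line.
            current_trace = m.group(1)
            continue
        m = ORA600.search(line)
        if m:
            counts[(m.group(1), current_trace)] += 1
    return counts

for (code, trace), n in summarize_ora600(ALERT_EXCERPT).items():
    print(f"ORA-00600 [{code}] x{n} -> {trace}")
```

On this excerpt the tally shows [4137] raised twice by SMON and [4198] raised once by the parallel slave p000.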

Three pieces of evidence point to undo (rollback) segment 3 as the damaged one
1. The trace file

SMON: about to recover undo segment 3
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317
*** 2012-07-06 17:55:19.042
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 607
*** 2012-07-06 17:55:19.761
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 3
XID passed in =xid: 0x0003.011.00003c2b
XID from Undo block =xid: 0x0004.020.00002b35

2. The alert log reports "while recovering transaction (3, 17)"
3. A query against dba_rollback_segs shows _SYSSMU3$ in NEEDS RECOVERY status
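
The three clues above can be cross-checked mechanically. The sketch below, with hypothetical helper names, decodes the two XIDs printed in the trace and the transaction pair from the alert log. It assumes the XID fields are hexadecimal in the order undo segment number, slot, sequence, and that the segment name follows the _SYSSMU<n>$ pattern seen in this article.

```python
import re

def parse_xid(xid):
    """Split an XID like '0x0003.011.00003c2b' into (usn, slot, sequence)."""
    usn, slot, seq = xid.split(".")
    # int(..., 16) accepts the leading '0x' on the first field.
    return int(usn, 16), int(slot, 16), int(seq, 16)

def parse_alert_txn(line):
    """Pull (usn, slot) out of the 'recovering transaction (u, s)' message."""
    m = re.search(r"recovering transaction \((\d+), (\d+)\)", line)
    return (int(m.group(1)), int(m.group(2))) if m else None

def smu_name(usn):
    # Assumption: the _SYSSMU<n>$ naming seen in this article.
    return f"_SYSSMU{usn}$"

alert_line = ("ORACLE Instance techdb (pid = 8) - Error 600 encountered "
              "while recovering transaction (3, 17).")
passed_in = parse_xid("0x0003.011.00003c2b")   # "XID passed in"
from_block = parse_xid("0x0004.020.00002b35")  # "XID from Undo block"

print(parse_alert_txn(alert_line))  # (3, 17)
print(passed_in[:2])                # (3, 17)
print(smu_name(passed_in[0]))       # _SYSSMU3$
print(from_block[:2])               # (4, 32)
```

The XID passed in and the alert message agree on segment 3, slot 17, while the XID read from the undo block names segment 4, slot 32. That disagreement is consistent with the XID mismatch that ORA-00600 [4137] is commonly associated with.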

Attempting to drop _SYSSMU3$
The hidden parameter _offline_rollback_segments= _SYSSMU3$ was set first

Fri Jul  6 18:16:19 2012
Completed: ALTER DATABASE OPEN
Fri Jul  6 18:16:56 2012
drop rollback segment "_SYSSMU3$"
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Fri Jul  6 18:16:57 2012
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:16:57 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0
  Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log
Block recovery completed at rba 209591.225.16, scn 1.3493910790
ORA-607 signalled during: drop rollback segment "_SYSSMU3$"...
Fri Jul  6 18:16:57 2012
Corrupt Block Found
         TSN = 1, TSNAME = UNDOTBS1
         RFN = 2, BLK = 41, RDBA = 8388649
         OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT =
         SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
PMON: terminating instance due to error 474
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc:
ORA-00474: SMON process terminated with error
Instance terminated by PMON, pid = 17355

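This shows that when the damaged rollback segment was dropped with the hidden parameter in place, the corrupt block in that segment raised ORA-00600 [kddummy_blkchk] and brought the database down. The database was restarted several times, and each time it went straight down with the same error.
The trace file shows: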

SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
*** 2012-07-06 18:16:57.734
Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted.  Error Code = 38508
ktu4smck: starting extent(0x77) of txn slot #0x11 is  invalid.
  valid value (0 - 0x76)
  TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000
            mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86
            Version: 0x01

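The root cause is a corrupt block in the header of undo segment 3. Even with the hidden parameter suppressing recovery of that segment, SMON still reads the segment header, hits this error, and the instance goes straight down.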
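
The numbers in the alert log and the trace line up once they are decoded. The sketch below assumes the usual layout of a relative data block address (upper 10 bits for the relative file number, lower 22 bits for the block number); `decode_rdba` is a hypothetical helper name.

```python
def decode_rdba(rdba):
    """Split a relative DBA into (relative file number, block number)."""
    # Assumed layout: top 10 bits = file number, low 22 bits = block number.
    return rdba >> 22, rdba & 0x3FFFFF

# "RDBA = 8388649" in the alert log, "DBA = 8388649" in the trace.
file_no, block_no = decode_rdba(8388649)
print(file_no, block_no)          # 2 41

# ktu4smck message: the starting extent of txn slot 0x11 is outside 0 - 0x76.
slot = int("0x11", 16)
start_extent = int("0x77", 16)
max_valid_extent = int("0x76", 16)
print(slot)                               # 17
print(start_extent, max_valid_extent)     # 119 118
print(start_extent > max_valid_extent)    # True
```

File 2, block 41 matches the first two arguments of ORA-00600 [kddummy_blkchk], [2], [41], [38508], and slot 17 is the same slot named in "recovering transaction (3, 17)". The header records extent 119 as the starting extent for that slot, one past the largest valid extent, 118.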

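Recovery procedure
1. Use the hidden parameter to take the damaged rollback segment offline: _offline_rollback_segments= _SYSSMU3$
2. Set undo_tablespace=SYSTEM and undo_management=MANUAL
3. Start the database and immediately drop the undo tablespace that contains _SYSSMU3$
4. Create a new undo tablespace
5. Set undo_tablespace=new_undo and undo_management=AUTO, and remove the hidden parameter
6. Restart the database with the new parameter file
7. Recommendation: rebuild the database through a logical export and import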
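
The steps above can be laid out as two parameter-file fragments and two SQL statements. The sketch below only renders text; it does not connect to a database. The datafile path, the size, the parenthesized form of the hidden parameter value, and the AND DATAFILES clause are all assumptions for illustration. The tablespace name UNDOTBS1 comes from the "Corrupt Block Found" message, and new_undo from step 5.

```python
DAMAGED_SEGMENT = "_SYSSMU3$"   # from the article
OLD_UNDO_TS = "UNDOTBS1"        # TSNAME in the "Corrupt Block Found" message
NEW_UNDO_TS = "new_undo"        # name used in step 5
# Hypothetical placeholders, not taken from the article.
NEW_UNDO_FILE = "/path/to/new_undo01.dbf"
NEW_UNDO_SIZE = "1G"

def phase1_pfile():
    """Steps 1-2: parameters for the first startup."""
    return [
        # Assumption: list-style value; the article writes the bare name.
        f"_offline_rollback_segments=({DAMAGED_SEGMENT})",
        "undo_management=MANUAL",
        "undo_tablespace=SYSTEM",
    ]

def rebuild_sql():
    """Steps 3-4: drop the damaged undo tablespace, create a new one."""
    return [
        # Assumption: the old datafiles are removed as well.
        f"DROP TABLESPACE {OLD_UNDO_TS} INCLUDING CONTENTS AND DATAFILES;",
        f"CREATE UNDO TABLESPACE {NEW_UNDO_TS} "
        f"DATAFILE '{NEW_UNDO_FILE}' SIZE {NEW_UNDO_SIZE};",
    ]

def phase2_pfile():
    """Steps 5-6: parameters for the final restart, hidden parameter removed."""
    return [
        "undo_management=AUTO",
        f"undo_tablespace={NEW_UNDO_TS}",
    ]

for title, lines in [("pfile, first startup", phase1_pfile()),
                     ("SQL", rebuild_sql()),
                     ("pfile, final restart", phase2_pfile())]:
    print(f"-- {title}")
    print("\n".join(lines))
```

Keeping the two parameter sets separate makes it harder to leave the hidden parameter in place after the new undo tablespace is online.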