Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]
Alert log reports ORA-00600 [4137] and ORA-00600 [4198] errors
The database reported the errors below and, after running for a while, shut itself down.
Fri Jul 6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul 6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17).
Fri Jul 6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri Jul 6 18:05:53 2012
SMON: Restarting fast_start parallel rollback
Fri Jul 6 18:05:54 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
…………
Wed Jul 6 18:50:38 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc:
ORA-00474: SMON process terminated with error
Wed Jul 6 18:50:38 2012
PMON: terminating instance due to error 474
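When an alert log repeats the same internal error many times, it can help to tally the distinct ORA-00600 argument lists before reading the trace files. The following is only a minimal sketch: the function name ora600_arguments and the sample text are illustrative (the sample lines are copied from the log above), and the regular expression assumes the "internal error code, arguments:" wording seen in this log.

```python
import re
from collections import Counter

# Matches lines such as:
# ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORA600 = re.compile(
    r"ORA-00600: internal error code, arguments: ((?:\[[^\]]*\](?:, )?)+)"
)


def ora600_arguments(text):
    """Return one list of non-empty arguments per ORA-00600 occurrence."""
    found = []
    for match in ORA600.finditer(text):
        args = re.findall(r"\[([^\]]*)\]", match.group(1))
        found.append([arg for arg in args if arg])
    return found


# Sample lines taken from the alert log excerpt above.
sample = (
    "ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []\n"
    "ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []\n"
    "ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []\n"
)

# Count how often each distinct argument list appears.
counts = Counter(tuple(args) for args in ora600_arguments(sample))
for args, count in counts.items():
    print(args, count)
```

On this excerpt the tally separates the two [4137] errors raised by SMON from the single [4198] error raised by the parallel slave.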
Three pieces of evidence show that undo segment 3 is damaged
1. The trace file
SMON: about to recover undo segment 3
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317
*** 2012-07-06 17:55:19.042
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 607
*** 2012-07-06 17:55:19.761
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 3
XID passed in =xid: 0x0003.011.00003c2b
XID from Undo block =xid: 0x0004.020.00002b35
2. The alert log reports "while recovering transaction (3, 17)"
3. Querying dba_rollback_segs shows _SYSSMU3$ in NEEDS RECOVERY status
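The XID printed in the trace file and the "(3, 17)" in the alert log describe the same transaction. The sketch below assumes the usual reading of an Oracle XID as three hexadecimal fields (undo segment number, transaction slot, sequence/wrap number); the function name parse_xid is illustrative only.

```python
def parse_xid(xid):
    """Split an XID such as '0x0003.011.00003c2b' into integers.

    Assumed layout: undo segment number . slot . sequence, all in hex.
    """
    usn, slot, sqn = (int(part, 16) for part in xid.split("."))
    return usn, slot, sqn


# Values copied from the SMON trace excerpt.
passed_in = parse_xid("0x0003.011.00003c2b")
from_undo = parse_xid("0x0004.020.00002b35")

print("XID passed in:      ", passed_in)
print("XID from undo block:", from_undo)
print("same transaction?   ", passed_in == from_undo)
```

Under that assumption the XID passed in decodes to undo segment 3, slot 17 (0x11), which agrees with "transaction (3, 17)" in the alert log, while the XID found in the undo block belongs to undo segment 4. That mismatch is what ORA-600 [4137] reports.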
Attempting to drop _SYSSMU3$
Set the hidden parameter _offline_rollback_segments=_SYSSMU3$
Fri Jul 6 18:16:19 2012
Completed: ALTER DATABASE OPEN
Fri Jul 6 18:16:56 2012
drop rollback segment "_SYSSMU3$"
Fri Jul 6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Fri Jul 6 18:16:57 2012
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul 6 18:16:57 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0
  Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log
Block recovery completed at rba 209591.225.16, scn 1.3493910790
ORA-607 signalled during: drop rollback segment "_SYSSMU3$"...
Fri Jul 6 18:16:57 2012
Corrupt Block Found
  TSN = 1, TSNAME = UNDOTBS1
  RFN = 2, BLK = 41, RDBA = 8388649
  OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT =
  SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type
Fri Jul 6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul 6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc:
ORA-00474: SMON process terminated with error
Fri Jul 6 18:17:46 2012
PMON: terminating instance due to error 474
Fri Jul 6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc:
ORA-00474: SMON process terminated with error
Fri Jul 6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc:
ORA-00474: SMON process terminated with error
Instance terminated by PMON, pid = 17355
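This shows that when the hidden parameter was used to drop the damaged rollback segment, the corrupt block in that segment raised ORA-00600 [kddummy_blkchk] and brought the database down. The database was restarted several times, and each time it went down with this same error.
The trace file shows: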
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
*** 2012-07-06 18:16:57.734
Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted. Error Code = 38508
ktu4smck: starting extent(0x77) of txn slot #0x11 is invalid.
valid value (0 - 0x76)
TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000
mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86
Version: 0x01
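The root cause is a corrupt header block in undo segment 3. Even with the hidden parameter set to skip recovery of that segment, SMON still reads the segment header, so the same error recurs and the instance goes down.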
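The DBA value 8388649 from the block check can be decoded by hand to confirm that it is the same block the alert log calls file 2, block 41. The sketch below assumes the common smallfile layout of a 32-bit data block address (upper 10 bits for the relative file number, lower 22 bits for the block number); the function name decode_rdba is illustrative only.

```python
BLOCK_BITS = 22                      # assumed: low 22 bits hold the block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1   # 0x3FFFFF


def decode_rdba(rdba):
    """Return (relative file number, block number) for a 32-bit DBA."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


# DBA reported by the block check in the SMON trace.
file_no, block_no = decode_rdba(8388649)
print("file", file_no, "block", block_no)
```

The result, file 2 and block 41, matches "RFN = 2, BLK = 41" in the Corrupt Block Found message and the [2], [41] arguments of ORA-00600 [kddummy_blkchk], so all three messages point at the same segment header block in UNDOTBS1.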
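Solution
1. Use the hidden parameter to mask the damaged rollback segment: _offline_rollback_segments=_SYSSMU3$
2. Set undo_tablespace=SYSTEM and undo_management=MANUAL
3. Start the database and promptly drop the undo tablespace that contains _SYSSMU3$
4. Create a new undo tablespace
5. Set undo_tablespace=new_undo and undo_management=AUTO, and remove the hidden parameter
6. Restart the database with the new parameter file
7. Recommendation: rebuild the database with a logical export and import
Additional note: while handling this failure I forgot to try using an event to disable rollback, so I do not know whether that method would also prevent SMON from reading the rollback segment header.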
event = 10513 trace name context forever,level 2
ORA-600 [4137] “XID in Undo and Redo Does Not Match” [ID 43914.1]