联系:手机/微信(+86 17813235971) QQ(107644445)
标题:数据库报ORA-00607/ORA-00600[4194]错误
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误
Thu Jul 26 13:21:11 2012 SMON: enabling cache recovery Thu Jul 26 13:21:11 2012 Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc: ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], [] Thu Jul 26 13:21:11 2012 Doing block recovery for file 1 block 18 Block recovery from logseq 3994, block 3 to scn 89979535 Thu Jul 26 13:21:11 2012 Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0 Mem# 0: /orasvr/mispdata/redo01.log Block recovery stopped at EOT rba 3994.5.16 Block recovery completed at rba 3994.5.16, scn 0.89979533 Doing block recovery for file 1 block 9 Block recovery from logseq 3994, block 3 to scn 89979532 Thu Jul 26 13:21:11 2012 Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0 Mem# 0: /orasvr/mispdata/redo01.log Block recovery completed at rba 3994.5.16, scn 0.89979533 Thu Jul 26 13:21:11 2012 Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], [] Error 604 happened during db open, shutting down database USER: terminating instance due to error 604 Instance terminated by USER, pid = 2865 ORA-1092 signalled during: ALTER DATABASE OPEN...
通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].
trace文件分析
--dump redo DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log' Opcodes *.* DBAs (file#, block#): (1, 18) RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff Times: creation thru eternity FILE HEADER: Compatibility Vsn = 169870080=0xa200300 Db ID=658120234=0x273a1e2a, Db Name='MISPDATA' Activation ID=658142762=0x273a762a Control Seq=16668=0x411c, File size=102400=0x19000 File Number=2, Blksiz=512, File Type=2 LOG descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62" thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1) resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000 prev resetlogs count: 0x0 scn: 0x0000.00000000 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000 Low scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42 Disk cksum: 0x3088 Calc cksum: 0x3088 Terminal recovery stop scn: 0x0000.00000000 Terminal recovery 01/01/1988 00:00:00 Most recent redo scn: 0x0000.00000000 Largest LWN: 0 blocks End-of-redo stream : No Unprotected mode Miscellaneous flags: 0x0 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000 --ORA-00600错误提示 *** 2012-07-26 13:21:11.566 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], [] Current SQL statement for this session: update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7, xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1 --ora-607 Error 607 in redo application callback TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ: 1 OP:5.1 ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02 xid: 0x0000.026.00000035 ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No 0x00000000 prev ctl uba: 0x00400012.003d.01 prev ctl max cmt scn: 0x0000.0550709b prev tx cmt scn: 0x0000.0550709c txn start scn: 0xffff.ffffffff logon user: 0 prev brb: 4194318 prev bcl: 0 KDO undo record: KTB Redo op: 0x04 ver: 0x01 op: L itl: xid: 0x0000.01e.00000035 uba: 0x00400012.003d.01 flg: C--- lkc: 0 scn: 0x0000.05511296 KDO Op code: URP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x0040006a hdba: 0x00400069 itli: 1 ispac: 0 maxfr: 4863 tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0 ncol: 17 nnew: 12 size: 0 col 1: [ 9] 5f 53 59 53 53 4d 55 31 24 col 2: [ 2] c1 02 col 3: [ 2] c1 03 col 4: [ 2] c1 0a col 5: [ 5] c4 5a 12 5a 14 col 6: [ 1] 80 col 7: [ 4] c3 08 5f 3d col 8: [ 4] c3 02 38 52 col 9: [ 1] 80 col 10: [ 2] c1 04 col 11: [ 2] c1 02 col 16: [ 2] c1 02 Block after image is corrupt: buffer tsn: 0 rdba: 0x00400012 (1/18) scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201 frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK
这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME 2 FROM DBA_EXTENTS A 3 WHERE FILE_ID = &FILE_ID 4 AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1; Enter value for file_id: 1 old 3: WHERE FILE_ID = &FILE_ID new 3: WHERE FILE_ID = 1 Enter value for block_id: 106 old 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 new 4: AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 OWNER ------------------------------ SEGMENT_NAME -------------------------------------------------------------------------------- SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME ------------------ ------------------------------ ------------------------------ SYS UNDO$ TABLE SYSTEM
4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME 2 FROM DBA_EXTENTS A 3 WHERE FILE_ID = &FILE_ID 4 AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1; Enter value for file_id: 1 old 3: WHERE FILE_ID = &FILE_ID new 3: WHERE FILE_ID = 1 Enter value for block_id: 18 old 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 new 4: AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 OWNER ------------------------------ SEGMENT_NAME -------------------------------------------------------------------------------- SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME ------------------ ------------------------------ ------------------------------ SYS SYSTEM ROLLBACK SYSTEM
通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.
还是不怎么明白是如何解决此等问题的。看到网上最多的是删除undo,然后重建。
应该是system回滚段有问题, file 1 block 9 undo 段头记录的count数跟file 1 block 18 记录的数不一致,造成4194错误,使用bbed修改
你好, 之前一看到ora-00600 [4194]错误,我的做法都是设置隐含参数undo_management=’MANUAL’,数据库起来后,重建undo表空间,而就不管之前出现的ora -00604/ora-00607错误了,但好像都可以成功;但今天,这次遇到ora-00604、00607、00600[4194],就没那么幸运了;不管怎样都没有成功, 隐含参数_allow_resetlogs_corruption=true,_corrupted_rollback_segments=’_SYSSMU1$’, —- ‘_SYSSMU20$’也设置了,但数据库就是起不来;请问你都是怎么解决呢? 还有文中最后提到“挖数据文件”这个是什么意思呢? 请指教!
你的问题可以比较复杂,我没有清楚你的环境,不好说。加入我的qq,让我了解具体情况