数据库报ORA-00607/ORA-00600[4194]错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:数据库报ORA-00607/ORA-00600[4194]错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误

Thu Jul 26 13:21:11 2012
SMON: enabling cache recovery
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Thu Jul 26 13:21:11 2012
Doing block recovery for file 1 block 18
Block recovery from logseq 3994, block 3 to scn 89979535
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery stopped at EOT rba 3994.5.16
Block recovery completed at rba 3994.5.16, scn 0.89979533
Doing block recovery for file 1 block 9
Block recovery from logseq 3994, block 3 to scn 89979532
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].

trace文件分析

--dump redo
DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log'
 Opcodes *.*
 DBAs (file#, block#):
      (1, 18)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=658120234=0x273a1e2a, Db Name='MISPDATA'
	Activation ID=658142762=0x273a762a
	Control Seq=16668=0x411c, File size=102400=0x19000
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62"
 thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0
 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x0 scn: 0x0000.00000000
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42
 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19
 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10
 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42
 Disk cksum: 0x3088 Calc cksum: 0x3088
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
--ORA-00600错误提示
*** 2012-07-26 13:21:11.566
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
--ora-607
Error 607 in redo application callback
TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ:  1 OP:5.1
ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02
            xid:  0x0000.026.00000035
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00400012.003d.01
prev ctl max cmt scn:  0x0000.0550709b  prev tx cmt scn:  0x0000.0550709c
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194318  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0000.01e.00000035 uba: 0x00400012.003d.01
                      flg: C---    lkc:  0     scn: 0x0000.05511296
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 31 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 2]  c1 0a
col  5: [ 5]  c4 5a 12 5a 14
col  6: [ 1]  80
col  7: [ 4]  c3 08 5f 3d
col  8: [ 4]  c3 02 38 52
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
Block after image is corrupt:
buffer tsn: 0 rdba: 0x00400012 (1/18)
scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201
frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK

这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 106
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
UNDO$
TABLE              SYSTEM

4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 18
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.

One thought on “数据库报ORA-00607/ORA-00600[4194]错误

  1. 还是不怎么明白是如何解决此等问题的。看到网上最多的是删除undo,然后重建。

  2. 应该是system回滚段有问题, file 1 block 9 undo 段头记录的count数跟file 1 block 18 记录的数不一致,造成4194错误,使用bbed修改

  3. 你好, 之前一看到ora-00600 [4194]错误,我的做法都是设置隐含参数undo_management=’MANUAL’,数据库起来后,重建undo表空间,而就不管之前出现的ora -00604/ora-00607错误了,但好像都可以成功;但今天,这次遇到ora-00604、00607、00600[4194],就没那么幸运了;不管怎样都没有成功, 隐含参数_allow_resetlogs_corruption=true,_corrupted_rollback_segments=’_SYSSMU1$’, —- ‘_SYSSMU20$’也设置了,但数据库就是起不来;请问你都是怎么解决呢? 还有文中最后提到“挖数据文件”这个是什么意思呢? 请指教!

  4. 你好, 之前一看到ora-00600 [4194]错误,我的做法都是设置隐含参数undo_management=’MANUAL’,数据库起来后,重建undo表空间,而就不管之前出现的ora -00604/ora-00607错误了,但好像都可以成功;但今天,这次遇到ora-00604、00607、00600[4194],就没那么幸运了;不管怎样都没有成功, 隐含参数_allow_resetlogs_corruption=true,_corrupted_rollback_segments=’_SYSSMU1$’, —- ‘_SYSSMU20$’也设置了,但数据库就是起不来;请问你都是怎么解决呢? 还有文中最后提到“挖数据文件”这个是什么意思呢? 请指教!

    你的问题可以比较复杂,我没有清楚你的环境,不好说。加入我的qq,让我了解具体情况

发表评论

邮箱地址不会被公开。 必填项已用*标注

4 + 4 =