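I have both resolved and simulated ORA-607/ORA-600[4194] errors before, so I had settled into the assumption that ORA-607/ORA-600[4194] probably means a major disaster. This case shows that ORA-607/ORA-600[4194] can also be about as routine an error as they come: a reader's database lost power unexpectedly, and the next startup raised ORA-00607/ORA-00600[4194]/ORA-00600[4097], so the database failed to start.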
SMON: enabling tx recovery
Fri Aug 31 23:14:08 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=19, OS id=15619
Fri Aug 31 23:14:10 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Aug 31 23:14:12 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Aug 31 23:14:12 2012
Completed: alter database open
Fri Aug 31 23:14:14 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-01595: error freeing extent (2) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:41 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Fri Aug 31 23:29:43 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:44 2012
Errors in file /u01/oradata/orcl/bdump/orcl_pmon_15577.trc:
ORA-00474: SMON process terminated with error
Fri Aug 31 23:29:44 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 15577
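The alert log points to the SMON_SCN_TIME table, or the rollback of operations against it, as a likely source of the problem. Combining the alert log with the trace files, the main SQL statements involved in these errors are: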
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.COL_USAGE$
SET EQUALITY_PREDS = EQUALITY_PREDS +
DECODE(BITAND(:FLAG, 1), 0, 0, 1),
EQUIJOIN_PREDS = EQUIJOIN_PREDS +
DECODE(BITAND(:FLAG, 2), 0, 0, 1),
NONEQUIJOIN_PREDS = NONEQUIJOIN_PREDS +
DECODE(BITAND(:FLAG, 4), 0, 0, 1),
RANGE_PREDS = RANGE_PREDS + DECODE(BITAND(:FLAG, 8), 0, 0, 1),
LIKE_PREDS = LIKE_PREDS + DECODE(BITAND(:FLAG, 16), 0, 0, 1),
NULL_PREDS = NULL_PREDS + DECODE(BITAND(:FLAG, 32), 0, 0, 1),
TIMESTAMP = :TIME
WHERE OBJ# = :OBJN
AND INTCOL# = :COLN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.MON_MODS$
SET INSERTS = INSERTS + :INS,
UPDATES = UPDATES + :UPD,
DELETES = DELETES + :DEL,
FLAGS =
(DECODE(BITAND(FLAGS, :FLAG), :FLAG, FLAGS, FLAGS + :FLAG)),
DROP_SEGMENTS = DROP_SEGMENTS + :DROPSEG,
TIMESTAMP = :TIME
WHERE OBJ# = :OBJN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO SMON_SCN_TIME
(THREAD,
TIME_MP,
TIME_DP,
SCN,
SCN_WRP,
SCN_BAS,
NUM_MAPPINGS,
TIM_SCN_MAP)
VALUES
(0, :1, :2, :3, :4, :5, :6, :7)
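Operations on three Oracle tables are involved here:
COL_USAGE$: used during statistics gathering as a reference for deciding whether column histograms need to be collected
MON_MODS$: used by Oracle to record which tables have had their data changed, to support statistics gathering
SMON_SCN_TIME: records the mapping between SCN and time
From this analysis we can conclude that the data in these three tables is not fatal base-table information for the database. It can be cleaned out while the database is running; the worst consequence is some loss of performance or incomplete SCN-to-time conversion.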
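Approach to the fix
The full order of steps for handling undo problems
1. The alert log shows that the database went down after open, because SMON hit errors while rolling back the SQL statements above. So first try starting the database on the SYSTEM rollback segment and see whether that sidesteps the problem.
2. If method 1 does not work, use an event to suppress SMON's operations on the rollback segments so that the database starts normally.
3. If a special transaction exists and the event cannot suppress it, try hidden parameters.
4. If hidden parameters still cannot resolve the problem, consider using bbed.
5. If bbed cannot resolve it either, the only option left is dul or a similar tool.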
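Methods 1 and 2 can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact commands used in this case: the original does not name the parameters, so using undo_management=manual for "start on the SYSTEM rollback segment" and event 10513 for "suppress SMON's rollback-segment operations" is my reading of those steps. It also assumes the instance uses an spfile and that the database has been backed up first.

```sql
-- Run in SQL*Plus as SYSDBA. Assumes the instance was started from an spfile.
startup mount

-- Method 1 (assumed parameter): manual undo management, so that only the
-- SYSTEM rollback segment is brought online at open.
alter system set undo_management=manual scope=spfile;

-- Method 2 (assumed event, add only if method 1 alone is not enough):
-- event 10513 is commonly used to stop SMON from performing
-- transaction recovery.
alter system set event='10513 trace name context forever, level 2' scope=spfile;

-- Restart so the static parameters take effect.
shutdown immediate
startup
```

Both settings are temporary workarounds for getting the database open; they are meant to be removed once the undo tablespace has been rebuilt.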
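In this case we can clearly see that the failure comes from errors while rolling back the three SQL statements above. For a problem like this, testing showed that both method 1 and method 2 resolve it without difficulty (after opening the database you need to rebuild undo: create a new undo tablespace, switch to it, drop the damaged undo tablespace, and adjust the parameters [possibly including the event]). Because several ORA-607/ORA-600[4194] cases I had met before were caused by a damaged SYSTEM rollback segment, I started out assuming this would be another fairly complex recovery. It turned out to be a very routine one. Experience with Oracle database recovery can help you locate a problem quickly, but following a fixed line of thought can also lead you into a dead end.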
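The undo rebuild mentioned above might look like the following once the database is open. This is only a sketch: the tablespace names UNDOTBS1 and UNDOTBS2, the datafile path, and the size are hypothetical, and the event reset applies only if event 10513 (an assumption, not named in the original) was set to get the database open.

```sql
-- Run in SQL*Plus as SYSDBA after the database has opened.
-- UNDOTBS1 / UNDOTBS2 and the datafile path are hypothetical names.

-- Create a fresh undo tablespace.
create undo tablespace undotbs2
  datafile '/u01/oradata/orcl/undotbs02.dbf' size 500m;

-- Drop the damaged undo tablespace. If this fails because a segment
-- still needs recovery, the later methods in the list are required.
drop tablespace undotbs1 including contents and datafiles;

-- Point the instance at the new tablespace and return to automatic undo.
alter system set undo_tablespace=undotbs2 scope=spfile;
alter system set undo_management=auto scope=spfile;

-- Remove the event if it was set earlier (skip otherwise).
alter system reset event scope=spfile sid='*';

-- Restart so the instance runs on the new undo tablespace.
shutdown immediate
startup
```

After the restart, the alert log should show SMON completing transaction recovery without further ORA-600[4194] errors; if they persist, the damage is probably not limited to the old undo tablespace.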