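Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; otherwise I reserve the right to pursue legal action.]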
A friend contacted me because their database would not start: opening it failed with ORA-600 [k2vcbk_2]. The database is 11.2.0.2 RAC and the operating system is AIX 6.1.
SQL> recover database;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Process ID: 7930020
Session ID: 49 Serial number: 14761
Alert log of database node 1
Mon Sep 21 15:45:41 2015
Thread 1 advanced to log sequence 54076 (LGWR switch)
Current log# 13 seq# 54076 mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
Current log# 13 seq# 54076 mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Mon Sep 21 15:45:44 2015
Archived Log entry 74655 added for thread 1 sequence 54075 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:56:18 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc (incident=184348):
ORA-00600: internal error code, arguments: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184348/xifenfei1_ora_18088342_i184348.trc
Mon Sep 21 15:56:34 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 trapped in 2PC on transaction 7.16.120119. Cleaning up.
Error stack returned to user:
ORA-00600: internal error code, arguments: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc (incident=184349):
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Mon Sep 21 15:56:34 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184349/xifenfei1_ora_18088342_i184349.trc
Mon Sep 21 15:56:35 2015
Sweep [inc][184349]: completed
Sweep [inc][184348]: completed
Sweep [inc2][184348]: completed
opiodr aborting process unknown ospid (18088342) as a result of ORA-603
Mon Sep 21 15:57:12 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc (incident=184274):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184274/xifenfei1_smon_7536810_i184274.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Sep 21 15:57:16 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7536810): terminating the instance due to error 474
Mon Sep 21 15:57:18 2015
ORA-1092 : opitsk aborting process
Alert log of database node 2
Mon Sep 21 15:21:50 2015
Archived Log entry 74653 added for thread 2 sequence 23559 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:44:28 2015
Thread 2 advanced to log sequence 23561 (LGWR switch)
Current log# 12 seq# 23561 mem# 0: +DG01/xifenfei/onlinelog/group_12.338.779457003
Current log# 12 seq# 23561 mem# 1: +DG01/xifenfei/onlinelog/group_12.265.779582493
Mon Sep 21 15:44:31 2015
Archived Log entry 74654 added for thread 2 sequence 23560 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
is local tran 20.3.1533822 (hex=14.03.17677e)
insert pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
is local tran 20.3.1533822 (hex=14.03.17677e))
delete pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:56:35 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Mon Sep 21 15:57:10 2015
Error 3135 trapped in 2PC on transaction 20.11.1534704. Cleaning up.
Error stack returned to user:
ORA-03135: connection lost contact
opidcl aborting process unknown ospid (9175532) as a result of ORA-604
Mon Sep 21 15:57:17 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Mon Sep 21 15:57:23 2015
Reconfiguration started (old inc 18, new inc 20)
List of instances:
2 (myinst: 2)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Sep 21 15:57:23 2015
LMS 2: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
LMS 0: 2 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
LMS 1: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Mon Sep 21 15:57:23 2015
minact-scn: Inst 2 is now the master inc#:20 mmon proc-id:6816208 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0d45.28c2bb5c gcalc-scn:0x0d45.28c3bd2e
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:20 new-inc#:20
Mon Sep 21 15:57:23 2015
Instance recovery: looking for dead threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Beginning instance recovery of 1 threads
parallel recovery started with 31 processes
Started redo scan
Completed redo scan
read 12626 KB redo, 1724 data blocks need recovery
Started redo application at
Thread 1: logseq 54076, block 184416
Recovery of Online Redo Log: Thread 1 Group 13 Seq 54076 Reading mem 0
Mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
Mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Completed redo application of 9.78MB
Completed instance recovery at
Thread 1: logseq 54076, block 209669, scn 14590688357285
1633 data blocks read, 1794 data blocks written, 12626 redo k-bytes read
Thread 1 advanced to log sequence 54077 (thread recovery)
Mon Sep 21 15:57:33 2015
Error 3113 trapped in 2PC on transaction 21.18.1965522. Cleaning up.
Redo thread 1 internally disabled at seq 54077 (SMON)
Error stack returned to user:
ORA-02050: transaction 21.18.1965522 rolled back, some remote DBs may be in-doubt
ORA-03113: end-of-file on communication channel
ORA-02063: preceding line from ZSK
Mon Sep 21 15:57:34 2015
Archived Log entry 74656 added for thread 1 sequence 54076 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:34 2015
ARC0: Archiving disabled thread 1 sequence 54077
Archived Log entry 74657 added for thread 1 sequence 54077 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:35 2015
Thread 2 advanced to log sequence 23562 (LGWR switch)
Current log# 8 seq# 23562 mem# 0: +DG01/xifenfei/onlinelog/group_8.334.779456945
Current log# 8 seq# 23562 mem# 1: +DG01/xifenfei/onlinelog/group_8.267.779582453
Mon Sep 21 15:57:36 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc (incident=200218):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei2/incident/incdir_200218/xifenfei2_smon_6750672_i200218.trc
Archived Log entry 74658 added for thread 2 sequence 23561 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:38 2015
minact-scn: master continuing after IR
Mon Sep 21 15:57:41 2015
Dumping diagnostic data in directory=[cdmp_20150921155741], requested by (instance=2, osid=6750672 (SMON)), summary=[incident=200218].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fatal internal error happened while SMON was doing instance transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 6750672): terminating the instance due to error 474
Mon Sep 21 15:57:41 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
License high water mark = 291
Instance terminated by SMON, pid = 6750672
USER (ospid: 18874814): terminating the instance
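The alert logs give a rough picture of what happened. A distributed transaction on node 2 failed. In 11.2.0.2 a distributed transaction can span nodes, so PMON on node 2 tried to clean up the failed transaction, but because of a bug the transaction could not be cleaned up, and this crashed node 1. After node 1 crashed, node 2 performed recovery for it, and the same distributed-transaction bug made SMON's rollback fail, so that instance crashed as well. Because the rollback cannot complete, the database cannot start normally. A search on MOS pointed to Bug 10222544 ORA-600 [k2vpci_2] from multi-branch distributed transaction.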
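When reading logs like the ones above, it helps to pull out every transaction that failed in two-phase commit and to confirm that the decimal and hex forms printed by Oracle refer to the same transaction. The following is a minimal Python sketch of that bookkeeping, not part of the original analysis. The function names are made up for illustration, and the regular expression only covers the "Error N trapped in 2PC on transaction ..." line format seen in these two logs.

```python
import re

# Matches alert-log lines such as:
#   "Error 3135 trapped in 2PC on transaction 20.11.1534704. Cleaning up."
TRAPPED_2PC = re.compile(
    r"Error (\d+) trapped in 2PC on transaction (\d+\.\d+\.\d+)\."
)


def trapped_2pc_transactions(alert_log_text):
    """Return (error_number, local_tran_id) pairs in order of appearance."""
    return [(int(err), xid) for err, xid in TRAPPED_2PC.findall(alert_log_text)]


def local_tran_to_hex(local_tran_id):
    """Render a decimal 'usn.slot.sqn' id in the hex form the log prints."""
    usn, slot, sqn = (int(part) for part in local_tran_id.split("."))
    return f"{usn:02x}.{slot:02x}.{sqn:x}"


def scn_to_hex(scn):
    """Split a decimal SCN into 'wrap.base' hex; base is the low 32 bits."""
    return f"{scn >> 32:x}.{scn & 0xFFFFFFFF:08x}"


# Three lines copied from the node 1 and node 2 logs above.
sample = (
    "Error 600 trapped in 2PC on transaction 7.16.120119. Cleaning up.\n"
    "Error 3135 trapped in 2PC on transaction 20.11.1534704. Cleaning up.\n"
    "Error 3113 trapped in 2PC on transaction 21.18.1965522. Cleaning up.\n"
)

for error_number, xid in trapped_2pc_transactions(sample):
    print(error_number, xid, local_tran_to_hex(xid))

print(local_tran_to_hex("20.3.1533822"))  # compare with hex=14.03.17677e
print(scn_to_hex(14590688068086))         # compare with hex=d45.28c781f6
```

The last two calls reproduce the hex values shown in the DISTRIB TRAN lines of the node 2 log, which is a quick sanity check that the ids being compared are the same transaction.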
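For this class of problem, where the distributed transaction cannot be cleaned up, the fix is to identify the transaction and either commit it manually or simply mask it out.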
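The post does not show the commands that were actually used, so what follows is only a sketch under stated assumptions. One documented way to resolve an in-doubt distributed transaction by hand is to look it up in DBA_2PC_PENDING and then issue COMMIT FORCE or ROLLBACK FORCE with its local transaction id. That only works when the instance is open far enough to run SQL, which was not the case at the start of this incident. The helper name `force_statements` is hypothetical, and the id 20.3.1533822 is taken from the node 2 log purely as an example, not as the transaction that caused the crash. The function only builds statement text for a DBA to review; it never connects to a database.

```python
import re

# A local transaction id is three dot-separated decimal numbers: usn.slot.sqn
LOCAL_TRAN_ID = re.compile(r"\d+\.\d+\.\d+")


def force_statements(local_tran_id, action="COMMIT"):
    """Build review-only SQL text for one in-doubt transaction.

    Assumption: the transaction is visible in DBA_2PC_PENDING and the
    instance is open. Nothing here is executed against a database.
    """
    if not LOCAL_TRAN_ID.fullmatch(local_tran_id):
        raise ValueError(f"not a usn.slot.sqn transaction id: {local_tran_id!r}")
    if action not in ("COMMIT", "ROLLBACK"):
        raise ValueError("action must be COMMIT or ROLLBACK")
    return [
        # Step 1: confirm the transaction and inspect its state first.
        "SELECT local_tran_id, global_tran_id, state, mixed, fail_time "
        f"FROM dba_2pc_pending WHERE local_tran_id = '{local_tran_id}';",
        # Step 2: force the chosen outcome for that transaction only.
        f"{action} FORCE '{local_tran_id}';",
    ]


for statement in force_statements("20.3.1533822"):
    print(statement)
```

Validating the id before it is pasted into a statement guards against forcing the wrong transaction through a typo. How to mask a transaction when the database cannot open at all is not covered by this sketch.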