数据库启动报ORA-600 6711故障分析处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报ORA-600 6711故障分析处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

几个月以前的一个数据库故障,今天拿出来在win上重新分析,数据库启动报ORA-600 6711错

C:\Users\XFF>SQLPLUS / AS SYSDBA

SQL*Plus: Release 12.1.0.2.0 Production on 星期日 7月 14 16:17:32 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 6442450944 bytes
Fixed Size                  6205768 bytes
Variable Size            1493175992 bytes
Database Buffers         4932501504 bytes
Redo Buffers               10567680 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4436379], [1], [4436389],
[0], [], [], [], [], [], [], []
进程 ID: 44144
会话 ID: 67 序列号: 39084

根据经验该报错为:ORA-600 [6711] “Cluster Key Chain corruption”,也就是说很可能是cluster相关对象异常导致该问题.


对启动过程进行跟踪

PARSING IN CURSOR #17695456 len=189 dep=4  tim=233428646426 hv=186852205 ad='7ffda1eea168' sqlid='2tkw12w5k68vd'
select user#,password,datats#,tempts#,type#,defrole,resource$, ptime,
decode(defschclass,NULL,'DEFAULT_CONSUMER_GROUP',defschclass),
spare1,spare4,ext_username,spare2 from user$ where name=:1
END OF STMT
PARSE #17695456:c=0,e=168,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=0,tim=233428646426
BINDS #17695456:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=871 siz=32 off=0
  kxsbbbfp=010b2df0  bln=32  avl=03  flg=05
  value="SYS"
EXEC #17695456:c=0,e=418,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=1457651150,tim=233428646901
WAIT #17695456: nam='db file sequential read' ela= 126 file#=1 block#=417 blocks=1 obj#=46 tim=233428647046
FETCH #17695456:c=0,e=153,p=1,cr=2,cu=0,mis=0,r=1,dep=4,og=4,plh=1457651150,tim=233428647069
STAT #17695456 id=1 cnt=1 pid=0 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID USER$ 
  (cr=2 pr=1 pw=0 time=151 us cost=1 size=139 card=1)'
STAT #17695456 id=2 cnt=1 pid=1 pos=1 obj=46 op='INDEX UNIQUE SCAN I_USER1 (cr=1 pr=1 pw=0 time=149 us)'
CLOSE #17695456:c=0,e=2,dep=4,type=0,tim=233428647111
Incident 2601 created, dump file: C:\APP\XFF\diag\rdbms\ecp\ecp\incident\incdir_2601\ecp_ora_40516_i2601.trc
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

FETCH #15289752:c=2062500,e=2544215,p=13,cr=65626,cu=28,mis=0,r=0,dep=3,og=3,plh=3312420081,tim=233431176536
=====================
PARSE ERROR #387363008:len=50 dep=1 uid=0 oct=3 lid=0 tim=233431176680 err=600
select cost from resource_cost$ where resource#=:1
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

这个操作触发了递归查询

PARSING IN CURSOR #387319440 len=151 dep=5 lid=0 tim=233428641503 hv=2507062328 ad='7ffd9ffa23a8' sqlid='7u49y06aqxg1s'
select /*+ rule */ bucket, endpoint, col#, epvalue, epvalue_raw, ep_repeat_count from histgrm$ 
where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
END OF STMT
PARSE #387319440:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641503
BINDS #387319440:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=72 off=0
  kxsbbbfp=00eb2be0  bln=22  avl=02  flg=05
  value=22
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=00eb2bf8  bln=22  avl=02  flg=01
  value=2
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=00eb2c10  bln=22  avl=01  flg=01
  value=0
EXEC #387319440:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641652
WAIT #387319440: nam='db file sequential read' ela= 124 file#=1 block#=45660 blocks=1 obj#=66 tim=233428641792
FETCH #387319440:c=0,e=173,p=1,cr=3,cu=0,mis=0,r=20,dep=5,og=3,plh=3312420081,tim=233428641834
STAT #387319440 id=1 cnt=20 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=1 pw=0 time=169 us cost=0 size=0 card=0)'
STAT #387319440 id=2 cnt=20 pid=1 pos=1 obj=66 op='TABLE ACCESS CLUSTER HISTGRM$ (cr=3 pr=1 pw=0 time=148 us)'
STAT #387319440 id=3 cnt=1 pid=2 pos=1 obj=65 op='INDEX UNIQUE SCAN I_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=2 us)'
CLOSE #387319440:c=0,e=36,dep=5,type=3,tim=233428641886

查看对应的trace文件

[TOC00000]
Jump to table of contents
Dump continued from file: C:\APP\XFF\diag\rdbms\ecp\ecp\trace\ecp_ora_40516.trc
[TOC00001]
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 2601 (ORA 600 [6711]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
kdsDumpState: cdb: 0 dspdb: 0 type: 3
*** ENTER: kds state dump ***
            row 0x0043b1a5.28 continuation at: 0x0043b1a5.0 file# 1 block# 242085 slot 0 (dscnt: 0)
KDSTABN_GET: 1 ..... ntab: 2
curSlot: 0 ..... nrows: 40
Dumping kcb descriptor:
kcbds 0x0000000017100DF0 : tsn 0, rdba 0x0043b1a5, afn 1, objd 64, cls 1, tidflg 0x0 0x0 0x0
    dsflg 0x00100000, dsflg2 0x00004000, lobid 00000000:00000000, cnt 0, addr 0x00007FFD55D1C014 dx 0x0000000000000000
    env [0x0000000017178C7C]: (scn: 0x0000.54290647   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  
    statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000.00000000  
    hi-scn: 0x0000.00000000  ma-scn: 0x0000.00000000  flg: 0x00000660)
kcb_dw_scan_dumpctx: not in DW scan
kdsgrp1_dump database not fully open
*** EXIT: kds state dump ***
----- End of Customized Incident Dump(s) -----
[TOC00003-END]

通过对相关rdba进行dump分析,确认对象id为64和trace中报的信息匹配

DUL> rdba 0x0043b1a5

  rdba   : 0x0043b1a5=4436389
  rfile# : 1
  block# : 242085

DUL> dump datafile 1 block 242085 header
Block Header:
block type=0x06 (table/index/cluster segment data block)
block format=0xa2 (oracle 10)
block rdba=0x0043b1a5 (file#=1, block#=242085)
scn=0x0000.438d4a86, seq=1, tail=0x4a860601
block checksum value=0xd591=54673, flag=6
Data Block Header Dump:
 Object id on Block? Y
 seg/obj: 0x40=64  csc: 0x00.438d4a80  itc: 2  flg: -  typ: 1 (data)
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0002.01f.00014b92  0x00c01897.6e20.07  C---    0  scn 0x0000.438c5fca
0x02   0x000a.01a.0011bb8e  0x00c0292c.0317.42  --U-   22  fsc 0x0000.438d4a86
Data Block Dump:
================
flag=0x0 --------
ntab=2
nrow=41
frre=23
fsbo=0x68
ffeo=0xb90
avsp=0x1ce1
tosp=0x1ce1

进一步分析该id为什么对象,使用dul unload obj$
c_obj#_intcol#


确认对对象为cluster C_OBJ#_INTCOL#,对应的表为HISTGRM$(统计信息中存储直方图信息表),明白这一些,处理起来就比较容易了,open数据库过程中绕过该对象访问,然后对该表进行处理即可