C_OBJ#_INTCOL#坏块导致数据库无法open故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:C_OBJ#_INTCOL#坏块导致数据库无法open故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户通过硬件恢复出来数据文件之后,尝试启动数据库报错,他们经过多轮尝试依旧无法open数据库,还原到恢复出来文件的初始状态,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测,发现file# 3 需要sequence#为3200的日志,其他文件需要sequence#为3205的日志
lowscn


检查发现数据库为非归档模式,而且sequence#为3200的redo已经被覆盖
redo

基于上述情况,可以确认除file# 3之外,其他文件可以正常recover

[oracle@oracledb check_db]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Oct 15 16:08:33 2025
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
Media recovery complete.
SQL> recover datafile 2,4,5,7,8,9,10,11,12,13,14,15,16,17;
Media recovery complete.
SQL> recover datafile 18,19,20,21;
Media recovery complete.

对于缺少日志的数据文件,直接使用自研的m_scn(modify scn)工具进行修改

[oracle@oracledb check_db]$ chmod +x m_scn
[oracle@oracledb check_db]$ ./m_scn 1.txt
Please Enter Password: 

===== Starting Datafile Header modification program =====
Datafile list file: 1.txt
Operation Mode: Only Modify Datafile Header CheckPoint
Block Size: 8192
Log Path: /tmp/modify_scn
---------------------------------------------------------
Preparing Datafile list file...
Verifying Datafile existence...
ERROR: Datafile  does not exist!
Datafile verification passed
Initializing working directory...
Recovery script created: /tmp/modify_scn/backup/recover_datafile.sh
---------------------------------------------------------
Starting Datafile Header processing (total 3 files)...
[1/3] Processing Datafile Header: /data/20251014HF/oradata/system01.dbf (File number: 1)
  - Skipping file number 1 (control file)
---------------------------------------------------------
[2/3] Processing Datafile Header: /data/20251014HF/oradata/sysaux01.dbf (File number: 3)
  - Backing up Datafile header...
  - Executing Datafile Header modification with block size 8192...
  - Datafile Header processing completed
---------------------------------------------------------
Cleaning up temporary files...
================= All operations completed =================

Note: Execute /tmp/modify_scn/backup/recover_datafile.sh operation for rollback
[oracle@oracledb check_db]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Oct 15 16:15:12 2025
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 3;
Media recovery complete.

然后尝试打开数据库报ORA-01092 ORA-01578等错误

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 2
ORA-01578: ORACLE data block corrupted (file # 1, block # 132585)
ORA-01110: data file 1: '/data/app/oracle/oradata/ORCL/system01.dbf'
Process ID: 1617
Session ID: 1 Serial number: 5

报错比较明显是由于坏块导致数据库无法打开,进一步检查alert日志

2025-10-14T21:38:22.889985+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2025-10-14T21:38:22.909799+08:00
Starting background process SMCO
2025-10-14T21:38:22.930114+08:00
SMCO started with pid=57, OS id=1630 
2025-10-14T21:38:23.000833+08:00
ARC0 (PID:1616): Archived Log entry 3 added for T-1.S-3205 ID 0x6539a8b7 LAD:1
2025-10-14T21:38:23.524409+08:00
Undo initialization recovery: err:0 start: 19911054 end: 19911123 diff: 69 ms (0.1 seconds)
2025-10-14T21:38:23.826920+08:00
[26765] Successfully onlined Undo Tablespace 2.
Undo initialization online undo segments: err:0 start: 19911123 end: 19911426 diff: 303 ms (0.3 seconds)
Undo initialization finished serial:0 start:19911054 end:19911454 diff:400 ms (0.4 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying minimum file header compatibility for tablespace encryption..
Verifying file header compatibility for tablespace encryption completed for pdb 0
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
2025-10-14T21:38:24.645423+08:00
Hex dump of (file 1, block 132585) in trace file /u01/app/diag/rdbms/orcl/ORCL/trace/ORCL_ora_26765.trc

Invalid temporary block relative dba: 0x004205e9 (file 1, block 132585)
Bad header found during buffer read
Data in bad block:
 type: 66 format: 2 rdba: 0x4c444242
 last change scn: 0x4c44.4242.4c444242 seq: 0x44 flg: 0x4c
 spare3: 0x4c44
 consistency value in tail: 0xfcc60601
 check value in block header: 0x4242
 computed block checksum: 0x9e0a

Reading datafile '/data/app/oracle/oradata/ORCL/system01.dbf' for corrupt data at rdba:0x004205e9(file 1,block 132585)
Reread (file 1, block 132585) found same corrupt data (no logical check)
2025-10-14T21:38:24.838291+08:00
Errors in file /u01/app/diag/rdbms/orcl/ORCL/trace/ORCL_ora_26765.trc  (incident=51502):
ORA-01578: ORACLE data block corrupted (file # 1, block # 132585)
ORA-01110: data file 1: '/data/app/oracle/oradata/ORCL/system01.dbf'
Incident details in: /u01/app/diag/rdbms/orcl/ORCL/incident/incdir_51502/ORCL_ora_26765_i51502.trc
2025-10-14T21:38:24.861113+08:00
Corrupt Block Found
         TIME STAMP (GMT) = 10/14/2025 21:38:23
         CONT = 0, TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 132585, RDBA = 4326889
         OBJN = 66, OBJD = 64, OBJECT = C_OBJ#_INTCOL#, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Cluster Segment
2025-10-14T21:38:26.554220+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2025-10-14T21:38:26.769742+08:00
Errors in file /u01/app/diag/rdbms/orcl/ORCL/trace/ORCL_ora_26765.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-01578: ORACLE data block corrupted (file # 1, block # 132585)
ORA-01110: data file 1: '/data/app/oracle/oradata/ORCL/system01.dbf'
2025-10-14T21:38:26.769940+08:00
Errors in file /u01/app/diag/rdbms/orcl/ORCL/trace/ORCL_ora_26765.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-01578: ORACLE data block corrupted (file # 1, block # 132585)
ORA-01110: data file 1: '/data/app/oracle/oradata/ORCL/system01.dbf'
Error 604 happened during db open, shutting down database
Errors in file /u01/app/diag/rdbms/orcl/ORCL/trace/ORCL_ora_26765.trc  (incident=51503):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 2
ORA-01578: ORACLE data block corrupted (file # 1, block # 132585)
ORA-01110: data file 1: '/data/app/oracle/oradata/ORCL/system01.dbf'
Incident details in: /u01/app/diag/rdbms/orcl/ORCL/incident/incdir_51503/ORCL_ora_26765_i51503.trc
Invalid temporary block relative dba: 0x004205e9 (file 1, block 132585)
Bad header found during validation
Data in bad block:
 type: 66 format: 2 rdba: 0x4c444242
 last change scn: 0x4c44.4242.4c444242 seq: 0x44 flg: 0x4c
 spare3: 0x4c44
 consistency value in tail: 0xfcc60601
 check value in block header: 0x4242
 computed block checksum: 0x9e0a

Reread of blocknum=132585, file=/data/app/oracle/oradata/ORCL/system01.dbf. found same corrupt data
Reread of blocknum=132585, file=/data/app/oracle/oradata/ORCL/system01.dbf. found same corrupt data
Reread of blocknum=132585, file=/data/app/oracle/oradata/ORCL/system01.dbf. found same corrupt data
Reread of blocknum=132585, file=/data/app/oracle/oradata/ORCL/system01.dbf. found same corrupt data
Reread of blocknum=132585, file=/data/app/oracle/oradata/ORCL/system01.dbf. found same corrupt data
Checker run found 1 new persistent data failures
2025-10-14T21:38:27.515191+08:00
opiodr aborting process unknown ospid (26765) as a result of ORA-603
2025-10-14T21:38:27.528063+08:00
ORA-603 : opitsk aborting process
License high water mark = 18
USER (ospid: (prelim)): terminating the instance due to ORA error 
2025-10-14T21:38:28.576271+08:00
Instance terminated by USER(prelim), pid = 26765

通过分析alert日志可以确定是由于file 1, block 132585损坏,对应的对象为C_OBJ#_INTCOL#,该对象是histgrm$表的cluster,非数据库必须核心对象,可以在启动的时候跳过该对象,启动数据库,然后设置event对该对象进行处理

SQL> alter database open;

Database altered.

SQL>  create table good_histgrm$ as select * from histgrm$;

Table created.

SQL> truncate cluster c_obj#_intcol#;

Cluster truncated.

SQL> insert into histgrm$ select * from good_histgrm$ ;

286071 rows created.

SQL> commit;

Commit complete.

SQL> select /*+ full(a) */ count(*) from histgrm$ a ;

  COUNT(*)
----------
    286071

SQL> select /*+ full(a) */ count(*) from histgrm$ a ;

  COUNT(*)
----------
    286071

至此整体上完成该库的恢复任务

数据库启动报ORA-600 6711故障分析处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报ORA-600 6711故障分析处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

几个月以前的一个数据库故障,今天拿出来在win上重新分析,数据库启动报ORA-600 6711错

C:\Users\XFF>SQLPLUS / AS SYSDBA

SQL*Plus: Release 12.1.0.2.0 Production on 星期日 7月 14 16:17:32 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 6442450944 bytes
Fixed Size                  6205768 bytes
Variable Size            1493175992 bytes
Database Buffers         4932501504 bytes
Redo Buffers               10567680 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4436379], [1], [4436389],
[0], [], [], [], [], [], [], []
进程 ID: 44144
会话 ID: 67 序列号: 39084

根据经验该报错为:ORA-600 [6711] “Cluster Key Chain corruption”,也就是说很可能是cluster相关对象异常导致该问题.


对启动过程进行跟踪

PARSING IN CURSOR #17695456 len=189 dep=4  tim=233428646426 hv=186852205 ad='7ffda1eea168' sqlid='2tkw12w5k68vd'
select user#,password,datats#,tempts#,type#,defrole,resource$, ptime,
decode(defschclass,NULL,'DEFAULT_CONSUMER_GROUP',defschclass),
spare1,spare4,ext_username,spare2 from user$ where name=:1
END OF STMT
PARSE #17695456:c=0,e=168,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=0,tim=233428646426
BINDS #17695456:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=871 siz=32 off=0
  kxsbbbfp=010b2df0  bln=32  avl=03  flg=05
  value="SYS"
EXEC #17695456:c=0,e=418,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=1457651150,tim=233428646901
WAIT #17695456: nam='db file sequential read' ela= 126 file#=1 block#=417 blocks=1 obj#=46 tim=233428647046
FETCH #17695456:c=0,e=153,p=1,cr=2,cu=0,mis=0,r=1,dep=4,og=4,plh=1457651150,tim=233428647069
STAT #17695456 id=1 cnt=1 pid=0 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID USER$ 
  (cr=2 pr=1 pw=0 time=151 us cost=1 size=139 card=1)'
STAT #17695456 id=2 cnt=1 pid=1 pos=1 obj=46 op='INDEX UNIQUE SCAN I_USER1 (cr=1 pr=1 pw=0 time=149 us)'
CLOSE #17695456:c=0,e=2,dep=4,type=0,tim=233428647111
Incident 2601 created, dump file: C:\APP\XFF\diag\rdbms\ecp\ecp\incident\incdir_2601\ecp_ora_40516_i2601.trc
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

FETCH #15289752:c=2062500,e=2544215,p=13,cr=65626,cu=28,mis=0,r=0,dep=3,og=3,plh=3312420081,tim=233431176536
=====================
PARSE ERROR #387363008:len=50 dep=1 uid=0 oct=3 lid=0 tim=233431176680 err=600
select cost from resource_cost$ where resource#=:1
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

这个操作触发了递归查询

PARSING IN CURSOR #387319440 len=151 dep=5 lid=0 tim=233428641503 hv=2507062328 ad='7ffd9ffa23a8' sqlid='7u49y06aqxg1s'
select /*+ rule */ bucket, endpoint, col#, epvalue, epvalue_raw, ep_repeat_count from histgrm$ 
where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
END OF STMT
PARSE #387319440:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641503
BINDS #387319440:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=72 off=0
  kxsbbbfp=00eb2be0  bln=22  avl=02  flg=05
  value=22
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=00eb2bf8  bln=22  avl=02  flg=01
  value=2
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=00eb2c10  bln=22  avl=01  flg=01
  value=0
EXEC #387319440:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641652
WAIT #387319440: nam='db file sequential read' ela= 124 file#=1 block#=45660 blocks=1 obj#=66 tim=233428641792
FETCH #387319440:c=0,e=173,p=1,cr=3,cu=0,mis=0,r=20,dep=5,og=3,plh=3312420081,tim=233428641834
STAT #387319440 id=1 cnt=20 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=1 pw=0 time=169 us cost=0 size=0 card=0)'
STAT #387319440 id=2 cnt=20 pid=1 pos=1 obj=66 op='TABLE ACCESS CLUSTER HISTGRM$ (cr=3 pr=1 pw=0 time=148 us)'
STAT #387319440 id=3 cnt=1 pid=2 pos=1 obj=65 op='INDEX UNIQUE SCAN I_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=2 us)'
CLOSE #387319440:c=0,e=36,dep=5,type=3,tim=233428641886

查看对应的trace文件

[TOC00000]
Jump to table of contents
Dump continued from file: C:\APP\XFF\diag\rdbms\ecp\ecp\trace\ecp_ora_40516.trc
[TOC00001]
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 2601 (ORA 600 [6711]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
kdsDumpState: cdb: 0 dspdb: 0 type: 3
*** ENTER: kds state dump ***
            row 0x0043b1a5.28 continuation at: 0x0043b1a5.0 file# 1 block# 242085 slot 0 (dscnt: 0)
KDSTABN_GET: 1 ..... ntab: 2
curSlot: 0 ..... nrows: 40
Dumping kcb descriptor:
kcbds 0x0000000017100DF0 : tsn 0, rdba 0x0043b1a5, afn 1, objd 64, cls 1, tidflg 0x0 0x0 0x0
    dsflg 0x00100000, dsflg2 0x00004000, lobid 00000000:00000000, cnt 0, addr 0x00007FFD55D1C014 dx 0x0000000000000000
    env [0x0000000017178C7C]: (scn: 0x0000.54290647   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  
    statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000.00000000  
    hi-scn: 0x0000.00000000  ma-scn: 0x0000.00000000  flg: 0x00000660)
kcb_dw_scan_dumpctx: not in DW scan
kdsgrp1_dump database not fully open
*** EXIT: kds state dump ***
----- End of Customized Incident Dump(s) -----
[TOC00003-END]

通过对相关rdba进行dump分析,确认对象id为64和trace中报的信息匹配

DUL> rdba 0x0043b1a5

  rdba   : 0x0043b1a5=4436389
  rfile# : 1
  block# : 242085

DUL> dump datafile 1 block 242085 header
Block Header:
block type=0x06 (table/index/cluster segment data block)
block format=0xa2 (oracle 10)
block rdba=0x0043b1a5 (file#=1, block#=242085)
scn=0x0000.438d4a86, seq=1, tail=0x4a860601
block checksum value=0xd591=54673, flag=6
Data Block Header Dump:
 Object id on Block? Y
 seg/obj: 0x40=64  csc: 0x00.438d4a80  itc: 2  flg: -  typ: 1 (data)
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0002.01f.00014b92  0x00c01897.6e20.07  C---    0  scn 0x0000.438c5fca
0x02   0x000a.01a.0011bb8e  0x00c0292c.0317.42  --U-   22  fsc 0x0000.438d4a86
Data Block Dump:
================
flag=0x0 --------
ntab=2
nrow=41
frre=23
fsbo=0x68
ffeo=0xb90
avsp=0x1ce1
tosp=0x1ce1

进一步分析该id为什么对象,使用dul unload obj$
c_obj#_intcol#


确认对对象为cluster C_OBJ#_INTCOL#,对应的表为HISTGRM$(统计信息中存储直方图信息表),明白这一些,处理起来就比较容易了,open数据库过程中绕过该对象访问,然后对该表进行处理即可