Troubleshooting ORA-600 krhpfh_03-1210

Contact: phone/WeChat (+86 17813235971), QQ (107644445); consult 惜分飞 via QQ


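Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]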

In this RAC database all nodes were OPEN and queries ran normally, but application data loads sometimes failed with errors such as ORA-01187: cannot read from file because it failed verification tests:
[Screenshot: ORA-01187 error]


The problem started with a failed disk. After the disk was replaced, the database instances on two nodes crashed:

Mon Aug 19 21:16:47 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75779): terminating the instance due to error 1242
Mon Aug 19 21:16:47 2024
System state dump requested by (instance=5, osid=75779 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_diag_75725.trc
Mon Aug 19 21:16:52 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
License high water mark = 131
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:02 2024
Instance termination failed to kill one or more processes
Instance terminated by CKPT, pid = 75779
Mon Aug 19 21:17:03 2024
USER (ospid: 33495): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:13 2024
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 33495

However, the instances could be started again manually, and a query showed all data files as ONLINE.


Yet some load processes were extremely slow, with many sessions waiting on enq: HW - contention.

The alert logs on all database nodes occasionally reported errors such as ORA-01186: file 1399 failed verification tests:

Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
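With six nodes writing alert logs, it helps to confirm which data files are affected and how often. The sketch below counts the "Read of datafile ... header failed with ORA-xxxxx" lines quoted above. It is a minimal example that assumes the exact 11.2 message format shown in this article; header_failures is a hypothetical helper name, not an Oracle tool.

```python
import re
from collections import Counter

# Assumed format, copied from the alert log lines above:
# Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
HEADER_FAIL = re.compile(
    r"Read of datafile '(?P<path>[^']+)' \(fno (?P<fno>\d+)\) "
    r"header failed with (?P<err>ORA-\d{5})"
)


def header_failures(lines):
    """Count header-read failures per (file number, path, error code)."""
    counts = Counter()
    for line in lines:
        m = HEADER_FAIL.search(line)
        if m:
            counts[(int(m.group("fno")), m.group("path"), m.group("err"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
ORA-01186: file 1399 failed verification tests
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
""".splitlines()
    for (fno, path, err), n in header_failures(sample).items():
        print(f"fno={fno} path={path} error={err} occurrences={n}")
```

In practice the lines would come from each node's alert_<SID>.log; the "Rereading ..." lines are deliberately not counted so each failed read is counted once.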

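Based on this, the initial assessment was:
1. Because the cluster has multiple nodes (six), as long as at least one node stays open, the other nodes can be shut down and restarted normally, but data cannot be written to the data file reporting ORA-01207 (reads still work).
2. If all nodes are shut down, the database can no longer be opened and fails with ORA-01207: file is more recent than control file.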

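From past experience, ORA-01207: file is more recent than control file can usually be resolved by recreating the control file. First shut down all nodes, then try to open the database from a single node: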

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
alter database open
Wed Aug 21 14:14:22 2024
SUCCESS: diskgroup REDO was mounted
Wed Aug 21 14:14:22 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Wed Aug 21 14:14:27 2024
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_47884.trc:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
ORA-1122 signalled during: alter database open...

As expected, the open failed. We then recreated the control file, after which recovery failed with ORA-00600 [krhpfh_03-1210]:

SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/tmp/xff/pfile';
ORACLE instance started.

Total System Global Area 1.3255E+11 bytes
Fixed Size		    2244832 bytes
Variable Size		 9.7442E+10 bytes
Database Buffers	 3.4897E+10 bytes
Redo Buffers		  208654336 bytes
SQL> @rectl

Control file created.

SQL> 
SQL> 
SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],
[fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
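The ORA-600 arguments are printed as label/value pairs ([fno =], [1399], ...). A small parser makes the header values in the message explicit. This is a sketch that assumes labels always end in "=" and are followed by an integer, as in this particular message; ora600_args and labelled_values are hypothetical names.

```python
import re

ORA600 = """ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399],
[fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []"""


def ora600_args(text):
    """Return the bracketed ORA-600 arguments, dropping empty ones."""
    return [a for a in re.findall(r"\[([^\]]*)\]", text) if a]


def labelled_values(args):
    """Pair each 'name =' label with the integer argument that follows it."""
    out = {}
    for label, value in zip(args, args[1:]):
        if label.endswith("="):
            out[label.rstrip(" =")] = int(value)
    return out


if __name__ == "__main__":
    vals = labelled_values(ora600_args(ORA600))
    print(vals)                             # {'fno': 1399, 'fhcpc': 274968, 'fhccc': 274983}
    print(vals["fhccc"] - vals["fhcpc"])    # 15
```

The first argument (krhpfh_03-1210) identifies the internal check and has no label, so it is skipped by labelled_values.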

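The arguments indicate that the error is caused by incorrect fhcpc and fhccc values in the data file header. Check the corresponding header fields with BBED: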

BBED> set file 1399
	FILE#          	1399

BBED> p kcvfhccc
ub4 kcvfhccc                                @148      0x00043227 ===> 274983 (decimal)

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00043218 ===> 274968 (decimal)
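BBED's p command shows a ub4 as a number (0x00043227), while m /x writes raw bytes in file order. On a little-endian platform, which is assumed here and is consistent with the dumps below, the bytes appear reversed. The sketch converts between the two forms with Python's struct module; ub4_to_bbed_hex and bbed_hex_to_ub4 are hypothetical helper names.

```python
import struct


def ub4_to_bbed_hex(value):
    """Encode an unsigned 32-bit value as little-endian hex bytes for BBED 'm /x'."""
    return struct.pack("<I", value).hex()


def bbed_hex_to_ub4(hex_bytes):
    """Decode little-endian hex bytes (as shown in a BBED dump) to an integer."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


if __name__ == "__main__":
    print(ub4_to_bbed_hex(274983))    # 27320400  (current kcvfhccc)
    print(ub4_to_bbed_hex(274968))    # 18320400  (current kcvfhcpc)
    # The byte strings written in this case decode to:
    print(bbed_hex_to_ub4("2a390400"), bbed_hex_to_ub4("2b390400"))  # 276778 276779
```

On a big-endian platform the format character would be ">I" instead of "<I"; check the platform's endianness before writing anything.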

The cause is clear enough, so modify these two values with BBED:

BBED> m /x 2a390400 offset 148
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  148 to  659           Dba:0x5dc00001
------------------------------------------------------------------------
 2a390400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0c000000 0f004441 
 5441315f 5442535f 45515f30 31000000 00000000 00000000 00000000 78010000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 cfebdd33 01000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 419333df 81001c0a 6ab13046 06000000 
 c1520400 02000000 10000000 7e000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000 

 <32 bytes per line>

BBED> m /x 2b390400 offset 140
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  140 to  651           Dba:0x5dc00001
------------------------------------------------------------------------
 2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 0c000000 0f004441 5441315f 5442535f 45515f30 31000000 00000000 00000000 
 00000000 78010000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 cfebdd33 01000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 419333df 81001c0a 
 6ab13046 06000000 c1520400 02000000 10000000 7e000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 

 <32 bytes per line>
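The dump printed after the second m /x can be decoded to confirm both words landed where intended: offset 140 (kcvfhcpc) and offset 148 (kcvfhccc), per the BBED p output earlier. This sketch assumes little-endian ub4 words and dump lines made of space-separated 4-byte groups, as in the output above; ub4_fields is a hypothetical helper.

```python
import struct


def ub4_fields(dump_line, start_offset):
    """Map byte offset -> ub4 value for each 4-byte group in one BBED dump line.

    Assumes little-endian words, as observed in this case's dumps.
    """
    return {
        start_offset + 4 * i: struct.unpack("<I", bytes.fromhex(word))[0]
        for i, word in enumerate(dump_line.split())
    }


if __name__ == "__main__":
    # First line of the dump shown after "m /x 2b390400 offset 140".
    line = "2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000"
    fields = ub4_fields(line, 140)
    print(fields[140], fields[148])   # 276779 276778 (new kcvfhcpc, kcvfhccc)
```

After the edit, kcvfhcpc (276779) is exactly one greater than kcvfhccc (276778).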

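With these values corrected, recover database and alter database open both succeeded. The data dictionary check came back clean and application reads and writes worked normally, completing this recovery: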

SQL> @hcheck
HCheck Version 07MAY18 on 21-AUG-2024 15:13:02
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

				   Catalog	 Fixed
Procedure Name			   Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/21 15:13:05 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/21 15:13:06 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/21 15:13:06 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
---------------------------------------
21-AUG-2024 15:13:07  Elapsed: 5 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_70961_HCHECK.trc
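When hcheck is run repeatedly during a recovery, its summary can be checked automatically. A minimal sketch, assuming the output layout shown above (check lines start with ".- " and end in PASS when clean, followed by a "Found N potential problem(s) and M warning(s)" line); hcheck_summary is a hypothetical name, and the exact text of non-PASS results is an assumption.

```python
import re


def hcheck_summary(output):
    """Return (problems, warnings, names of checks whose line does not end in PASS)."""
    m = re.search(r"Found (\d+) potential problem\(s\) and (\d+) warning\(s\)", output)
    if m is None:
        raise ValueError("hcheck summary line not found")
    not_passed = [
        line.split()[1]
        for line in output.splitlines()
        if line.startswith(".- ") and not line.rstrip().endswith("PASS")
    ]
    return int(m.group(1)), int(m.group(2)), not_passed


if __name__ == "__main__":
    excerpt = """\
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
Found 0 potential problem(s) and 0 warning(s)
"""
    print(hcheck_summary(excerpt))   # (0, 0, [])
```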