联系:手机/微信(+86 17813235971) QQ(107644445)
标题:ORA-00600[kjhn_post_ha_alert0-862]原因分析
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
数据库版本和平台信息
数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上
ORACLE V10.2.0.1.0 - Production vsnsta=0 vsnsql=14 vsnxtr=3 Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production With the Partitioning, OLAP and Data Mining options Windows Server 2003 Version V5.2 Service Pack 2 CPU : 2 - type 586, 1 Physical Cores Process Affinity : 0x00000000 Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M Instance name: orcl
数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误
Wed Jun 03 21:50:40 2015 Restarting dead background process MMON MMON started with pid=11, OS id=3804 Wed Jun 03 21:50:43 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:50:49 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:55:44 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 21:55:49 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], [] Wed Jun 03 22:00:40 2015 Thread 1 advanced to log sequence 476 Current log# 1 seq# 476 mem# 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG Wed Jun 03 22:00:44 2015 Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc: ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
查询对应trace文件发现
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], [] Current SQL statement for this session: BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
人工执行该过程
SQL> var success varchar2 SQL> begin 2 :success := sys.dbms_ha_alerts_prvt.check_ha_resources; 3 end; 4 / PL/SQL procedure successfully completed. SQL> print success SUCCESS -------------------------------- N
通过查询相关资料得到如下说明
@ This check is triggered with FAN enabled at this instance and it seems to be @ associated with a startup action. From the procedure itself which is called @ this is a run-once MMON (startup) action which supports instance down @ notification reliability. It does the folowing a) registers the current @ instance incarnation in recent_resource_incarnations$ if it's not already @ there b) deletes recent_resource_incarnations$ records that don't apply to @ this database. They may, e.g., have been copied from seed db or from a former @ DataGuard primary c) scans recent_resource_incarnations$ for instance @ incarnations that are no longer alive, and submits instance down alerts for @ them . If all is good then return 'Y' else 'N' (or error) if there is a @ failure. That failure is to get back to MMON, so that it may retry this @ action later. In the local instance I get a 'Y' but in the customer's system @ it fails with a 'N' which seems related to the ORA-600 assert. @ This function is kjhn_post_ha_alert0() which is internal and does the real work of @ posting HA alerts. It is used by both kjhn_post_ha_alert and @ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of @ kjhn_post_ha_alert,other than the fact that it uses individual parameters @ rather than the more easily extensible structure. Also the parameters passed @ to it are the instance_name and the host_name which is the kernelized @ implementation for posting HA alerts. Without actually having the arguments @ the guess is that either the host_name or the instance_name raised in the @ assert is null which triggered it.
mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.
处理方法
This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
The bug is fixed in 11.1.0.6. A workaround is available for the problem.
该bug在11.1.0.6中得以修复
To implement the workaround, please execute the following steps as the SYS user: 1. Collect the following information and spool it to a file for your records. a. output of select * from v$instance b. show parameter instance_name c. set pages 1000 d. select * from recent_resource_incarnations$ 2. Create a backup table of recent_resource_incarnations$. SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$; 3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running. Do not issue this statement if a shutdown is pending. SQL> truncate table recent_resource_incarnations$; 4. Perform a clean shutdown, followed by a startup.
具体参考:
ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS