ORA-00600[kjhn_post_ha_alert0-862]原因分析

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600[kjhn_post_ha_alert0-862]原因分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库版本和平台信息
数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上

ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 2 - type 586, 1 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M
Instance name: orcl

数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误

Wed Jun 03 21:50:40 2015
Restarting dead background process MMON
MMON started with pid=11, OS id=3804
Wed Jun 03 21:50:43 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:50:49 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:55:44 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:55:49 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 22:00:40 2015
Thread 1 advanced to log sequence 476
  Current log# 1 seq# 476 mem# 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
Wed Jun 03 22:00:44 2015
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []

查询对应trace文件发现

ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], []
Current SQL statement for this session:
BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;

人工执行该过程

SQL> var success varchar2
SQL> begin
  2  :success := sys.dbms_ha_alerts_prvt.check_ha_resources;
  3  end;
  4  /
PL/SQL procedure successfully completed.
SQL> print success
SUCCESS
--------------------------------
N

通过查询相关资料得到如下说明

@ This check is triggered with FAN enabled at this instance and it seems to be
@ associated with a startup action. From the procedure itself which is called
@ this is a run-once MMON (startup) action which supports instance down
@ notification reliability. It does the folowing a) registers the current
@ instance incarnation in recent_resource_incarnations$ if it's not already
@ there b) deletes recent_resource_incarnations$ records that don't apply to
@ this database. They may, e.g., have been copied from seed db or from a former
@ DataGuard primary c) scans recent_resource_incarnations$ for instance
@ incarnations that are no longer alive, and submits instance down alerts for
@ them . If all is good then return 'Y' else 'N' (or error) if there is a
@ failure. That failure is to get back to MMON, so that it may retry this
@ action later. In the local instance I get a 'Y' but in the customer's system
@ it fails with a 'N' which seems related to the ORA-600 assert.
@ This function is kjhn_post_ha_alert0() which is internal and does the real work of
@ posting HA alerts. It is used by both kjhn_post_ha_alert and
@ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of
@ kjhn_post_ha_alert,other than the fact that it uses individual parameters
@ rather than the more easily extensible structure. Also the parameters passed
@ to it are the instance_name and the host_name which is the kernelized
@ implementation for posting HA alerts. Without actually having the arguments
@ the guess is that either the host_name or the instance_name raised in the
@ assert is null which triggered it.

mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.

处理方法
This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
The bug is fixed in 11.1.0.6. A workaround is available for the problem.
该bug在11.1.0.6中得以修复

To implement the workaround, please execute the following steps as the SYS user:
1. Collect the following information and spool it to a file for your records.
a. output of select * from v$instance
b. show parameter instance_name
c. set pages 1000
d. select * from recent_resource_incarnations$
2. Create a backup table of recent_resource_incarnations$.
SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$;
3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running.
    Do not issue this statement if a shutdown is pending.
SQL> truncate table recent_resource_incarnations$;
4. Perform a clean shutdown, followed by a startup.

具体参考:
ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS