dns解析导致opiodr aborting process unknown ospid (7266) as a result of ORA-609类似错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:dns解析导致opiodr aborting process unknown ospid (7266) as a result of ORA-609类似错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库alert日志中出现大量类似opiodr aborting process unknown ospid (7266) as a result of ORA-609错误

Tue Oct 27 07:51:10 2015
opiodr aborting process unknown ospid (7266) as a result of ORA-609
Tue Oct 27 08:16:56 2015
opiodr aborting process unknown ospid (7660) as a result of ORA-609
Tue Oct 27 08:17:16 2015
opiodr aborting process unknown ospid (7666) as a result of ORA-609
Tue Oct 27 08:21:50 2015
opiodr aborting process unknown ospid (7725) as a result of ORA-609
Tue Oct 27 08:37:43 2015
opiodr aborting process unknown ospid (7940) as a result of ORA-609
Tue Oct 27 08:51:25 2015
opiodr aborting process unknown ospid (8126) as a result of ORA-609
Tue Oct 27 09:07:16 2015
opiodr aborting process unknown ospid (8359) as a result of ORA-609
Tue Oct 27 09:26:09 2015
opiodr aborting process unknown ospid (8600) as a result of ORA-609

测试tnsping

[oracle@Sql admin]$ tnsping orcl
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:40:55
Copyright (c) 1997, 2009, Oracle.  All rights reserved.
Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (20120 msec)
[oracle@Sql admin]$ tnsping orcl
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:42:35
Copyright (c) 1997, 2009, Oracle.  All rights reserved.
Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (41600 msec)

hosts文件配置

[oracle@Sql admin]$ vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.2.162 Sql.xifenfei.com
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

同时测试lsnrctl start/stop/status都很慢,但是ping localhost非常快,证明问题在监听之上.search 关键字lsnrctl slow/tnsping slow关键字发现有相关文档:
Bug 8307164 : TNSPING 11G USING DNS AND NOT HOSTS FILE
Listener Startup or Connections Hang in 11g (Doc ID 803838.1)
11g: Remote Connections Take Very Long to Establish (Doc ID 856820.1)
通过这些文档,可以知道在oracle 10g中解析主机名使用gethostbyname()函数,11g中使用了使用getaddrinfo()函数.这个函数对于/etc/nsswitch.conf的调用和ethostbyname()不一样,从而出现此类问题.根据官方建议:

Linux
hosts: files [NOTFOUND=continue] dns OR  hosts:files/dns
HPUX and Solaris
ipnodes: files [NOTFOUND=continue] dns

本次解决方法
因为对于没有使用到dns服务器的数据库环境而言,没有必要配置这玩意,给自己埋雷,估计是当时安装yum之时没有清理掉

vi /etc/nsswitch.conf
hosts:      files nds  -->hosts:      files

再次测试tnsping,lsnrctl等命令,速度ok

[oracle@Sql"]$ tnsping orcl
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 28-OCT-2015 01:58:30
Copyright (c) 1997, 2009, Oracle.  All rights reserved.
Used parameter files:
/opt/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.162)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orcl.Sql.xifenfei.com)))
OK (10 msec)

对于此类监听慢的问题,可以通过对client/server端trace进一步跟踪

TRACE_LEVEL_CLIENT = 16
TRACE_FILE_CLIENT = client/server
TRACE_DIRECTORY_CLIENT = [any valid directory path]
TRACE_TIMESTAMP_CLIENT = ON
DIAG_ADR_ENABLED=off

ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库alert日志出现如下错误

Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)
  VERSION INFORMATION:
        TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
        TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
        Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
  Time: 21-AUG-2012 09:50:15
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12537
TNS-12537: TNS:connection closed
    ns secondary err code: 12560
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
opiodr aborting process unknown ospid (15204768) as a result of ORA-609

错误的原因

The ORA-609 error is thrown when a client connection of any kind failed to complete or
aborted the connection process before the connection/authentication process was complete.
Very often, this connection abort is due to a timeout.  Beginning with 10gR2, a default value for
inbound connect timeout has been set at 60 seconds.
This time limit is often inadequate for the entire connection process to complete.
We have also discovered that the ORA-609 occurs frequently in installations
where the database is monitored by DB Console and the Enterprise Manager agent (emagent).
After the DB Console is started and as a matter of routine, the emagent will repeatedly try to
connect to the target instances.
We can see frequent emagent connections in the listener.log without error.
However, on occasion it may have failed to complete the connection process at the database so an ORA-609 is thrown.
The emagent will simply retry the connection and may be successful on the subsequent try.
(Provided there is no real fault occurring at the listener or database).
This temporary failure to connect will not be reported back to DB Console and there will be no indication,
 except for the ORA-609, that a fault occurred.

出现这个问题的主要原因是因为从10.2开始inbound connect timeout默认为60 seconds,而在很多建立连接过程可能超过这个时间从而出现类此错误,常见的诱因是DB Console 和 Enterprise Manager agent (emagent). EM会重复的尝试连接到数据库。其过程中会偶尔的出现连接超时的问题,但是接下来会继续尝试,并获得成功。这种临时的失败不会导致EM报错而只会以ora-609的形式记录在alert log中.

处理方法

For that reason, we often recommend increasing the values for INBOUND_CONNECT_TIMEOUT at both listener
and server side sqlnet.ora file as a preventive measure.
If the problem  is due to connection timeouts, an increase in the following parameters should eliminate
or reduce the occurrence of the ORA-609s.
e.g.
Sqlnet.ora: SQLNET.INBOUND_CONNECT_TIMEOUT=180
Listener.ora: INBOUND_CONNECT_TIMEOUT_listener_name=120
These settings are in seconds.  Again, the default is 60.

问题跟踪方法

If the issue persists and inbound connect does not have any effect, the following steps are intended to
help locate  the client that may be causing the errors.
1)  Suppress the TNS errors in the alert.log by setting the following listener.ora file parameter:
DIAG_ADR_ENABLED_listener_name=OFF
This will cause the TNS errors to be posted to the ORACLE_HOME/network/log/sqlnet.log file that is
local to the database and
may yield useful information about the client's address.
For example, here's a snippet from a server side sqlnet.log where client address info was posted:
Production Time: 15-FEB-2010 07:15:01
Fatal NI connect error 12537, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=yourhost)(Port=1521))(CONNECT_DATA=(SID=PROD1DR)
(CID=(PROGRAM=sqlplus)(HOST=client_host)(USER=client))))
Observe the PROGRAM and HOST fields on the last line.  This is where the connection originated.
Be sure to match timestamps in the sqlnet.log with the timestamps of the alert.log errors.
Once you've located the offending client,
you can enable client tracing to try and determine the cause:
TRACE_LEVEL_CLIENT=16
TRACE_DIRECTORY_CLIENT=<dir location>
TRACE_TIMESTAMP_CLIENT=TRUE
DIAG_ADR_ENABLED=off   <<<<<11g or newer client requirement
If you need assistance with client or server tracing, please open an SR with Global Customer Support.
2)  Check the listener.log for client connections that were logged at timestamps that match the ORA-609
timestamps as they appear in the alert.log.
The client information is recorded in each listener.log entry.
Since this error occurs AFTER the listener has handled the connection,
do not expect to see errors in the listener.log.
Here's an example snippet of an incoming client connection that was posted to the listener.log:
20-JAN-2009 17:08:45 (CONNECT_DATA=(SID=orcl)(CID=
(PROGRAM=D:\oracle\product\10.1.0\Db_1\perl\5.6.1\bin\MSWin32-x86\perl.exe)(HOST=myclient)
Note that the exact timestamp, program name and client host will often be recorded.  Again,
once you've located the offending client,
enable tracing (see above) to try to capture the connection failure.
3)  Enable server side Oracle Net tracing and capture the TNS error along with the incoming connection.
Match the PID that accompanies the ORA-609 to the server trace label.  e.g.
ORA-609 : opiodr aborting process unknown ospid (4799_1)  *Note the PID
This PID would correspond to server trace labeled:  svr_4799.trc.  Check the server trace for either
TNS error (the 609 will not appear)
and try to locate the originating client address.  If assistance is needed for this investigation,
please open an SR with Oracle Support.
See below for instuctions on enabling Oracle Net server tracing.
The following details the discovery of the source of an ORA-609 for a real case:
The alert.log reports the following messages intermittently but frequently:
Mon Nov 16 22:39:22 2009
ORA-609 : opiodr aborting process unknown ospid (nnnn)
Enabled Oracle Net server tracing:
TRACE_LEVEL_SERVER=16
TRACE_DIRECTORY_SERVER=<dir location>
TRACE_TIMESTAMP_SERVER=TRUE
DIAG_ADR_ENABLED=off
Reloaded listener and wait for error to appear again.:
ORA-609 : opiodr aborting process unknown ospid (5233_1)
Note that the server trace file set that corresponded to this event was named svr_5233*.trc.
Of course the timestamps of the alert.log event and the server trace creation matched as well.
A review of the server trace showed only an EOF failure and the  TNS-12537 error:
Read unexpected EOF ERROR
nserror: nsres: id=0, op=68, ns=12537
In this particular case, there was no information about the client in the trace.
This is atypical for a server trace.
It may be that the client aborted before all the client information was posted to the file.
However, there was post in the listener.log f
or an emagent connection that was established at the same point in time.
Here's an excerpt from a listener.log entry where an emagent establishes a connection:
PROGRAM=D:\oracle\product\10.1.0\Db_1\bin\emagent.exe)
Checked the EM Agent traces and logs and discovered the following entry:
Fatal NI connect error 12547, connecting to:
(LOCAL=NO)
VERSION INFORMATION:
TNS for Solaris: Version 11.1.0.7.0 - Production
Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.1.0.7.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 11.1.0.7.0 - Production
Time: 16-NOV-2009 22:39:22
****Tracing to file: /backup/sid_traces/sqlnetlog/svr_5233.trc
Tns error struct:
ns main err code: 12547
TNS-12547: TNS:lost contact
ns secondary err code: 12560
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
****Note the name of the server trace which contains the PID:  svr_5233.trc
Also, the timestamp of the agent event matches the timestamp of the alert.log error.
Check the following locations for EM Agent traces. If working with support on this issue and
the EM Agent is suspected, upload ALL files under:
$ORACLE_HOME/sysman/log/emagent.trc < Single node agent trace location
$ORACLE_HOME/host/sysman/log/emagent.trc < RAC agent trace location
It was determined that in this case, the emagent was aborting the connection
before it was complete and then simply reconnecting
and succeeding on the subsequent try.  No errors were reported in the listener log or listener trace.
No errors were returned to the DB Console.
There was no apparent outage of any kind.  No action was taken to correct the ORA-609 in this case.
It was decided that the message was informational and completely benign.

参考文档:
ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log (Doc ID 1116960.1)
Troubleshooting Guide ORA-609 : Opiodr aborting process unknown ospid (Doc ID 1121357.1)