library cache latch等待事件

产生library cache latch原因

The library cache latches protect the cached SQL statements and objects' definitions held
in the library cache within the shared pool. The library cache latch must be acquired
in order to add a new statement to the library cache. During a parse, Oracle searches
the library cache for a matching statement. If one is not found, then Oracle will parse
the SQL statement, obtain the library cache latch and insert the new SQL.

每一个sql被执行之前,先要到library cache中根据hash_value查找parent cursor,这就需要先获得library cache latch;找到parent cursor后,就会去查找对应的child cursor,当发现无法找到时,就会释放library cache latch,获得share pool latch分配空间给硬解析后的产生的执行计划;然后再次获得library cache latch进行把执行计划放入share pool,转入library cache pin+lock(null模式)开始执行sql.library cache latch 的个数有限(与CPU_COUNT参数相关),当数据库中出现大量硬解析的时候,某一个sql无法得到library cache latch就会开始spin,达到spin count后还没得到,就会开始sleep,达到sleep时间后,醒来还再次试图过的library cache latch得不到就在spin再得不到又sleep…依此类推.
综上可知:在sql执行的过程中可以看出在出现High Versions Count和Hard Parse的情况下都有可能出现library cache latch等待.
关于Hard Parse见:shared pool latch 等待事件
关于High Versions Count见:关于High Versions Count总结

1._KGL_LATCH_COUNT控制library cache latches数量

The hidden parameter _KGL_LATCH_COUNT controls the number of library cache latches.
The default value should be adequate, but if contention for the library cache latch cannot  be resolved,
it one may consider increasing this value. The default value for _KGL_LATCH_COUNT is the next prime number
after CPU_COUNT. This value cannot exceed 67. 

2.Library cache: mutex X
在10g及其以后版本中,很多latch使用mutex代替,我们常见的Library cache: mutex X is similar to library cache wait in earlier version.(_kks_use_mutex_pin=false可以禁止mutex)

ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

alert中发现ORA-07445错误
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []错误,导致数据库down掉

Mon May 14 14:34:34 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p001_1280.trc:
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
Mon May 14 14:34:35 2012
Trace dumping is performing id=[cdmp_20120514143435]
Mon May 14 14:35:10 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc  (incident=164712):
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_164712\orcl_smon_1072_i164712.trc

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 4 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:4892M/8189M, Ph+PgF:5638M/9795M, VA:925M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 12
Windows thread id: 1072, image: ORACLE.EXE (SMON)
--以上信息得出操作系统和数据库版本2003 sp2+oracle11g(11.1.0.6 32位)
Dump continued from file: d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
========= Dump for incident 164712 (ORA 7445 [PC:0x7FFF65D0]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
--这里的ORA-07445 [ACCESS_VIOLATION][UNABLE_TO_READ]根据经验结合这里的32位的环境,
--怀疑是sga使用的内存太多,ORACLE数据库不能读SGA相关内存导致
--在trace中找出相关参数配置.
[0004]: processes=300
[0004]: sessions=335
[0004]: __shared_pool_size=1124073472
[0004]: __large_pool_size=8388608
[0004]: __java_pool_size=16777216
[0004]: __streams_pool_size=251658240
[0004]: streams_pool_size=251658240
[0004]: sga_target=0
[0004]: __sga_target=1887436800
[0004]: memory_target=3145728000
[0004]: memory_max_target=4722786304
[0004]: db_block_size=8192
[0004]: __db_cache_size=478150656
[0004]: __shared_io_pool_size=0
[0004]: compatible=11.1.0.0.0
[0004]: log_buffer=8851456
[0004]: __pga_aggregate_target=780140544
--这里可以看到sga_target分配了内存为1887436800=1.7578125G
--pga_aggregate_target分配了780140544=0.7265625G
--两者内存之和大于2G,超过了32位ORACLE默认限制

查询MOS发现[1341681.1]
该错误原因

This is a resource issue (memory in particular). 32-bit windows systems,
are limited to 2GB of addressable memory so if you are on this platform
it's likely you are simply exceeding the capabilities of the 32bit operating system.

解决建议

First recommendation :
If you have not already done so, add the /3GB switch to your boot.ini file and reboot the server. The
boot.ini will be located in the root directory on the drive where windows is installed. The switch, /3GB,
is placed at the end of the line that executes the WinNT loading process.
This will allow applications such as oracle access to 3Gb or memory instead of 2Gb.
Example:
[operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT Server Version 4.00" /3GB
 Second recommendation :
You do not want to increase memory target. If anything, this should be decreased.
You are limited to under 2GB of addressable memory on 32bit windows (the limit is actually about 1.85GB).
This is for both SGA and PGA memory for all instances; you have to reduce the SGA size for the instance.
The recommendation is to reduce sga_target, memory_target, and memory_max_target.

9i库遇到ORA-01595/ORA-01594

在alert日志中发现ORA-01595/ORA-01594错误

Sat May 12 21:54:17 2012
Errors in file /oracle/app/admin/prmdb/bdump/prmdb2_smon_483522.trc:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

分析trace文件

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /oracle/app/product/9.2.0
System name:    AIX
Node name:      prmsvr2
Release:        3
Version:        5
Machine:        0008585FD600
Instance name: prmdb2
Redo thread mounted by this instance: 2
Oracle process number: 14
Unix process pid: 483522, image: oracle@prmsvr2 (SMON)
*** 2011-05-03 15:28:47.858
*** SESSION ID:(17.1) 2011-05-03 15:28:47.843
*** 2011-05-03 15:28:47.858
SMON: Parallel transaction recovery tried
*** 2011-07-11 17:13:52.028
SMON: Restarting fast_start parallel rollback
*** 2011-07-11 17:28:39.705
SMON: Parallel transaction recovery tried
*** 2012-05-12 21:54:17.246   --当前问题时间点
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed
--通过trace文件,我们没有获得关于该错误的其他有用信息

查询MOS相关信息[280151.1]
出现该错误原因

This is a known problem and there is an Internal Bug:2181139 for this Issue.
The error is signaled because smon is shrinking a rollback segment and this fails
because we need an extent to store some rollback information. This is a failure message
for the shrinking. Subsequently smon would succeed in doing that.
--当smon在shrink rollback segment因为需要一个extent存放rollback

解决建议

Ignore these error messages.
Normally adding more undo space should solve the problem,
but if space is not correcting the problem, please file an SR for assistance.
This error message logging is fixed in 10g.
--忽略该错误或者升级到10g

关于High Versions Count总结

High Versions Count后果
sql查询–>hash对比确定是否存在shared pool中(不讨论不存在情况)–>选择合适的children
可能导致library cache latch contention

When you have unnecessary versions of a cursor, each time that cursor is executed,
the parse engine has to search through the list of versions to see which is the cursor that you want.
This wastes CPU cycles that you could be using on something else.
High version counts can easily cause high contention for library cache latches.
A process parsing a SQL statement with many versions (children cursors) will need to scan
through all these children while holding on to a library cache latch.
This means that other processes needing the same latch will have to wait and
can lead to significant database-wide performance degradation.

引起High Versions Count原因

•UNBOUND_CURSOR - The existing child cursor was not fully built (in other words, it was not optimized)
•SQL_TYPE_MISMATCH - The SQL type does not match the existing child cursor
•**OPTIMIZER_MISMATCH - The optimizer environment does not match the existing child cursor.
For example:
select count(*) from emp; ->> 1 PARENT, 1 CHILD
alter session set optimizer_mode=ALL_ROWS
select count(*) from emp; ->> 1 PARENT, 2 CHILDREN
    (The optimizer mode has changed and therefore
       the existing child cannot be reused)
(The same applies with events - if I turned on tracing with 10046 than
I would get the OPTIMIZER_MISMATCH again and a 3rd child)
•OUTLINE_MISMATCH - The outlines do not match the existing child cursor
If my user had created stored outlines previously for this command and they were
stored in seperate categories (say "OUTLINES1" and "OUTLINES2") running:-
alter session set use_stored_outlines = OUTLINES1;
select count(*) from emp;
alter session set use_stored_oulines= OUTLINES2;
select count(*) from emp;
--> Would create a 2nd child as the outline used is different than the first run.
•STATS_ROW_MISMATCH - The existing statistics do not match the existing child cursor.
Check that 10046/sql_trace is not set on all sessions as this can cause this.
•LITERAL_MISMATCH - Non-data literal values do not match the existing child cursor
•SEC_DEPTH_MISMATCH - Security level does not match the existing child cursor
•EXPLAIN_PLAN_CURSOR - The child cursor is an explain plan cursor and should not be shared.
Explain plan statements will generate a new child by default - the mismatch will be this.
•BUFFERED_DML_MISMATCH - Buffered DML does not match the existing child cursor
•PDML_ENV_MISMATCH - PDML environment does not match the existing child cursor
•INST_DRTLD_MISMATCH - Insert direct load does not match the existing child cursor
•SLAVE_QC_MISMATCH -The existing child cursor is a slave cursor and the new one was issued by the coordinator
(or, the existing child cursor was issued by the coordinator and the new one is a slave cursor).
•TYPECHECK_MISMATCH - The existing child cursor is not fully optimized
•AUTH_CHECK_MISMATCH - Authorization/translation check failed for the existing child cursor
The user does not have permission to access the object in any previous version of the cursor.
A typical example would be where each user has it's own copy of a table
•**BIND_MISMATCH - The bind metadata does not match the existing child cursor. For example:
variable a varchar2(100);
select count(*) from emp where ename = :a ->> 1 PARENT, 1 CHILD
variable a varchar2(400);
select count(*) from emp where ename = :a ->> 1 PARENT, 2 CHILDREN
   (The bind 'a' has now changed in definition)
•DESCRIBE_MISMATCH - The typecheck heap is not present during the describe for the child cursor
•LANGUAGE_MISMATCH - The language handle does not match the existing child cursor
•TRANSLATION_MISMATCH - The base objects of the existing child cursor do not match.
The definition of the object does not match any current version.
Usually this is indicative of the same issue as "AUTH_CHECK_MISMATCH" where the object is different.
•ROW_LEVEL_SEC_MISMATCH - The row level security policies do not match
•INSUFF_PRIVS - Insufficient privileges on objects referenced by the existing child cursor
•INSUFF_PRIVS_REM - Insufficient privileges on remote objects referenced by the existing child cursor
•REMOTE_TRANS_MISMATCH - The remote base objects of the existing child cursor do not match
USER1: select count(*) from table@remote_db
USER2: select count(*) from table@remote_db
  (Although the SQL is identical, the dblink pointed to
   by remote_db may be a private dblink which resolves
   to a different object altogether)
•LOGMINER_SESSION_MISMATCH
•INCOMP_LTRL_MISMATCH
•OVERLAP_TIME_MISMATCH - error_on_overlap_time mismatch
•SQL_REDIRECT_MISMATCH - sql redirection mismatch
•MV_QUERY_GEN_MISMATCH - materialized view query generation
•USER_BIND_PEEK_MISMATCH - user bind peek mismatch
•TYPCHK_DEP_MISMATCH - cursor has typecheck dependencies
•NO_TRIGGER_MISMATCH - no trigger mismatch
•FLASHBACK_CURSOR - No cursor sharing for flashback
•ANYDATA_TRANSFORMATION - anydata transformation change
•INCOMPLETE_CURSOR - incomplete cursor.
When bind length is upgradeable (i.e. we found a child cursor that matches everything
else except that the bind length is not long enough), we mark the old cursor is not usable
and build a new one.  This means the version can be ignored.
•TOP_LEVEL_RPI_CURSOR - top level/rpi cursor
In a Parallel Query invocation this is expected behaviour (we purposely do not share)
•DIFFERENT_LONG_LENGTH - different long length
•LOGICAL_STANDBY_APPLY - logical standby apply mismatch
•DIFF_CALL_DURN - different call duration
•BIND_UACS_DIFF - bind uacs mismatch
•PLSQL_CMP_SWITCHS_DIFF - plsql compiler switches mismatch
•CURSOR_PARTS_MISMATCH - cursor-parts executed mismatch
•STB_OBJECT_MISMATCH - STB object different (now exists)
•ROW_SHIP_MISMATCH - row shipping capability mismatch
•PQ_SLAVE_MISMATCH - PQ slave mismatch
Check you want to be using PX with this reason code, as the problem could be caused by running
lots of small SQL statements which do not really need PX. If you are on < 11i you may be hitting Bug:4367986
•TOP_LEVEL_DDL_MISMATCH - top-level DDL cursor
•MULTI_PX_MISMATCH - multi-px and slave-compiled cursor
•BIND_PEEKED_PQ_MISMATCH - bind-peeked PQ cursor
•MV_REWRITE_MISMATCH - MV rewrite cursor
•ROLL_INVALID_MISMATCH - rolling invalidation window exceeded
This is caused by the rolling invalidation capability in DBMS_STATS.
The child cannot be shared as it's invalidation window is exceeded. See:
Note:557661.1  Rolling Cursor Invalidations with DBMS_STATS in Oracle10g (Doc ID 557661.1)
•OPTIMIZER_MODE_MISMATCH - optimizer mode mismatch
•PX_MISMATCH - parallel query mismatch
If running 11.1.0.6 and RAC see Bug:7352775.
Check that if (on each instance) parallel_instance_groups is set then instance_groups is set to the same.
•MV_STALEOBJ_MISMATCH - mv stale object mismatch
•FLASHBACK_TABLE_MISMATCH - flashback table mismatch
•LITREP_COMP_MISMATCH - literal replacement compilation mismatch
New in 11g :
•PLSQL_DEBUG - debug mismatch Session has debugging parameter plsql_debug set to true
•LOAD_OPTIMIZER_STATS  - Load optimizer stats for cursor sharing
•ACL_MISMATCH   -  Check ACL mismatch
•FLASHBACK_ARCHIVE_MISMATCH  - Flashback archive mismatch
•LOCK_USER_SCHEMA_FAILED  - Failed to lock user and schema
•REMOTE_MAPPING_MISMATCH  - Remote mapping mismatch
•LOAD_RUNTIME_HEAP_FAILED  - Runtime heap mismatch
•HASH_MATCH_FAILED  - Hash mismatch
Set to "Y" if sharing fails due to a hash mismatch, such as the case with mismatched histogram data
or a range predicate marked as unsafe by literal replacement (See Bug:3461251)
New in 11.2  :
•PURGED_CURSOR - cursor marked for purging
The cursor has been marked for purging with dbms_shared_pool.purge
•BIND_LENGTH_UPGRADEABLE - bind length upgradeable
 Could not be shared because a bind variable size was smaller than the new value beiing inserted
(marked as BIND_MISMATCH in earlier versions).
•USE_FEEDBACK_STATS - cardinality feedback
Cardinality feedback is being used and therefore a new plan could be formed for the current execution.
•BIND_EQUIV_FAILURE - The bind value's selectivity does not match that used to optimize the existing child cursor
There is no longer  ROW_LEVEL_SEC_MISMATCH in 11.2.

High Versions Count查询
1.使用version_rpt function查询

--install version_rpt3_21
connect / as sysdba
start version_rpt3_21.sql
-- Generate reports for all cursors with more than 100 versions using SQL_ID (10g and up)
select b.* from v$sqlarea a ,table(version_rpt(a.sql_id)) b where loaded_versions >=100;
-- Generate reports for all cursors with more than 100 versions using HASH_VALUE
select b.* from v$sqlarea a ,table(version_rpt(null,a.hash_value)) b where loaded_versions>=100;
-- Generate the report for cursor with sql_id cyzznbykb509s
select * from table(version_rpt('cyzznbykb509s'));

2.直接查询

select sql_id,version_count, ADDRESS,sql_text
from   v$sqlarea
where version_count > 10
order by version_count, hash_value;
•Version 9.2.X.X and below :
select * from v$sql_shared_cursor where kglhdpar = '0000000386BC2E58'
•Version 10.0.X.X and above:
select * from v$sql_shared_cursor where address = '0000000386BC2E58'
NOTE:The 'Y's denote a mismatch

High Versions Count跟踪

CURSORTRACE(10G及其以后版本)
TO trace on using:-
alter system set events
'immediate trace name cursortrace level 577, address hash_value';
(level 578/580 can be used for high level tracing (577=level 1, 578=level 2, 580=level 3)
To turn off tracing use:-
alter system set events
'immediate trace name cursortrace level 2147483648, address 1';
Please note: BUG:5555371 exists in 10.2 (fixed in 10.2.0.4) where cursor trace cannot fully be turned off
and single line entries will still be made to the trace file as a result.
The w/a is to restart the instance. How invasive this BUG is depends on the executions of the cursor
(and the size of the resultant trace file additions)
cursordump(11GR2)
alter system set events 'immediate trace name cursordump level 16'

version_rpt3_21脚本:下载
参考:Troubleshooting: High Version Count Issues [ID 296377.1]

ORACLE在AIX中产生SOFTWARE PROGRAM ABNORMALLY TERMINATED警告原因

数据库中发现如下错误
该错误的解决方案:ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name:    AIX
Node name:  sgerp5
Release:    1
Version:    6
Machine:    00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88]
[SIGSEGV] [ADDR:0xB38F0000000049][PC:0x100213C08] [Address not mapped to object] []

errpt错误说明
在产生7445错误的同时观察aix系统错误日志发现SOFTWARE PROGRAM ABNORMALLY TERMINATED错误

sgerp5_[oracle]-->errpt -aj A924A5FC
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC
Date/Time:       Fri May 11 03:52:55 BEIST 2012
Sequence Number: 471
Machine Id:      00C8F0564C00
Node Id:         sgerp5
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
        Recommended Actions
        CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
               7602504
FILE SYSTEM SERIAL NUMBER
          14
INODE NUMBER
           0      367648
CORE FILE NAME
/oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core
PROGRAM NAME
oracle
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
sskgmcrea 0
PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 2
ADDITIONAL INFORMATION
skgdbgcra 224
??
ksdbgcra 3D0
ssexhd 978
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
SYMPTOM CODE
PCSS/SPI2 FLDS/oracle SIG/6 FLDS/skgdbgcra VALU/224

错误原因

This error is logged when a software program abnormally ends and causes a core dump. Users might
not be exiting applications correctly, the system might have been shut down while users were
working in application, or the user's terminal might have locked up and the application stopped
1)这里也就是说如果oracle进程在aix机器上异常终止,并且产生了一个core dump文件,
  就会出现SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息
2)用户登录系统没有正常退出,而系统被关闭
3)用户强制终止一个一个lock,而导致进程停止

本次AIX日志警告原因:由于进程7602504异常终止(ORA-07445错误)并且产生了 /oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core dump 文件,从而有了AIX中的SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息

ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

alert发现如下错误ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xB38F0000000049]
[PC:0x100213C08, dbgrmqmqpk_query_pick_key()+0f88]
Errors in file /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc  (incident=579300):
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049]
[PC:0x100213C08] [Address not mapped to object] []
Incident details in: /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc

trace文件部分信息

Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name:	AIX
Node name:	sgerp5
Release:	1
Version:	6
Machine:	00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
--确定是m000进程出现异常,而该进程是awr收集统计信息进程MMON的子进程
*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049]
[PC:0x100213C08] [Address not mapped to object] []
----- Incident Context Dump -----
Address: 0x1104bdb68
Incident ID: 579300
Problem Key: ORA 7445 [dbgrmqmqpk_query_pick_key()+0f88]
Error: ORA-7445 [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049] [PC:0x100213C08]
[Address not mapped to object] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: dbgeEndDDEInvocation [diag_dde]
[03]: ssexhd []
[04]: 47dc []<-- Signaling
[05]: dbgrmqmfs_fetch_setup [ams_comp]
[06]: dbgrmqmf_fetch_real [ams_comp]
[07]: dbgrmqmf_fetch [ams_comp]
[08]: dbgrip_fetch_record [ami_comp]
[09]: dbgrip_relation_iterator [ami_comp]
[10]: dbgripricm_rltniter_wcbf_mt [ami_comp]
[11]: dbgripdrm_dmldrv_mt [ami_comp]
[12]: dbghmm_delete_info_records []
[13]: dbghmo_purge_hm_schema []
[14]: dbgrupipscb_hm_pgsvc_cbf [diag_adr]
[15]: dbgruppm_purge_main [diag_adr]
[16]: dbkrapg_auto_purge [rdbms_adr]
[17]: kewraps_auto_purge_slave []
[18]: kebm_slave_main []
[19]: ksvrdp [ksv_trace]
[20]: opirip []
[21]: opidrv []
[22]: sou2o []
[23]: opimai_real []
[24]: main []
[25]: __start []
MD [00]: 'SID'='752.5029' (0x3)
MD [01]: 'ProcId'='138.46' (0x3)
MD [02]: 'PQ'='(0, 1336679546)' (0x7)
MD [03]: 'Client ProcId'='oracle@sgerp5.7602504_1' (0x0)

MOS关于该问题记录
问题原因

This is due to Bug 9390347 fixed in 12.1 & 11.2.0.2, where a core dump can occur
in module dbgrmqmqpk_query_pick_key() whilst purging HM contents from ADR.

解决方案

- Either install our 11.2.0.2 patchset
- Or download and apply Patch 9390347 if available for your version/platform.
- On Windows, you can also install Bundle Patch 11.1.0.7.31 or above.
There is no workaround to this error, however the error is not serious and does not cause any harm to your database.

常驻连接池(Database Resident Connection Pool)

一.介绍常驻连接池(Database Resident Connection Pool,Oracle DRCP)
数据库驻留连接池是Oracle Database 11g的一个新特性,专门为了解决在需要支持大量连接的环境对可扩性的迫切需求而设计的。数据库驻留连接池把数据库服务器进程和对话汇合起来(这样的组合称之为池服务器),通过从单主机或不同主机发出的多个应用软件进程的连接进行共享。由一个连接代理(Connection Broker)进程控制着数据库后台进程中的池服务器。连接代理会持续的连接客户并对客户进行验证。当需要进行某种数据库活动时,客户将请求连接代理提供池服务器,使用完毕后再将它们释放以供其他客户重新使用。当池服务器处在使用当中时,相当于一台专用服务器。对于来自常驻通道中的客户端连接请求,连接代理会为其选择一个合适的池服务器,并把客户端请求交给该池服务器处理,不再干涉。此后客户通过和该池服务器的直接对话来完成所有的数据库活动。当客户完成请求任务释放池服务器后,连接代理将重新接管该池服务器。

二.什么时间使用DRCP
1 使用较小内存的、大量的客户端连接
2 客户端应用是相似的,可以共享或重用会话
3 客户端占用数据库连接的周期相当短
4 会话不需要跨客户请求
5 客户端有众多的主机与进程

三.Dedicated Servers,Shared Servers与DRCP的内存需求和区别对比
一般情况下,由于每个会话需要消耗400k的内存,每个进程需要消耗4m的内存,现在我们以DRCP的pool size是100,shared server的shared server进程是100为例,假如有5000个客户端连接到这些环境,则这些主机的内存分配如下:

A Dedicated Server
Memory used = 5000 X (400 KB + 4 MB) = 22 GB
B Shared Server
Memory used = 5000 X 400 KB + 100 X 4 MB = 2.5 GB
Out of the 2.5 GB, 2 GB is allocated from the SGA
.
C Database Resident Connection Pooling
Memory used = 100 X (400 KB + 4 MB) + (5000 X 35KB)= 615 MB


四.使用DRCP时,当实例有活动pooled server,有以下限制:
1 不能shutdown database;
2 不能停掉DRCP;
3 不能用database link连接到不同实例的DRCP;
4 不能使用Advanced Security Option (ASO),比如encryption等

五.客户端如何连接到DRCP
如果是专用服务器连接,则SERVER=DEDICATED,如果是DRCP连接,则SEVER=POOLED。如果要指定客户端请求到DRCP,则客户端的tnsnames.ora中的连接字符串必须指定连接类型是POOLED,配置方式如下所示:

ORA11G_P =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
    (CONNECT_DATA =
     (SERVER=POOLED)  --注意
      (SERVICE_NAME = ora11g)
    )
  )

说明:
1)如果在tnsnames.ora中指定了SERVER=POOLED,但并没有在实例中启动DRCP,则当客户请求连接时,DB会报ORA-12520错误。
2)11g的客户端才可以使用DRCP,如果10g的客户端在tnsnames.ora中指定了SERVER=POOLED,则连接时报ORA-56606。

六.DRCP配置/查询
1.DRCP启动/关闭

--查看当前DRCP状态
SQL> select CONNECTION_POOL,status from dba_cpool_info;
CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    INACTIVE
--启动DRCP
SQL> execute dbms_connection_pool.start_pool;
PL/SQL procedure successfully completed.
SQL> select CONNECTION_POOL,status from dba_cpool_info;
CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    ACTIVE
--关闭DRCP
SQL> exec dbms_connection_pool.stop_pool
PL/SQL procedure successfully completed.
SQL> select CONNECTION_POOL,status from dba_cpool_info;
CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    INACTIVE

2.修改DRCP参数

--dbms_connection_pool.configure_pool
exec dbms_connection_pool.configure_pool(
	POOL_NAME=>'SYS_DEFAULT_CONNECTION_POOL',
	minsize=>10,
	maxsize=>100,
	INCRSIZE=>10,
	SESSION_CACHED_CURSORS=>50,
	inactivity_timeout=>3000,
	max_think_time=>100,
	MAX_USE_SESSION=>10000,
	MAX_LIFETIME_SESSION=>36000
   );
--dbms_connection_pool.alter_param
exec dbms_connection_pool.alter_param(
	POOL_NAME=>'SYS_DEFAULT_CONNECTION_POOL',
	PARAM_NAME=>'MINSIZE',
	PARAM_VALUE=>'2');

3.DRCP视图

DBA_CPOOL_INFO
displays configuration information about all Database Resident Connection Pools in the database.
V$CPOOL_STATS
displays information about the Database Resident Connection Pool statistics for an instance
V$CPOOL_CC_STATS
displays information about the connection class level statistics for
the Database Resident Connection Pool per instance.
V$CPOOL_CONN_INFO
displays connection information about each connection to the connection broker.
V$CPOOL_CC_INFO
displays information about the pool-to-connection class mapping for
the Database Resident Connection Pool per instance.

七.DRCP相关进程

oracle   11715     1  0 21:38 ?        00:00:00 ora_n000_ora11g
oracle   11719     1  0 21:38 ?        00:00:00 ora_l000_ora11g
oracle   11723     1  0 21:38 ?        00:00:00 ora_l001_ora11g
oracle   11727     1  0 21:38 ?        00:00:00 ora_l002_ora11g
oracle   11731     1  0 21:38 ?        00:00:02 ora_l003_ora11g
oracle   12490     1  0 21:57 ?        00:00:00 ora_l004_ora11g
oracle   12494     1  0 21:57 ?        00:00:00 ora_l005_ora11g
oracle   12498     1  0 21:57 ?        00:00:00 ora_l006_ora11g
oracle   12502     1  0 21:57 ?        00:00:00 ora_l007_ora11g
oracle   12506     1  0 21:57 ?        00:00:00 ora_l008_ora11g
oracle   12510     1  0 21:57 ?        00:00:00 ora_l009_ora11g
oracle   12514     1  0 21:57 ?        00:00:01 ora_l010_ora11g
oracle   12518     1  0 21:57 ?        00:00:00 ora_l011_ora11g
oracle   12522     1  0 21:57 ?        00:00:00 ora_l012_ora11g
oracle   12526     1  0 21:57 ?        00:00:00 ora_l013_ora11g
oracle   12530     1  0 21:57 ?        00:00:00 ora_l014_ora11g
oracle   12534     1  0 21:57 ?        00:00:00 ora_l015_ora11g
oracle   12538     1  0 21:57 ?        00:00:00 ora_l016_ora11g
oracle   12542     1  0 21:57 ?        00:00:00 ora_l017_ora11g
oracle   12546     1  0 21:57 ?        00:00:00 ora_l018_ora11g
oracle   12550     1  0 21:57 ?        00:00:00 ora_l019_ora11g
ora_n000_ora11g
Connection Broker Process
ora_l000_ora11g
Pooled Server Process(Handles client requests in Database Resident Connection Pooling)

提高短连接性能方法测试

创建测试脚本
通过在三个会话中同时执行test_login.sh脚本,模拟当数据库多个短连接情况性能

[oracle@xifenfei tmp]$ more test_login.sh
#!/bin/bash
echo "start login database `date`*********" >>/tmp/test_1.log
e=2000
for((i=1;i<=$e;i=i+1))
do
/tmp/login_oracle.sh
done
echo "end login database `date`*********" >>/tmp/test_1.log
[oracle@xifenfei tmp]$ more login_oracle.sh
#!/bin/bash
sqlplus chf/xifenfei@ORA11G_P<<XFF>/dev/null
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;
exit
XFF
--ORA11G_P根据不同的测试情景指定不同名称

情况1:一个监听情况下

start login database Tue May  1 18:03:32 CST 2012*********
start login database Tue May  1 18:03:35 CST 2012*********
start login database Tue May  1 18:03:37 CST 2012*********
end login database Tue May  1 18:08:20 CST 2012*********
end login database Tue May  1 18:08:25 CST 2012*********
end login database Tue May  1 18:08:26 CST 2012*********
--计算2000个会话登录/查询/推出时间
4:48
4:40
4:49

情况2:三个监听,客户端配置tns负载均衡

--监听配置
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER = /u01/oracle
LISTENER1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1522))
    )
  )
SID_LIST_LISTENER1 =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER1 = /u01/oracle
LISTENER2 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1523))
    )
  )
SID_LIST_LISTENER2 =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER2 = /u01/oracle
--tns配置
ORA11G_M =
  (DESCRIPTION =
      (LOAD_BALANCE=ON)
      (FAILOVER=ON)
      (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1522))
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1523))
       (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
     (SERVER=DEDICATED)
      (SERVICE_NAME = ora11g)
    )
  )
--测试结果
start login database Tue May  1 17:51:45 CST 2012*********
start login database Tue May  1 17:51:49 CST 2012*********
start login database Tue May  1 17:51:51 CST 2012*********
end login database Tue May  1 17:55:58 CST 2012*********
end login database Tue May  1 17:56:06 CST 2012*********
end login database Tue May  1 17:56:09 CST 2012*********
--计算2000个会话登录/查询/推出时间
4:13
4:17
4:18

情况2:使用常驻连接池DRCP(11g新特性)

--启动默认DRCP
SQL> exec dbms_connection_pool.start_pool();
PL/SQL procedure successfully completed.
--tns配置
ORA11G_P =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
    (CONNECT_DATA =
     (SERVER=POOLED)  --注意
      (SERVICE_NAME = ora11g)
    )
  )
--执行结果
start login database Tue May  1 18:19:58 CST 2012*********
start login database Tue May  1 18:20:01 CST 2012*********
start login database Tue May  1 18:20:03 CST 2012*********
end login database Tue May  1 18:23:16 CST 2012*********
end login database Tue May  1 18:23:19 CST 2012*********
end login database Tue May  1 18:23:21 CST 2012*********
--计算2000个会话登录/查询/推出时间
3:16
3:18
3:19

总结
如果在数据库短连接过程中发现监听是瓶颈的时候,可以考虑使用多个监听+tns 负载均衡,从一定程度上缓解监听瓶颈.如果是11g数据库可以考虑使用其心功能DRCP,从而很大程度上提高短连接过程中数据库的效率.因为DRCP还属于11g的新功能稳定性不知道如何?使用该功能前,请一定要做好相关测试工作.如有可能还是建议从应用层面尽可能的使用长连接,提高数据库会话效率.

shared pool latch 等待事件

shared pool latch相关描述

The shared pool latch is used to protect critical operations when allocating
and freeing memory in the shared pool.
If an application makes use of literal (unshared) SQL then this can severely
limit scalability and throughput. The cost of parsing a new SQL statement is
expensive both in terms of CPU requirements and the number of times the library
cache and shared pool latches may need to be acquired and released. Before Oracle9,
there was just one such latch for the entire database to protect the allocation of
memory in the library cache. In Oracle9, multiple children were introduced to relieve
contention on this resource.

减少shared pool latch方法

Avoid hard parses when possible, parse once, execute many.
Eliminate  literal SQL so that same sql is shared by many sessions.
Size the shared_pool adequately to avoid reloads
Use of MTS (shared server option) also greatly influences the shared pool latch.

查询未绑定sql

--9i
SELECT substr(sql_text,1,40) "SQL",
         count(*) ,
         sum(executions) "TotExecs"
    FROM v$sqlarea
   WHERE executions < 5
   GROUP BY substr(sql_text,1,40)
  HAVING count(*) > 30
   ORDER BY 2
  ;
--10g及其以后版本
SET pages 10000
SET linesize 250
column FORCE_MATCHING_SIGNATURE format 99999999999999999999999
WITH c AS
     (SELECT  FORCE_MATCHING_SIGNATURE,
              COUNT(*) cnt
     FROM     v$sqlarea
     WHERE    FORCE_MATCHING_SIGNATURE!=0
     GROUP BY FORCE_MATCHING_SIGNATURE
     HAVING   COUNT(*) > 20
     )
     ,
     sq AS
     (SELECT  sql_text                ,
              FORCE_MATCHING_SIGNATURE,
              row_number() over (partition BY FORCE_MATCHING_SIGNATURE ORDER BY sql_id DESC) p
     FROM     v$sqlarea s
     WHERE    FORCE_MATCHING_SIGNATURE IN
              (SELECT FORCE_MATCHING_SIGNATURE
              FROM    c
              )
     )
SELECT   sq.sql_text                ,
         sq.FORCE_MATCHING_SIGNATURE,
         c.cnt "unshared count"
FROM     c,
         sq
WHERE    sq.FORCE_MATCHING_SIGNATURE=c.FORCE_MATCHING_SIGNATURE
AND      sq.p                       =1
ORDER BY c.cnt DESC

查询数据库整体解析情况

select to_char(100 * sess / calls, '999999999990.00') || '%' cursor_cache_hits,
to_char(100 * (calls - sess - hard) / calls, '999990.00') || '%' soft_parses,
to_char(100 * hard / calls, '999990.00') || '%' hard_parses
from ( select value calls from v$sysstat where name = 'parse count (total)' ),
( select value hard from v$sysstat where name = 'parse count (hard)' ),
( select value sess from v$sysstat where name = 'session cursor cache hits' );

参考:Note 62143.1 Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch