ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一朋友数据库经常crash,让我帮忙分析和解决该问题
数据库版本

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

alert日志报错信息

Mon Apr 23 02:14:18 2018
Process 0x0x10f33262f8 appears to be hung while dumping
Current time = 464149508, process death time = 464089392 interval = 60000
Called from location UNKNOWN:UNKNOWN
Attempting to kill process 0x0x10f33262f8 with OS pid = 30813
OSD kill succeeded for process 0x10f33262f8
Instance Critical Process (pid: 9, ospid: 30813, DBRM) died unexpectedly
Mon Apr 23 02:14:21 2018
System state dump requested by (instance=1, osid=30789 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_30809_20180423021421.trc
Mon Apr 23 02:14:22 2018
PMON (ospid: 30789): terminating the instance due to error 56710
Mon Apr 23 02:14:22 2018
opiodr aborting process unknown ospid (27086) as a result of ORA-1092
Mon Apr 23 02:14:28 2018
Instance terminated by PMON, pid = 30789

而在类似报错之前,一般有swap不足的报错

Mon Apr 23 02:03:54 2018
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [2.01%] pct of memory swapped out [0.51%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbrm_30813.trc  (incident=854536):
ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_854536/orcl_dbrm_30813_i854536.trc
Mon Apr 23 02:04:02 2018
Dumping diagnostic data in directory=[cdmp_20180423020402], requested by (instance=1, osid=30813 (DBRM))

从这里报错看,由于系统内存不足,导致大量使用swap,从而引起oracle进程被kill

分析系统内存使用情况

[www.xifenfei.com@Oracle ~]$ more /proc/meminfo
MemTotal:       66109924 kB
MemFree:          359848 kB
Buffers:            9308 kB
Cached:          1848504 kB
SwapCached:       172800 kB
Active:          1060368 kB
Inactive:        1156100 kB
Active(anon):     999208 kB
Inactive(anon):  1104860 kB
Active(file):      61160 kB
Inactive(file):    51240 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      33554428 kB
SwapFree:       30516280 kB
Dirty:                68 kB
Writeback:             0 kB
AnonPages:        190936 kB
Mapped:          1152196 kB
Shmem:           1745380 kB
Slab:              70900 kB
SReclaimable:      25640 kB
SUnreclaim:        45260 kB
KernelStack:        5728 kB
PageTables:        92488 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    35152108 kB
Committed_AS:   70923356 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      267468 kB
VmallocChunk:   34359442996 kB
HardwareCorrupted:     0 kB
AnonHugePages:     18432 kB
HugePages_Total:   30720
HugePages_Free:    30720
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8192 kB
DirectMap2M:     2088960 kB
DirectMap1G:    65011712 kB
[www.xifenfei.com@Oracle ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         64560      64200        359       1682          9       1801
-/+ buffers/cache:      62389       2170
Swap:        32767       2977      29790

比较明显系统总共内存64G,配置了60G大页,但是数据库没有使用该大页

数据库使用内存情况

SQL> show sga;
Total System Global Area 7.1672E+10 bytes
Fixed Size                  3719544 bytes
Variable Size            2684358280 bytes
Database Buffers         6.8719E+10 bytes
Redo Buffers              264712192 bytes

比较明显按照上述配置,一共就只有4G的空闲内存,但是oracle sga占用7G,出现大量换页是必然.错误也明显想让数据库使用大页,但是由于配置不当导致数据库无法使用大页而使用系统除大页之外的内存,从而引起系统异常.
这里也说明12c的提示有明显的改善,通过alert的错误提示基本上就可以确定是swap不足导致.