联系:手机/微信(+86 17813235971) QQ(107644445)
标题:ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed]
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
有一朋友数据库经常crash,让我帮忙分析和解决该问题
数据库版本
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production PL/SQL Release 12.1.0.2.0 - Production CORE 12.1.0.2.0 Production TNS for Linux: Version 12.1.0.2.0 - Production NLSRTL Version 12.1.0.2.0 - Production
alert日志报错信息
Mon Apr 23 02:14:18 2018 Process 0x0x10f33262f8 appears to be hung while dumping Current time = 464149508, process death time = 464089392 interval = 60000 Called from location UNKNOWN:UNKNOWN Attempting to kill process 0x0x10f33262f8 with OS pid = 30813 OSD kill succeeded for process 0x10f33262f8 Instance Critical Process (pid: 9, ospid: 30813, DBRM) died unexpectedly Mon Apr 23 02:14:21 2018 System state dump requested by (instance=1, osid=30789 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_30809_20180423021421.trc Mon Apr 23 02:14:22 2018 PMON (ospid: 30789): terminating the instance due to error 56710 Mon Apr 23 02:14:22 2018 opiodr aborting process unknown ospid (27086) as a result of ORA-1092 Mon Apr 23 02:14:28 2018 Instance terminated by PMON, pid = 30789
而在类似报错之前,一般有swap不足的报错
Mon Apr 23 02:03:54 2018 WARNING: Heavy swapping observed on system in last 5 mins. pct of memory swapped in [2.01%] pct of memory swapped out [0.51%]. Please make sure there is no memory pressure and the SGA and PGA are configured correctly. Look at DBRM trace file for more details. Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbrm_30813.trc (incident=854536): ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_854536/orcl_dbrm_30813_i854536.trc Mon Apr 23 02:04:02 2018 Dumping diagnostic data in directory=[cdmp_20180423020402], requested by (instance=1, osid=30813 (DBRM))
从这里报错看,由于系统内存不足,导致大量使用swap,从而引起oracle进程被kill
分析系统内存使用情况
[www.xifenfei.com@Oracle ~]$ more /proc/meminfo MemTotal: 66109924 kB MemFree: 359848 kB Buffers: 9308 kB Cached: 1848504 kB SwapCached: 172800 kB Active: 1060368 kB Inactive: 1156100 kB Active(anon): 999208 kB Inactive(anon): 1104860 kB Active(file): 61160 kB Inactive(file): 51240 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 33554428 kB SwapFree: 30516280 kB Dirty: 68 kB Writeback: 0 kB AnonPages: 190936 kB Mapped: 1152196 kB Shmem: 1745380 kB Slab: 70900 kB SReclaimable: 25640 kB SUnreclaim: 45260 kB KernelStack: 5728 kB PageTables: 92488 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 35152108 kB Committed_AS: 70923356 kB VmallocTotal: 34359738367 kB VmallocUsed: 267468 kB VmallocChunk: 34359442996 kB HardwareCorrupted: 0 kB AnonHugePages: 18432 kB HugePages_Total: 30720 HugePages_Free: 30720 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 8192 kB DirectMap2M: 2088960 kB DirectMap1G: 65011712 kB [www.xifenfei.com@Oracle ~]$ free -m total used free shared buffers cached Mem: 64560 64200 359 1682 9 1801 -/+ buffers/cache: 62389 2170 Swap: 32767 2977 29790
比较明显系统总共内存64G,配置了60G大页,但是数据库没有使用该大页
数据库使用内存情况
SQL> show sga; Total System Global Area 7.1672E+10 bytes Fixed Size 3719544 bytes Variable Size 2684358280 bytes Database Buffers 6.8719E+10 bytes Redo Buffers 264712192 bytes
比较明显按照上述配置,一共就只有4G的空闲内存,但是oracle sga占用7G,出现大量换页是必然.错误也明显想让数据库使用大页,但是由于配置不当导致数据库无法使用大页而使用系统除大页之外的内存,从而引起系统异常.
这里也说明12c的提示有明显的改善,通过alert的错误提示基本上就可以确定是swap不足导致.