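Contact: Phone/WeChat (+86 17813235971) QQ (107644445)
Title: Case study: ORA-00600 [kcbshlc_1] brings the database down
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; I reserve the right to pursue legal action.]
On one server, an ORA-00600 [kcbshlc_1] error caused PMON to fail, and PMON then terminated the instance, bringing the database down. The alert log shows: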
Sun Jul 8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
PMON: terminating instance due to error 472
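When a long alert log has to be scanned for this kind of failure, the ORA-00600 arguments can be extracted with a few lines of script. The following is only a minimal sketch in Python: the function name ora600_arguments and the ALERT_EXCERPT string are illustrative assumptions, not part of any Oracle tooling, and the excerpt is simply the alert log text quoted above.

```python
import re

# Alert log excerpt copied from the incident above (illustrative input).
ALERT_EXCERPT = """\
Sun Jul 8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
PMON: terminating instance due to error 472
"""

# Matches one ORA-00600 line and captures the bracketed argument list.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def ora600_arguments(alert_text):
    """Return one list of non-empty leading arguments per ORA-00600 line."""
    results = []
    for match in ORA600_RE.finditer(alert_text):
        args = re.findall(r"\[([^\]]*)\]", match.group(1))
        # Drop the trailing empty placeholders such as "[], [], []".
        while args and args[-1] == "":
            args.pop()
        results.append(args)
    return results


for args in ora600_arguments(ALERT_EXCERPT):
    print(args)
```

The second argument, 33, is 0x21 in hexadecimal, which appears to correspond to the 000000021 value visible in the call stack of the trace file.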
Analyzing the trace file
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name: Linux
Node name: localhost.localdomain
Release: 2.6.9-89.ELsmp
Version: #1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine: x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)

*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
 lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248 ...
in-flux r/w latch 1fc1fcc9b0 Child cache buffers chains level=1 child#=4753
        Location from where latch is held: kcbgtcr: kslbegin excl:
        Context saved from call: 113266196
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     139 (2, 1341687611, 2)
     192 (2, 1341687611, 2)
     191 (2, 1341687611, 2)
     173 (2, 1341687611, 2)
     185 (2, 1341687611, 2)
     176 (2, 1341687611, 2)
     174 (2, 1341687611, 2)
     118 (2, 1341687611, 2)
     190 (2, 1341687611, 2)
     179 (2, 1341687611, 2)
     184 (1, 1341687611, 1)
     189 (1, 1341687611, 1)
     177 (1, 1341687611, 1)
     195 (1, 1341687611, 1)
     187 (1, 1341687611, 1)
     194 (1, 1341687611, 1)
     147 (1, 1341687611, 1)
     183 (1, 1341687611, 1)
     143 (1, 1341687611, 1)
     144 (1, 1341687611, 1)
     186 (1, 1341687611, 1)
     188 (1, 1341687611, 1)
     196 (1, 1341687611, 1)
     145 (1, 1341687611, 1)
     193 (1, 1341687611, 1)
     waiter count=25
*** 2012-07-08 03:50:06.228
wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
 lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0 ...
in-flux r/w latch 1f81265f28 Child cache buffers chains level=1 child#=14180
        Location from where latch is held: kcbgtcr: kslbegin excl:
        Context saved from call: 71341989
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     121 (2, 1341736205, 2)
     116 (2, 1341736205, 2)
     125 (2, 1341736205, 2)
     140 (2, 1341736205, 2)
     145 (2, 1341736205, 2)
     waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0 ...
in-flux r/w latch 1f813aec18 Child cache buffers chains level=1 child#=20914
        Location from where latch is held: kcbrls: kslbegin:
        Context saved from call: 96505705
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
 lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000000 ?
kcbshlc()+239        call     kgeasnmierr()        0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
kslilcr()+770        call     kcbshlc()            0066876E0 ? 1F801DDB28 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
ksl_cleanup()+1567   call     kslilcr()            7FBFFFCE50 ? 000000000 ?
                                                   7FBFFFDCE0 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxfl()+492         call     ksl_cleanup()        000000000 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxda()+55          call     ksuxfl()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksucln()+1390        call     ksuxda()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksbrdp()+794         call     ksucln()             060008100 ? 000000000 ?
                                                   FFFFFFFF9720ED9F ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
opirip()+616         call     ksbrdp()             060008100 ? 000000000 ?
                                                   000000001 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opimai_real()+317    call     sou2o()              7FBFFFF670 ? 000000032 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF700 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
__libc_start_main()  call     main()               000000003 ? 7FBFFFF700 ?
+219                                               000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
_start()+42          call     __libc_start_main()  000713984 ? 000000001 ?
                                                   7FBFFFF848 ? 005288D00 ?
                                                   000000000 ? 000000003 ?
The trace shows that the database is running on 64-bit Linux and that the Oracle version is 10.2.0.4.
Cause of the error:
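While PMON was cleaning up process state 1fc165d248, the latch was held by orapid = 113, so the cleanup failed.
While PMON was cleaning up process state 20406507f0, the latch was held by orapid = 102, so the cleanup failed.
While PMON was cleaning up process state 1fc165f9d0, the latch was held by orapid = 123, so the cleanup failed.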
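The pairing of each process state with its latch holder can be read out of the trace mechanically rather than by eye. The following is a minimal sketch in Python, assuming the trace text is available as a string; latch_holders and TRACE_EXCERPT are hypothetical names introduced here for illustration, and the excerpt is abridged to the lines the pattern needs.

```python
import re

# Abridged lines from the PMON trace above (illustrative input only).
TRACE_EXCERPT = """\
freeing in-flux r/w latch for process state: 1fc165d248 ...
in-flux r/w latch 1fc1fcc9b0 Child cache buffers chains level=1 child#=4753
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
freeing in-flux r/w latch for process state: 20406507f0 ...
in-flux r/w latch 1f81265f28 Child cache buffers chains level=1 child#=14180
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
freeing in-flux r/w latch for process state: 1fc165f9d0 ...
in-flux r/w latch 1f813aec18 Child cache buffers chains level=1 child#=20914
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
"""

# A process-state address followed (non-greedily) by the next "holder orapid".
# DOTALL lets the match span line breaks, so it also works on flattened text.
PAIR_RE = re.compile(
    r"freeing in-flux r/w latch for process state:\s*([0-9a-fA-F]+)"
    r".*?holder orapid\s*=\s*(\d+)",
    re.DOTALL,
)


def latch_holders(trace_text):
    """Return (process_state, holder_orapid) pairs in trace order."""
    return [(m.group(1), int(m.group(2))) for m in PAIR_RE.finditer(trace_text)]


for state, orapid in latch_holders(TRACE_EXCERPT):
    print(f"process state {state}: latch held by orapid {orapid}")
```

Running this against the full trace file instead of the excerpt should yield the same three pairs, since the pattern only depends on the "process state" and "holder orapid" markers.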
Checking MOS note [443909.1]
shows that this is unpublished Bug 4723109. The fix is to apply Patch 4723109.