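Contact: Mobile/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved. [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]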
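A customer's database was hosted on a certain cloud ("x cloud"), and the database disk needed to be expanded. A snapshot of that disk was taken before the expansion, and then, unexpectedly, disaster struck.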
[root@xifenfei ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   64G   31G  68% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  720K   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb        2.0T  1.2T  910G  56% /www/xifenfei
tmpfs           3.2G     0  3.2G   0% /run/user/1004
tmpfs           3.2G     0  3.2G   0% /run/user/0
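As shown above, the customer's data files all reside on /dev/vdb. Unfortunately, the redo files were under /data (that is, on the vda disk) and were not included in the snapshot. After the customer restored the vdb snapshot, the database showed the following state: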
SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,last_change#,
  4  count(*) ROW_NUM
  5  FROM v$datafile
  6  GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_CHANGE# CHECKPOINT_T     LAST_CHANGE#          ROW_NUM
-------------- ------------------ ------------ ---------------- ----------------
ONLINE                69632585947 04-JUL-22                                   38
SYSTEM                69632585947 04-JUL-22                                    2

SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_TIME                          FUZZY  CHECKPOINT_CHANGE#          ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
ONLINE         2022-07-04 09:03:24                      YES           69631105424               40
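The analysis above shows that the data files and the redo files are separated by a span of changes (the data file headers sit at an older checkpoint SCN than the one recorded in the control file), and the database runs in NOARCHIVELOG mode. In this situation the only option is to force the database open. During the open, ORA-600 [ktpridestroy2] was raised: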
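To make the gap concrete, the two checkpoint SCNs from the query output above can be subtracted. This is only an illustrative sketch: the function name `scn_gap` is hypothetical, and the only inputs are the numbers printed by `v$datafile` (read from the control file) and `v$datafile_header` (read from the file headers).

```python
# Values copied from the two query outputs above.
CONTROLFILE_CKPT_SCN = 69632585947  # v$datafile.checkpoint_change#
HEADER_CKPT_SCN = 69631105424       # v$datafile_header.checkpoint_change#


def scn_gap(controlfile_scn: int, header_scn: int) -> int:
    """Return how many SCNs the data file headers lag behind the control file."""
    return controlfile_scn - header_scn


gap = scn_gap(CONTROLFILE_CKPT_SCN, HEADER_CKPT_SCN)
# A positive gap means the headers are older than the control file,
# so redo covering that range would be needed for a consistent open.
needs_recovery = gap > 0
print(f"data file headers lag the control file by {gap} SCNs; "
      f"recovery needed: {needs_recovery}")
```

A positive gap combined with FUZZY = YES on all 40 files is consistent with the conclusion that the restored data files need redo which, in this case, is no longer available.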
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
SMON: Restarting fast_start parallel rollback
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc  (incident=41257):
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_41257/orcl_smon_7332_i41257.trc
Starting background process QMNC
Mon Jul 04 16:31:44 2022
QMNC started with pid=36, OS id=7454
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7332): terminating the instance due to error 474
Instance terminated by SMON, pid = 7332
The corresponding trace file:
Dump continued from file: /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 41257 (ORA 600 [ktpridestroy2]) ========

*** 2022-07-04 16:31:44.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling                        call  entry                     argument values in hex
location                       type  point                     (? means dubious value)
------------------------------ ----- ------------------------- ----------------------------
skdstdst()+36                  call  kgdsdst()                 000000000 ? 000000000 ? 7FFCD123B998 ? 000000001 ? 7FFCD123FE98 ? 000000000 ?
ksedst1()+98                   call  skdstdst()                000000000 ? 000000000 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
ksedst()+34                    call  ksedst1()                 000000000 ? 000000001 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
dbkedDefDump()+2736            call  ksedst()                  000000000 ? 000000001 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
ksedmp()+36                    call  dbkedDefDump()            000000003 ? 000000002 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
ksfdmp()+64                    call  ksedmp()                  000000003 ? 000000002 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
dbgexPhaseII()+1764            call  ksfdmp()                  000000003 ? 000000002 ? 7FFCD123B998 ? 000000001 ? 000000000 ? 000000000 ?
dbgexProcessError()+2279       call  dbgexPhaseII()            7F3C5D15C6F0 ? 7F3C5A851598 ? 7FFCD1247C88 ? 000000001 ? 000000000 ? 000000000 ?
dbgeExecuteForError()+83       call  dbgexProcessError()       7F3C5D15C6F0 ? 7F3C5A851598 ? 000000001 ? 000000000 ? 7FFC00000000 ? 000000000 ?
dbgePostErrorKGE()+1615        call  dbgeExecuteForError()     7F3C5D15C6F0 ? 7F3C5A851598 ? 000000001 ? 000000001 ? 000000000 ? 000000000 ?
dbkePostKGE_kgsf()+63          call  dbgePostErrorKGE()        000000000 ? 7F3C5A6C1228 ? 000000258 ? 7F3C5A851598 ? 000000000 ? 000000000 ?
kgeadse()+383                  call  dbkePostKGE_kgsf()        00A984C60 ? 7F3C5A6C1228 ? 000000258 ? 7F3C5A851598 ? 000000000 ? 000000000 ?
kgerinv_internal()+45          call  kgeadse()                 00A984C60 ? 7F3C5A6C1228 ? 000000258 ? 000000000 ? 000000000 ? 000000000 ?
kgerinv()+33                   call  kgerinv_internal()        00A984C60 ? 7F3C5A6C1228 ? D124022000000000 ? 000000258 ? 000000000 ? 000000000 ?
kgeasnmierr()+143              call  kgerinv()                 00A984C60 ? 7F3C5A6C1228 ? D124022000000000 ? 000000000 ? 000000000 ? 000000000 ?
ktpridestroy()+912             call  kgeasnmierr()             00A984C60 ? 7F3C5A6C1228 ? D124022000000000 ? 000000000 ? 1E0F02D40 ? 1EC6DA410 ?
ktprw1s()+527                  call  ktpridestroy()            D124022000000000 ? 000000000 ? 1E7A1C2B0 ? 000000000 ? 1E0F02D40 ? 1EC6DA410 ?
ktprsched()+197                call  ktprw1s()                 D124022000000000 ? 000000000 ? 1E7A1C2B0 ? 000000000 ? 1E0F02D40 ? 1EC6DA410 ?
kturRecoverUndoSegment()+1057  call  ktprsched()               D124022000000000 ? 000000000 ? 1E7A1C2B0 ? 000000000 ? 1E0F02D40 ? 1EC6DA410 ?
kturRecoverActiveTxns()+710    call  kturRecoverUndoSegment()  000000000 ? 000000000 ? 000000001 ? 000000000 ? 0D124FFFF ? 6200000005 ?
ktprbeg()+2506                 call  kturRecoverActiveTxns()   000000004 ? 000000000 ? 000000027 ? 000000000 ? 0D124FFFF ? 6200000005 ?
ktmmon()+13588                 call  ktprbeg()                 000000000 ? 000000000 ? 000000027 ? 000000000 ? 0D124FFFF ? 6200000005 ?
ktmSmonMain()+201              call  ktmmon()                  06002DEC0 ? 000000000 ? 000000027 ? 000000000 ? 0D124FFFF ? 6200000005 ?
ksbrdp()+923                   call  ktmSmonMain()             06002DEC0 ? 000000000 ? 000000000 ? 000000000 ? 0D124FFFF ? 6200000005 ?
opirip()+618                   call  ksbrdp()                  06002DEC0 ? 000000000 ? 000000000 ? 000000000 ? 0D124FFFF ? 6200000005 ?
opidrv()+598                   call  opirip()                  000000032 ? 000000004 ? 7FFCD124B658 ? 000000000 ? 0D124FFFF ? 6200000005 ?
sou2o()+98                     call  opidrv()                  000000032 ? 000000004 ? 7FFCD124B658 ? 000000000 ? 0D124FFFF ? 6200000005 ?
opimai_real()+261              call  sou2o()                   7FFCD124B630 ? 000000032 ? 000000004 ? 7FFCD124B658 ? 0D124FFFF ? 6200000005 ?
ssthrdmain()+209               call  opimai_real()             000000000 ? 7FFCD124B820 ? 000000004 ? 7FFCD124B658 ? 0D124FFFF ? 6200000005 ?
main()+196                     call  ssthrdmain()              000000003 ? 7FFCD124B820 ? 000000001 ? 000000000 ? 0D124FFFF ? 6200000005 ?
__libc_start_main()+245        call  main()                    000000003 ? 7FFCD124B9C0 ? 000000001 ? 000000000 ? 0D124FFFF ? 6200000005 ?
_start()+36                    call  __libc_start_main()       0009C12F0 ? 000000001 ? 7FFCD124B9B8 ? 000000000 ? 0D124FFFF ? 6200000005 ?

--------------------- Binary Stack Dump ---------------------
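Analysis confirmed that this error is related to parallel recovery. After working around it, another attempt to start the database failed with ORA-600 [4137]: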
Mon Jul 04 16:33:41 2022
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc  (incident=42457):
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_42457/orcl_smon_7554_i42457.trc
Stopping background process MMNL
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (6, 11).
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
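This error is fairly common and is usually caused by an abnormal transaction in undo. After the abnormal transaction was dealt with, the database opened successfully, the data was imported into a new database, and the recovery was complete.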
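As a small aid for locating the abnormal transaction, the second argument of the ORA-600 [4137] message can be split into its parts. The sketch below assumes that `[6.11.21484016]` is a transaction ID in undo-segment.slot.sequence form; that reading fits the alert log line "recovering transaction (6, 11)", but it is an interpretation rather than something the log states outright. The names `Xid` and `parse_4137_xid` are hypothetical helpers, not part of any Oracle tool.

```python
import re
from typing import NamedTuple, Optional


class Xid(NamedTuple):
    usn: int   # undo segment number (assumed meaning of the first field)
    slot: int  # transaction table slot (assumed meaning of the second field)
    sqn: int   # sequence number (assumed meaning of the third field)


# Matches the alert log line format shown above for ORA-600 [4137].
_ORA_4137 = re.compile(
    r"ORA-00600: internal error code, arguments: "
    r"\[4137\], \[(\d+)\.(\d+)\.(\d+)\]"
)


def parse_4137_xid(line: str) -> Optional[Xid]:
    """Extract the dotted triple from an ORA-600 [4137] line, if present."""
    match = _ORA_4137.search(line)
    if match is None:
        return None
    usn, slot, sqn = (int(part) for part in match.groups())
    return Xid(usn, slot, sqn)


log_line = (
    "ORA-00600: internal error code, arguments: [4137], "
    "[6.11.21484016], [0], [0], [], [], [], [], [], [], [], []"
)
xid = parse_4137_xid(log_line)
print(xid)
```

Under that assumption, the first field points at undo segment 6, which is where one would start looking for the transaction that SMON could not roll back.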