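A customer's database was hosted on a cloud platform (referred to here as "x cloud"), and the database disk needed to be expanded. A snapshot of that disk was taken before the expansion, but things then went unexpectedly wrong.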
[root@xifenfei ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 99G 64G 31G 68% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 720K 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vdb 2.0T 1.2T 910G 56% /www/xifenfei
tmpfs 3.2G 0 3.2G 0% /run/user/1004
tmpfs 3.2G 0 3.2G 0% /run/user/0
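As shown above, all of the customer's data files live on /dev/vdb. Unfortunately, the redo log files are under /data, which has no mount point of its own and therefore sits on the vda disk (/dev/vda1), and that disk was not included in the snapshot. After the customer restored the vdb snapshot, the database looked like this: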
SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
2 checkpoint_change#,
3 checkpoint_time,last_change#,
4 count(*) ROW_NUM
5 FROM v$datafile
6 GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
7 ORDER BY status, checkpoint_change#, checkpoint_time;
STATUS CHECKPOINT_CHANGE# CHECKPOINT_T LAST_CHANGE# ROW_NUM
-------------- ------------------ ------------ ---------------- ----------------
ONLINE 69632585947 04-JUL-22 38
SYSTEM 69632585947 04-JUL-22 2
SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
2 to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
3 count(*) ROW_NUM
4 FROM v$datafile_header
5 GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
6 ORDER BY status, checkpoint_change#, checkpoint_time;
STATUS CHECKPOINT_TIME FUZZY CHECKPOINT_CHANGE# ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
ONLINE 2022-07-04 09:03:24 YES 69631105424 40
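The distance between the two query results can be put into a number. The following is a minimal sketch in Python that uses only the two checkpoint_change# values printed above; the variable names are illustrative and do not come from the original session.

```python
# Checkpoint SCN reported by v$datafile (read from the control file).
controlfile_ckpt_scn = 69632585947

# Checkpoint SCN reported by v$datafile_header (read from the restored datafile headers).
header_ckpt_scn = 69631105424

# How far the restored datafiles are behind the control file, in SCNs.
scn_gap = controlfile_ckpt_scn - header_ckpt_scn
print(f"datafile headers are {scn_gap} SCNs behind the control file")
```

With these values the difference is 1480523 SCNs, and every one of the 40 datafile headers is also marked FUZZY = YES.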
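The queries above show that the restored data files lag behind the redo logs by some period of changes: v$datafile (read from the control file) reports checkpoint SCN 69632585947, while the datafile headers are at 69631105424 and are marked fuzzy. The database also runs in NOARCHIVELOG mode, so the redo needed to close that gap is not available. Given this, the database could only be forced open, and the open attempt hit ORA-600 [ktpridestroy2]: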
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
SMON: Restarting fast_start parallel rollback
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc (incident=41257):
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_41257/orcl_smon_7332_i41257.trc
Starting background process QMNC
Mon Jul 04 16:31:44 2022
QMNC started with pid=36, OS id=7454
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7332): terminating the instance due to error 474
Instance terminated by SMON, pid = 7332
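When an alert log contains several internal errors, pulling out just the ORA-00600 arguments makes it easier to see which one is being dealt with. Below is a small Python sketch under the assumption that the lines have the same layout as in the excerpt above; the helper name `ora600_arguments` is hypothetical and is not part of any Oracle tooling.

```python
import re

MARKER = "ORA-00600: internal error code, arguments:"

def ora600_arguments(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line, or None."""
    pos = line.find(MARKER)
    if pos == -1:
        return None  # not an ORA-00600 line
    # Collect every [...] group after the marker and drop the empty ones.
    args = re.findall(r"\[([^\]]*)\]", line[pos + len(MARKER):])
    return [arg for arg in args if arg]

sample = ("ORA-00600: internal error code, arguments: "
          "[ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []")
print(ora600_arguments(sample))
```

For the line above the helper returns a one-element list holding `ktpridestroy2`, which is the value to search for in the trace file.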
The corresponding trace file:
Dump continued from file: /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 41257 (ORA 600 [ktpridestroy2]) ========
*** 2022-07-04 16:31:44.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFCD123B998 ? 000000001 ?
7FFCD123FE98 ? 000000000 ?
ksedst1()+98 call skdstdst() 000000000 ? 000000000 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
ksedst()+34 call ksedst1() 000000000 ? 000000001 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
dbkedDefDump()+2736 call ksedst() 000000000 ? 000000001 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
ksedmp()+36 call dbkedDefDump() 000000003 ? 000000002 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
ksfdmp()+64 call ksedmp() 000000003 ? 000000002 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ?
7FFCD123B998 ? 000000001 ?
000000000 ? 000000000 ?
dbgexProcessError() call dbgexPhaseII() 7F3C5D15C6F0 ? 7F3C5A851598 ?
+2279 7FFCD1247C88 ? 000000001 ?
000000000 ? 000000000 ?
dbgeExecuteForError call dbgexProcessError() 7F3C5D15C6F0 ? 7F3C5A851598 ?
()+83 000000001 ? 000000000 ?
7FFC00000000 ? 000000000 ?
dbgePostErrorKGE()+ call dbgeExecuteForError 7F3C5D15C6F0 ? 7F3C5A851598 ?
1615 () 000000001 ? 000000001 ?
000000000 ? 000000000 ?
dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 000000000 ? 7F3C5A6C1228 ?
63 000000258 ? 7F3C5A851598 ?
000000000 ? 000000000 ?
kgeadse()+383 call dbkePostKGE_kgsf() 00A984C60 ? 7F3C5A6C1228 ?
000000258 ? 7F3C5A851598 ?
000000000 ? 000000000 ?
kgerinv_internal()+ call kgeadse() 00A984C60 ? 7F3C5A6C1228 ?
45 000000258 ? 000000000 ?
000000000 ? 000000000 ?
kgerinv()+33 call kgerinv_internal() 00A984C60 ? 7F3C5A6C1228 ?
D124022000000000 ?
000000258 ? 000000000 ?
000000000 ?
kgeasnmierr()+143 call kgerinv() 00A984C60 ? 7F3C5A6C1228 ?
D124022000000000 ?
000000000 ? 000000000 ?
000000000 ?
ktpridestroy()+912 call kgeasnmierr() 00A984C60 ? 7F3C5A6C1228 ?
D124022000000000 ?
000000000 ? 1E0F02D40 ?
1EC6DA410 ?
ktprw1s()+527 call ktpridestroy() D124022000000000 ?
000000000 ? 1E7A1C2B0 ?
000000000 ? 1E0F02D40 ?
1EC6DA410 ?
ktprsched()+197 call ktprw1s() D124022000000000 ?
000000000 ? 1E7A1C2B0 ?
000000000 ? 1E0F02D40 ?
1EC6DA410 ?
kturRecoverUndoSegm call ktprsched() D124022000000000 ?
ent()+1057 000000000 ? 1E7A1C2B0 ?
000000000 ? 1E0F02D40 ?
1EC6DA410 ?
kturRecoverActiveTx call kturRecoverUndoSegm 000000000 ? 000000000 ?
ns()+710 ent() 000000001 ? 000000000 ?
0D124FFFF ? 6200000005 ?
ktprbeg()+2506 call kturRecoverActiveTx 000000004 ? 000000000 ?
ns() 000000027 ? 000000000 ?
0D124FFFF ? 6200000005 ?
ktmmon()+13588 call ktprbeg() 000000000 ? 000000000 ?
000000027 ? 000000000 ?
0D124FFFF ? 6200000005 ?
ktmSmonMain()+201 call ktmmon() 06002DEC0 ? 000000000 ?
000000027 ? 000000000 ?
0D124FFFF ? 6200000005 ?
ksbrdp()+923 call ktmSmonMain() 06002DEC0 ? 000000000 ?
000000000 ? 000000000 ?
0D124FFFF ? 6200000005 ?
opirip()+618 call ksbrdp() 06002DEC0 ? 000000000 ?
000000000 ? 000000000 ?
0D124FFFF ? 6200000005 ?
opidrv()+598 call opirip() 000000032 ? 000000004 ?
7FFCD124B658 ? 000000000 ?
0D124FFFF ? 6200000005 ?
sou2o()+98 call opidrv() 000000032 ? 000000004 ?
7FFCD124B658 ? 000000000 ?
0D124FFFF ? 6200000005 ?
opimai_real()+261 call sou2o() 7FFCD124B630 ? 000000032 ?
000000004 ? 7FFCD124B658 ?
0D124FFFF ? 6200000005 ?
ssthrdmain()+209 call opimai_real() 000000000 ? 7FFCD124B820 ?
000000004 ? 7FFCD124B658 ?
0D124FFFF ? 6200000005 ?
main()+196 call ssthrdmain() 000000003 ? 7FFCD124B820 ?
000000001 ? 000000000 ?
0D124FFFF ? 6200000005 ?
__libc_start_main() call main() 000000003 ? 7FFCD124B9C0 ?
+245 000000001 ? 000000000 ?
0D124FFFF ? 6200000005 ?
_start()+36 call __libc_start_main() 0009C12F0 ? 000000001 ?
7FFCD124B9B8 ? 000000000 ?
0D124FFFF ? 6200000005 ?
--------------------- Binary Stack Dump ---------------------
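Analysis confirmed that this error is related to parallel transaction recovery: the alert log shows "SMON: Restarting fast_start parallel rollback" right before the failure, and the call stack runs from ktprbeg through kturRecoverActiveTxns into ktpridestroy. After working around this error, the next attempt to start the database failed with ORA-600 [4137]: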
Mon Jul 04 16:33:41 2022
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc (incident=42457):
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_42457/orcl_smon_7554_i42457.trc
Stopping background process MMNL
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (6, 11).
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
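The second argument in the ORA-00600 [4137] line, 6.11.21484016, is commonly read as a transaction ID in the form undo segment number, slot, sequence. The alert log line "recovering transaction (6, 11)" agrees with the first two parts. A minimal Python sketch of splitting it, with the `Xid` type and `parse_xid` helper being illustrative names only:

```python
from typing import NamedTuple

class Xid(NamedTuple):
    usn: int   # undo segment number
    slot: int  # slot in that undo segment's transaction table
    sqn: int   # sequence (wrap) number of the slot

def parse_xid(text):
    """Split a 'usn.slot.sqn' string, as printed in the ORA-00600 [4137] arguments."""
    usn, slot, sqn = (int(part) for part in text.split("."))
    return Xid(usn, slot, sqn)

xid = parse_xid("6.11.21484016")
print(xid.usn, xid.slot, xid.sqn)
```

The undo segment number (6 here) is the piece that identifies which undo segment holds the problematic transaction; the post does not show the exact steps used to handle it.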
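This error is fairly common and is usually caused by an abnormal transaction in undo. Once the abnormal transaction was dealt with, the database opened successfully and the data was imported into a new database without problems, which completed this recovery.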