InnoDB: Cannot open table db/tab from the internal data dictionary of InnoDB though the .frm file for the table exists

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:InnoDB: Cannot open table db/tab from the internal data dictionary of InnoDB though the .frm file for the table exists

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户找到我说mysql无法正常使用报错如下(库名.表名 doesn’t exist):
mysql-doesnot-exist


客户确认表的frm和ibd文件均存在.通过检查mysql日志,发现大量类似异常
mysql-1

而且情况一样ibd和frm文件均存在,系统日志中提示:
2022-07-08T08:37:57.935514Z 1423 [Warning] InnoDB: Cannot open table 库名/表名 from the internal data dictionary of InnoDB though the .frm file for the table exists. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.
根据对mysql的认知,出现此类问题,很可能是mysql的ibdata文件出了问题,对日志进行分析,发现类似记录

2022-07-08T04:11:27.413455Z 0 [Note] /www/server/mysql/bin/mysqld (mysqld 5.7.34-log) starting as process 2144 ...
2022-07-08T04:11:27.495536Z 0 [Note] InnoDB: PUNCH HOLE support available
2022-07-08T04:11:27.495559Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-07-08T04:11:27.495562Z 0 [Note] InnoDB: Uses event mutexes
2022-07-08T04:11:27.495565Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2022-07-08T04:11:27.495568Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-07-08T04:11:27.495571Z 0 [Note] InnoDB: Using Linux native AIO
2022-07-08T04:11:27.496130Z 0 [Note] InnoDB: Number of pools: 1
2022-07-08T04:11:27.496227Z 0 [Note] InnoDB: Using CPU crc32 instructions
2022-07-08T04:11:27.510618Z 0 [Note] InnoDB: Initializing buffer pool, total size=256M,instances=1,chunk size=128M
2022-07-08T04:11:27.520144Z 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-08T04:11:27.522095Z 0 [Note] InnoDB: If the mysqld execution user is authorized, 
        page cleaner thread priority can be changed.  See the man page of setpriority().
2022-07-08T04:11:27.532135Z 0 [Note] InnoDB: The first innodb_system data file 'ibdata1' did not exist. 
       A new tablespace will be created!
2022-07-08T04:11:27.532259Z 0 [Note] InnoDB: Setting file '/www/server/data/ibdata1' size to 10 MB. 
        Physically writing the file full; Please wait ...
2022-07-08T04:11:27.760116Z 0 [Note] InnoDB: File '/www/server/data/ibdata1' size is now 10 MB.
2022-07-08T04:11:27.760338Z 0 [Note] InnoDB: Setting log file /www/server/data/ib_logfile101 size to 128MB
2022-07-08T04:11:27.760414Z 0 [Note] InnoDB: Progress in MB:
 100
2022-07-08T04:11:28.940355Z 0 [Note] InnoDB: Setting log file /www/server/data/ib_logfile1 size to 128 MB
2022-07-08T04:11:28.940442Z 0 [Note] InnoDB: Progress in MB:
 100
2022-07-08T04:11:30.517357Z 0 [Note] InnoDB: Renaming log file /www/server/data/ib_logfile101 
      to /www/server/data/ib_logfile0
2022-07-08T04:11:30.517394Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-07-08T04:11:30.517401Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-07-08T04:11:30.517425Z 0 [Note] InnoDB: Setting file '/www/server/data/ibtmp1' size to 12 MB. 
       Physically writing the file full; Please wait ...
2022-07-08T04:11:30.609146Z 0 [Note] InnoDB: File '/www/server/data/ibtmp1' size is now 12 MB.
2022-07-08T04:11:30.609236Z 0 [Note] InnoDB: Doublewrite buffer not found: creating new
2022-07-08T04:11:30.631133Z 0 [Note] InnoDB: Doublewrite buffer created
2022-07-08T04:11:31.160847Z 0 [Note] InnoDB: 96 redo rollback segment(s) found.96 redo rollback segment(s) are active.
2022-07-08T04:11:31.160860Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2022-07-08T04:11:31.160970Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-07-08T04:11:31.194147Z 0 [Note] InnoDB: Foreign key constraint system tables created
2022-07-08T04:11:31.194175Z 0 [Note] InnoDB: Creating tablespace and datafile system tables.
2022-07-08T04:11:31.195079Z 0 [Note] InnoDB: Tablespace and datafile system tables created.
2022-07-08T04:11:31.195098Z 0 [Note] InnoDB: Creating sys_virtual system tables.
2022-07-08T04:11:31.195974Z 0 [Note] InnoDB: sys_virtual table created
2022-07-08T04:11:31.196099Z 0 [Note] InnoDB: Waiting for purge to start
2022-07-08T04:11:31.246167Z 0 [Note] InnoDB: 5.7.34 started; log sequence number 0
2022-07-08T04:11:31.246379Z 0 [Note] Plugin 'FEDERATED' is disabled.
2022-07-08T04:11:31.248996Z 0 [Warning] InnoDB: Cannot open table mysql/plugin from the internal data dictionary 
        of InnoDB  though the .frm file for the table exists. 
Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.
mysqld: Table 'mysql.plugin' doesn't exist

通过上述日志可以确认,数据库应该被重新初始化了,导致以前库的ibd和frm文件无法被正常访问.对于此类情况,可以参考以前类似恢复案例:frm和ibd文件数据库恢复

存储重启,oracle无法启动故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储重启,oracle无法启动故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于机房要停电,正常关闭两个节点数据库,通过数据库alert日志均可看到类似如下记录,证明数据库确实是正常shutdown immediate
1
2
然后关闭存储,启动存储之后发现数据库无法正常启动(数据scn不一致).相关信息如下:
20220707175525


最初报ORA-214错

that ORACLE_BASE be set in the environment
Wed Jul 06 00:50:02 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
ORA-214 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */...
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established

提示ctl不存在,通过处理之后报ORA-600 2131错误

Wed Jul 06 01:55:45 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb2/trace/xffdb2_ora_47746.trc  (incident=576488):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xffdb/xffdb2/incident/incdir_576488/xffdb2_ora_47746_i576488.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */...

重建控制文件后恢复报错

Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using BACKUP CONTROLFILE  ...
Wed Jul 06 02:41:04 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'  
Media Recovery Log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Wed Jul 06 02:41:04 2022
Errors with log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_96503.trc:
ORA-00325: archived log for thread 1, wrong thread # 2 in header
ORA-00334: archived log: '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'
ORA-325 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/thread_2_seq_40889.18030.1109269215'  ...
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
…………
Wed Jul 06 02:22:25 2022
ALTER DATABASE RECOVER  DATABASE  
Media Recovery Start
 started logmerger process
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00313: open failed for members of log group 7 of thread 1
Media Recovery failed with error 313
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 7 of thread 1
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 4 of thread 2 is not current copy
ORA-00312: online log 4 thread 2: '+DATA3/xffdb/onlinelog/group_4.16148.1107795635'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 7 of thread 1 is not current copy
ORA-00312: online log 7 thread 1: '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00314: log 9 of thread 1, expected sequence# 133495 doesn't match 133490
ORA-00312: online log 9 thread 1: '+DATA3/xffdb/onlinelog/group_9.3142.1107796071'
Checker run found 208 new persistent data failures
ORA-10877 signalled during: ALTER DATABASE RECOVER  DATABASE  ...
…………
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Wed Jul 06 06:15:26 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_4.16442.1107795653
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  ...
Wed Jul 06 06:15:43 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors with log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc  (incident=961030):
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 06 06:15:54 2022
Sweep [inc][961030]: completed
Sweep [inc2][961030]: completed
Slave exiting with ORA-10562 exception
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc:
ORA-10562: Error occurred while applying redo to data block (file# 159, block# 3591756)
ORA-10564: tablespace LIS
ORA-01110: data file 159: '+DATA1/xffdb/datafile/lis.379.1080445903'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 138875
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Wed Jul 06 06:15:59 2022
Recovery Slave PR29 previously exited with exception 10562

基于上述情况,很可能是由于存储重启之后,cache或者某些数据没有写入到数据文件和redo中,数据库重启之后redo不是最新的[ORA-00322错误可以证明,],数据文件也需要进行恢复(不是数据库正常关闭之后该有的情况),而且redo和数据文件还不一致[ORA-00600 6102可以证明],对于类似这样的情况,只能尝试强制打开数据库,报ORA-600 2663

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578],
[1393], [4160374753], [], [], [], [], [], [], []
Process ID: 357910
Session ID: 1585 Serial number: 7
Wed Jul 06 06:57:25 2022
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc  (incident=1056360):
ORA-00600: internal error code, arguments: [2663], [1393],[4159455578],[1393],[4160374753],[], [], [], []
Redo thread 2 internally disabled at seq 1 (CKPT)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_arc3_360348.trc:
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR],[600], 
  [ORA-00600: internal error code, arguments: [kffbAddBlk04]
Unable to create archive log file '+DATA3'
ARC3: Error 19504 Creating archive log file to '+DATA3'
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
Wed Jul 06 06:57:34 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/incident/incdir_1056360/xffdb1_ora_357910_i1056360.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '+DATA1/xffdb/onlinelog/group_4.424.1109314453'
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [],
Wed Jul 06 06:57:34 2022
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 357910): terminating the instance due to error 600
Instance terminated by USER, pid = 357910
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (357910) as a result of ORA-1092
Wed Jul 06 06:57:35 2022
ORA-1092 : opitsk aborting process

该错误比较常见,参考:ORA-600 2663,也可以利用我的Patch_SCN小工具快速解决,后续数据库报ORA-03113错

SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 369324
Session ID: 1585 Serial number: 1

查看alert日志,确认具体报错为kgegpa

Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jul 06 07:17:08 2022
SMON: enabling cache recovery
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 1 added for thread 1 sequence 1 ID 0x36317f52 dest 1:
Archived Log entry 2 added for thread 1 sequence 2 ID 0x36317f52 dest 1:
Archived Log entry 3 added for thread 2 sequence 1 ID 0x0 dest 1:
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983CDD6,kgegpa()+40][flags: 0x0,count:1]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Wed Jul 06 07:17:11 2022
PMON (ospid: 377647): terminating the instance due to error 397

该问题有过类似的案例通过处理数据库open成功:
在数据库恢复遭遇ORA-07445 kgegpa错误
Exception [type: SIGSEGV, Address not mapped to object] [] [ kgegpa()+36]

误删除分区oracle数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:误删除分区oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

很多时候祸福相依,又一个用户发生类似的事情,数据库故障
ora-600-3020


他们公司内部折腾,然后把数据库open成功,并且也顺利导出来dmp在d盘.然后重新安装系统,结果悲剧发生了,他们在操作过程中给c盘扩容,把d盘删除了,然后以前d盘的部分空间分配给c盘了,但是d盘数据全部消失(以前的数据库文件,最新备份出来的dmp文件).用户给我反馈给一系列操作之后,提醒客户尽可能不要对该磁盘进行任何操作(已经分区200G的c盘和800G未分区的空间),然后通过恢复工具进行分析
20220706202513

运气不错,相关的文件没有被覆盖,并且顺利恢复出来
dmp
20220706215448
20220706215611

运气不错,顺利完成相关恢复,将误操作数据恢复恢复来,再次提醒各位,操作谨慎,切莫因为一时疏忽酿成打错.

ORA-00333 ORA-01595 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00333 ORA-01595 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈数据库异常,查看日志发现asm和db均发生hang住情况(由于环境原因部分日志没有拷贝出来),基于现有情况,无法直接恢复,通过一些工具把asm磁盘组中的数据文件拷贝到文件系统,经过检测无坏块
20220705233329


修改相关路径,尝试recover库

Tue Jul 05 15:05:54 2022
ALTER DATABASE RECOVER  datafile 1  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Incomplete read from log member 'E:\ORADATA\GROUP_2.262.822672137'. Trying next member.
Media Recovery failed with error 333
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 1  ...

ORA-00333


错误信息比较明显,在读入redo进行恢复的时候遭遇“ORA-00333: 重做日志读取块 11557 计数 731 出错”错误,从而无法继续恢复.这次故障运气比较好,通过分析v$datafile和v$datafile_header关系
20220705233008

进行一些操作,绕过redo block 11557,顺利recover成功,并且open库

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Tue Jul 05 15:17:46 2022
Parallel Media Recovery started with 32 slaves
Tue Jul 05 15:17:46 2022
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Completed: ALTER DATABASE RECOVER  database  

20220705232246


通过分析alert日志发现有ORA-600 4194错误

QMNC started with pid=58, OS id=15980 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Tue Jul 05 15:18:24 2022
Tue Jul 05 15:18:24 2022
Block recovery from logseq 38423, block 152 to scn 16218380250500
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery stopped at EOT rba 38423.154.16
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Block recovery from logseq 38423, block 152 to scn 16218380250497
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\xff\xff1\trace\xff1_smon_5660.trc:
ORA-01595: 释放区 (2) 回退段 (8) 时出错
ORA-00600: 内部错误代码, 参数: [4194], [], [
                                      
Completed: alter database open 

这比较简单,对于异常的undo进行处理即可,然后使用hcheck检查字典一致性

SQL> @e:/oradata/txt/11.txt
HCheck Version 07MAY18 on 05-7月 -2022 16:30:18
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: xff

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 07/05 16:30:18 PASS
.- MissingOIDOnObjCol          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- SourceNotInObj              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- OversizedFiles              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorDefaultStorage          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorStorage                 ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- OrphanedTabComPart          ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingSum$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingDir$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- DuplicateDataobj            ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSynMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSeqMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedUndo                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndex               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTable               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- MissingPartCol              ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedSeg$                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndPartObj#         ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- DuplicateBlockUse           ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- FetUet                      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- Uet0Check                   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- SeglessUET                  ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadInd$                     ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadTab$                     ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadIcolDepCnt               ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjIndDobj                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- TrgAfterUpgrade             ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjType0                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadOwner                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- StmtAuditOnCommit           ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadPublicObjects            ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadSegFreelist              ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadDepends                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- CheckDual                   ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ObjectNames                 ... 1102000300 <=  *All Rel* 07/05 16:30:23 WARN

HCKW-0018: OBJECT name clashes with SCHEMA name (Doc ID 2363142.1)
Schema=BSHRP INDEX=XFF.XFF

.- BadCboHiLo                  ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ChkIotTs                    ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- NoSegmentIndex              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- BadNextObject               ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DroppedROTS                 ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- FilBlkZero                  ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DbmsSchemaCopy              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 07/05 16:30:24 PASS
.- ObjNotLob                   ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- MaxControlfSeq              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- SegNotInDeferredStg         ... 1102000300 >  1102000000 07/05 16:30:25 PASS
.- SystemNotRfile1             ... 1102000300 >   902000000 07/05 16:30:25 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- OrphanTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- ObjNotTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
---------------------------------------
05-7月 -2022 16:30:25  Elapsed: 7 secs
---------------------------------------
Found 0 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL 过程已成功完成。

有一个SCHEMA和对象名一样,这个不影响属于正常情况(客户创建了一个用户叫做XFF,然后有创建了一个XFF的对象),该数据库恢复至此基本上晚上,业务可以直接运行,不用做逻辑迁移

云主机快照之后Oracle无法正常启动处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:云主机快照之后Oracle无法正常启动处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

某客户数据库放在x云上面,需要对数据库盘进行扩容,在扩容之前对该盘做了快照,结果没有想到悲剧发生了

[root@xifenfei ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   64G   31G  68% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  720K   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb        2.0T  1.2T  910G  56% /www/xifenfei
tmpfs           3.2G     0  3.2G   0% /run/user/1004
tmpfs           3.2G     0  3.2G   0% /run/user/0

如上显示,客户的数据文件都放在/dev/vdb中了,但是很不幸,redo文件放在/data中(也就是vda磁盘组中),没有被做快照,结果客户还原vdb快照之后,发现现象如下

SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,last_change#,
  4  count(*) ROW_NUM
  5  FROM v$datafile
  6  GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_CHANGE# CHECKPOINT_T     LAST_CHANGE#          ROW_NUM
-------------- ------------------ ------------ ---------------- ----------------
ONLINE                69632585947 04-JUL-22                                   38
SYSTEM                69632585947 04-JUL-22                                    2

SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_TIME                          FUZZY  CHECKPOINT_CHANGE#          ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
ONLINE         2022-07-04 09:03:24                      YES           69631105424               40

20220704230638


通过上述分析,该库相当数据文件和redo文件之间相差了一段时间数据,而且该库为非归档,基于这种情况,该库只能强制打开,在打开过程中遇到ORA-600 ktpridestroy2错误

SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
SMON: Restarting fast_start parallel rollback
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc  (incident=41257):
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_41257/orcl_smon_7332_i41257.trc
Starting background process QMNC
Mon Jul 04 16:31:44 2022
QMNC started with pid=36, OS id=7454 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7332): terminating the instance due to error 474
Instance terminated by SMON, pid = 7332

对应trace文件

Dump continued from file: /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 41257 (ORA 600 [ktpridestroy2]) ========

*** 2022-07-04 16:31:44.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   7FFCD123FE98 ? 000000000 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+2736  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexProcessError()  call     dbgexPhaseII()       7F3C5D15C6F0 ? 7F3C5A851598 ?
+2279                                              7FFCD1247C88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  7F3C5D15C6F0 ? 7F3C5A851598 ?
()+83                                              000000001 ? 000000000 ?
                                                   7FFC00000000 ? 000000000 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  7F3C5D15C6F0 ? 7F3C5A851598 ?
1615                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F3C5A6C1228 ?
63                                                 000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgeadse()+383        call     dbkePostKGE_kgsf()   00A984C60 ? 7F3C5A6C1228 ?
                                                   000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgerinv_internal()+  call     kgeadse()            00A984C60 ? 7F3C5A6C1228 ?
45                                                 000000258 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kgerinv()+33         call     kgerinv_internal()   00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000258 ? 000000000 ?
                                                   000000000 ?
kgeasnmierr()+143    call     kgerinv()            00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ktpridestroy()+912   call     kgeasnmierr()        00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprw1s()+527        call     ktpridestroy()       D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprsched()+197      call     ktprw1s()            D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverUndoSegm  call     ktprsched()          D124022000000000 ?
ent()+1057                                         000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverActiveTx  call     kturRecoverUndoSegm  000000000 ? 000000000 ?
ns()+710                      ent()                000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktprbeg()+2506       call     kturRecoverActiveTx  000000004 ? 000000000 ?
                              ns()                 000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmmon()+13588       call     ktprbeg()            000000000 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmSmonMain()+201    call     ktmmon()             06002DEC0 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ksbrdp()+923         call     ktmSmonMain()        06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opirip()+618         call     ksbrdp()             06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opidrv()+598         call     opirip()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
sou2o()+98           call     opidrv()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opimai_real()+261    call     sou2o()              7FFCD124B630 ? 000000032 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
ssthrdmain()+209     call     opimai_real()        000000000 ? 7FFCD124B820 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
main()+196           call     ssthrdmain()         000000003 ? 7FFCD124B820 ?
                                                   000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
__libc_start_main()  call     main()               000000003 ? 7FFCD124B9C0 ?
+245                                               000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
_start()+36          call     __libc_start_main()  0009C12F0 ? 000000001 ?
                                                   7FFCD124B9B8 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
--------------------- Binary Stack Dump ---------------------

通过分析确认该错误和并行恢复有关系,绕过该错误之后,再次尝试启动库报错为ORA-600 4137

Mon Jul 04 16:33:41 2022
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc  (incident=42457):
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_42457/orcl_smon_7554_i42457.trc
Stopping background process MMNL
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (6, 11).
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []

该错误比较常见,一般是由于undo中有异常事务,对异常事务进行处理,数据库open成功,并顺利导入数据到新库中,完成本次数据恢复

ORA-600 2032故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 2032故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库,异常断电之后,数据库运行不稳定(经常性的重启),通过分析发现

Wed Jun 29 01:04:39 2022
Completed: alter database open
Wed Jun 29 01:04:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_3284.trc:
ORA-12012: error on auto execute of job 1
ORA-01578: ORACLE data block corrupted (file # 2, block # 552)
ORA-01110: data file 2: 'E:\ORACLE\PRODUCT\10.2.0\ORADATA_2\ORCL\UNDOTBS01.DBF'
…………
Wed Jun 29 01:13:28 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:15:34 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5488.trc:
ORA-00600: internal error code, arguments: [6002], [6], [48], [5], [0], [], [], []
Wed Jun 29 01:20:54 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:55 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]

Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:57 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
………………
Wed Jun 29 01:21:41 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_q001_2124.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_3376.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_reco_2412.trc:
ORA-00474: SMON process terminated with error

Instance terminated by PMON, pid = 7160

对ora-600 2032进行分析

*** 2022-06-29 01:13:26.907
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, 
tim_scn_map) values (0, :1, :2, :3, :4, :5, :6, :7)
check trace file e:\oracle\product\10.2.0\db_2\rdbms\trace\orcl_ora_0.trc for preloading .sym file messages
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+663           CALL???  ksedst+55            003C878B8 000000000 00FF57178
                                                   000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 00ED07680 010453DA8
                                                   003CACC80
kgeriv+184           CALL???  ksfdmp+19            7FF2378C000 7FF5E2F81D8
                                                   7FF5EBB9300 7FF5EBB92F0
kgesiv+102           CALL???  kgeriv+184           000000002 00ED07040 000000002
                                                   00FF59110
ksesic7+125          CALL???  kgesiv+102           7FFFFFFF00800228 100000002
                                                   200000001 00FF59328
kcopcv+1014          CALL???  ksesic7+125          0000007F0 000000000 000800228
                                                   000000000
kcbchg1_main+3115    CALL???  kcopcv+1014          00001E778 7FF00000001
                                                   00001E980 00001E650
kcbchg1+238          CALL???  kcbchg1_main+3115    00ED07040 00ED07040 000040013
                                                   000000000
ktuchg+1331          CALL???  kcbchg1+238          7FF00000000 000000003
                                                   00FF597C8 00FF59780
ktbchg2+341          CALL???  ktuchg+1331          000000000 000000401 000000046
                                                   000BB5D21
kdtchg+916           CALL???  ktbchg2+341          000000001 000000000 000000240
                                                   000000000
kdtwrp+2582          CALL???  kdtchg+916           01077A268 01044404C 010444054
                                                   010441710
kdtInsRow+705        CALL???  kdtwrp+2582          01077A268 7FF5E2F8510
                                                   003CB3358 0009C20C4
insrowFastPath+125   CALL???  kdtInsRow+705        00ED07040 7FF55A86E30
                                                   7FF58E87918 00001F318
insdrvFastPath+478   CALL???  insrowFastPath+125   000000067 000000000 000000000
                                                   000000000
inscovexe+434        CALL???  insdrvFastPath+478   01077A268 000000000 010440070
                                                   00521E0C9
insExecStmtExecIniE  CALL???  inscovexe+434        7FF55A88FD8 7FF55A86E30
ngine+99                                           00FF5BAD8 0009FA3EC
insexe+453           CALL???  insExecStmtExecIniE  8B2D554D9623 000000006
                              ngine+99             000000006 0000000C0
opiexe+4991          CALL???  insexe+453           7FF55A885A0 00FF5BAD8
                                                   000000102 000000000
opiall0+1931         CALL???  opiexe+4991          7FF00000049 000000003
                                                   00FF5C160 000000020
opikpr+660           CALL???  opiall0+1931         000000065 000000022 00FF5C638
                                                   000000000
opiodr+1136          CALL???  opikpr+660           000000065 000000017 01044B798
                                                   07EEF1FCF
rpidrus+230          CALL???  opiodr+1136          000000065 000000017 01044B798
                                                   80005900000000
rpidru+112           CALL???  rpidrus+230          00FF5D3F0 000000003 01078D4A0
                                                   7FF5B6F1A80
rpiswu2+517          CALL???  rpidru+112           7FF5DA52BB8 000000000
                                                   000000000 000000008
kprball+1446         CALL???  rpiswu2+517          7FF5E408820 000000000
                                                   00FF5DAD0 000000002
ktf_scn_time+4951    CALL???  kprball+1446         01044B798 8B2D00000140
                                                   000000000 000000005
ktmmon+4107          CALL???  ktf_scn_time+4951    000000000 000000001 07FFFFFFF
                                                   00521A3C2
ktmSmonMain+26       CALL???  ktmmon+4107          0049CBAC0 000000004
                                                   8B2D554D9623 00001E768
ksbrdp+903           CALL???  ktmSmonMain+26       003CC2A18 0049CBADC 000000008
                                                   000000004
opirip+700           CALL???  ksbrdp+903           726F77740000001E 003C8B000
                                                   00FF5FA30 000000000
opidrv+860           CALL???  opirip+700           000000032 000000004 00FF5FD50
                                                   000000000
sou2o+52             CALL???  opidrv+860           000000032 000000004 00FF5FD50
                                                   000000003
opimai_real+272      CALL???  sou2o+52             000000000 000000000 000000000
                                                   000000000
opimai+96            CALL???  opimai_real+272      000000000 000000000 000000000
                                                   000000000
BackgroundThreadSta  CALL???  opimai+96            00FF5FEA8 000000001 000000000
rt+633                                             000000000
000000007738F56D     CALL???  BackgroundThreadSta  0069E4590 000000000 000000000
                              rt+633               000000000
0000000077703021     CALL???  000000007738F56D     000000000 000000000 000000000
                                                   000000000
 
--------------------- Binary Stack Dump ---------------------

通过该trace和alert日志信息可以确认是由于smon_scn_time的操作需要使用到undo,但是对应的undo block异常,从而使得该操作失败,进而引起数据库smon进程异常从而引起ORA-00474,数据库自动crash.处理问题比较简单:
1. 对异常undo进行处理,创建新undo,删除老undo
2. 对于smon_scn_time异常数据进行处理

Oracle Recovery Tools实战批量坏块修复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle Recovery Tools实战批量坏块修复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库无法正常启动ORA-600 6711错误

SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898],
[0], [], [], [], [], [], [], []
Process ID: 22708
Session ID: 978 Serial number: 56675

alert日志报错

2022-06-26T12:34:41.855326+08:00
alter database open
2022-06-26T12:34:41.984974+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
Undo initialization finished serial:0 start:313418906 end:313418906 diff:0 ms (0.0 seconds)
Database Characterset is ZHS16GBK
No Resource Manager plan active
2022-06-26T12:34:43.302315+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38629):
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38629\ora19c_ora_22708_i38629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-06-26T12:34:44.232115+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38630):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38630\ora19c_ora_22708_i38630.trc
2022-06-26T12:34:45.266678+08:00
opiodr aborting process unknown ospid (22708) as a result of ORA-603
2022-06-26T12:34:45.274688+08:00
ORA-603 : opitsk aborting process
License high water mark = 1
USER (ospid: (prelim)): terminating the instance due to ORA error 

通过分析trace文件进行分析,确认是由于histgrm$表异常导致,通过一些特殊处理,绕过该表相关sql,open数据库,并且尝试导出数据

SQL> startup mount pfile='d:/pfile.txt';
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open;

Database altered.

使用expdp导出数据报ORA-01578错
expdp-ora-1578


通过分析是由于system有坏块导致,dbv检查文件
dbv-huikuai

通过Oracle Recovery Tools工具批量坏块修复功能修复
20220626123245
20220626123343

通过工具修复大量主要坏块被修复,还有一些内部逻辑错误(后续工具继续完善),再次尝试逻辑导出数据,无任何报错,数据比较完美恢复
20220626160209

ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈三个磁盘组无法正常mount,报错类似ORA-15032 ORA-15017 ORA-15063

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

初步判断是asm disk异常导致(比如asm disk不能被扫描到,或者丢失,或者磁盘头损坏等),分析客户的asm disk的udev文件配置

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     

从udev的配置中可以看出来,客户以前是对3个磁盘进行分析,然后使用udev映射别名给asm使用的.通过对其中一个磁盘进行分析
20220621220634
20220621220728


通过上述winhex查看,可以确认该分区的磁盘头信息异常[该信息属于磁盘刚分区的时候信息,而不是asm disk的信息],和kfed看到的结果一致[磁盘头位置肯定损坏,其他位置目前未知]

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

分析其他位置的block情况,初步看基本上ok[运气还不错]

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过检索备份出来的部分磁盘文件,找出来ORCLDISK信息部分(asm disk header)
20220621221843


然后利用这个部分对损坏的磁盘头进行修复,并且dd回生产环境中,并尝试mount磁盘组,数据库open成功
20220621181430
20220621222356


至此这个数据库运气不错,没有过多损坏,算完美恢复,可以进行了逻辑导出和rman备份,全部正常.为了后续安全,建议对其进行迁移

修改oracle scn小工具(patch scn)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:修改oracle scn小工具(patch scn)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在一些情况下(特别是一些数据库非常规恢复场景中),需要修改oracle scn绕过一些错误,让数据库open成功,在以前的版本中我们可以通过event,隐含参数,oradebug等方法进行修改,在一些较新的版本中这些方法都被oracle屏蔽,无法实现oracle scn进行调整,针对这种情况,开发了一个Patch_SCN小程序,实现对oracle数据库的scn进行调整

SQL> select dbms_flashback.get_system_change_number a from dual;

               A
----------------
    107367806959

通过工具查询scn信息,由于oracle的scn是动态的,因此和get_system_change_number 查询值有细微出入
20220614130358


修改scn
20220614130543

通过查询确认scn修改成功
20220614130553

该功能的通用版已经包含到oracle racovery tools工具中(注册版可用)
oracle-recovery-tools

软件下载:OraRecovery下载
使用说明:使用说明

ORA-01110 ORA-17070 OSD-04006 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01110 ORA-17070 OSD-04006 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我说应用访问数据库和导出数据都报ORA-01110 ORA-17070 OSD-04006之类错误,数据库可以正常open,但是业务访问关键数据和导出报错
20220611175122
20220611175156


对于这个错误,根据以往恢复经验,初步判断可能硬件异常(比如坏道,硬件故障)或者文件系统异常引起,让客户尝试拷贝该文件,确认该文件也无法拷贝
20220611174852

对于这种情况,如果放弃该文件,恢复其他文件数据,那样数据丢失比例太大,直接通过特定恢复工具对其损坏文件进行拷贝,最大限度强求当前文件数据,发现一些扇区损坏跳过继续拷贝
20220611174928

通过坏块检查工具进行检查确认该文件76个block损坏(对于32G的数据文件损坏1M数据,比较好效果)
20220611174909

对坏块进行处理,然后使用expdp导出数据,最大限度抢救数据
20220611175039