联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
xx医院存储突然掉线,导致数据库异常,现场工程师折腾了一天,问题依旧没有解决,无奈之下找到我们,希望我们能够帮忙恢复数据库.
启动报ORA-00600[2131]错误
Fri Nov 06 14:50:59 2015 ALTER DATABASE MOUNT This instance was first to mount Fri Nov 06 14:50:59 2015 ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.4)(PORT=1521))' SCOPE=MEMORY SID='xifenfei1'; NOTE: Loaded library: System SUCCESS: diskgroup DATA was mounted NOTE: dependency between database xifenfei and diskgroup resource ora.DATA.dg is established Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13221.trc (incident=191085): ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], [] Incident details in: /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_191085/xifenfei1_ora_13221_i191085.trc Fri Nov 06 14:51:10 2015 Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. ORA-600 signalled during: ALTER DATABASE MOUNT...
出现该错误的原因是由于:We are attempting to write a controlfile checkpoint progress record, but find we do not have the progress record generating this exception.由于控制文件异常导致,出现此类情况,我们一般使用单个控制文件一次尝试,如果都不可以考虑重建控制文件
由于坏块(逻辑/物理)导致数据库实例恢复无法进行
Beginning crash recovery of 2 threads Started redo scan kcrfr_rnenq: use log nab 393216 kcrfr_rnenq: use log nab 2 Completed redo scan read 4427 KB redo, 500 data blocks need recovery Started redo application at Thread 1: logseq 5731, block 391398 Thread 2: logseq 4252, block 520815 Recovery of Online Redo Log: Thread 1 Group 2 Seq 5731 Reading mem 0 Mem# 0: +DATA/xifenfei/onlinelog/group_2.266.835331047 Recovery of Online Redo Log: Thread 2 Group 8 Seq 4252 Reading mem 0 Mem# 0: +DATA/xifenfei/onlinelog/group_8.331.835330421 Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc (incident=197486): ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], [] Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_197486/xifenfei1_ora_13770_i197486.trc Fri Nov 06 15:03:09 2015 Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc (incident=197487): ORA-01578: ORACLE data block corrupted (file # 2, block # 65207) ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753' ORA-10564: tablespace SYSAUX ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 81045 ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], [] Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_197487/xifenfei1_ora_13770_i197487.trc Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc: ORA-01578: ORACLE data block corrupted (file # 2, block # 65207) ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753' ORA-10564: tablespace SYSAUX ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 81045 ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], [], [], [], [], [] Recovery of Online Redo Log: Thread 2 Group 3 Seq 4253 Reading mem 0 Mem# 0: +DATA/xifenfei/onlinelog/group_3.332.835330505 Hex dump of (file 14, block 62536) in trace file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc Reading datafile '+DATA/xifenfei/datafile/ht01.dbf' for corruption at rdba: 0x0380f448 (file 14, block 62536) Reread (file 14, block 62536) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 62536 OF FILE 14 Fri Nov 06 15:03:13 2015 Abort recovery for domain 0 Aborting crash recovery due to error 1172 Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc: ORA-01172: recovery of thread 1 stuck at block 62536 of file 14 ORA-01151: use media recovery to recover block, restore backup if needed Abort recovery for domain 0 Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13770.trc: ORA-01172: recovery of thread 1 stuck at block 62536 of file 14 ORA-01151: use media recovery to recover block, restore backup if needed ORA-1172 signalled during: ALTER DATABASE OPEN...
查看资料发现和Bug 14301592 – Several errors by corrupt blocks shifted by 2 bytes in buffer cache during recovery caused by INDEX redo apply,可以通过ALLOW 1 CORRUPTION临时解决
使用ALLOW 1 CORRUPTION进行恢复,出现ORA-07445[kdxlin]错误
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} +DATA/xifenfei/onlinelog/group_3.332.835330505 ORA-00279: change 700860458 generated at 11/05/2015 21:20:15 needed for thread 1 ORA-00289: suggestion : +ARCHIVE/xifenfei/xifenfei_1_5731_835324843.arc ORA-00280: change 700860458 for thread 1 is in sequence #5731 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} +DATA/xifenfei/onlinelog/group_2.266.835331047 ORA-00283: recovery session canceled due to errors ORA-10562: Error occurred while applying redo to data block (file# 2, block# 70104) ORA-10564: tablespace SYSAUX ORA-01110: data file 2: '+DATA/xifenfei/datafile/sysaux.257.835324753' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 82289 ORA-00607: Internal error occurred while making a change to a data block ORA-00602: internal programming exception ORA-07445: exception encountered: core dump [kdxlin()+4088] [SIGSEGV] [ADDR:0xC] [PC:0x95FB572] [Address not mapped to object] [] ORA-01112: media recovery not started
ORA-07445[kdxlin()+4088]未找到类似说明,到了这一步,无法简单的恢复成功,只能通过设置隐含参数跳过实例恢复,尝试resetlog库
通过设置_allow_resetlogs_corruption参数继续恢复
SQL> startup pfile='/tmp/pfile.ora' mount; ORACLE instance started. Total System Global Area 7315603456 bytes Fixed Size 2267384 bytes Variable Size 2566915848 bytes Database Buffers 4731174912 bytes Redo Buffers 15245312 bytes Database mounted. SQL> alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], [] Process ID: 13563 Session ID: 157 Serial number: 3
alert日志报错
Fri Nov 06 19:26:39 2015 SMON: enabling cache recovery Instance recovery: looking for dead threads Instance recovery: lock domain invalid but no dead threads Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc (incident=319140): ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], [] Incident details in:/home/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_319140/xifenfei1_ora_13563_i319140.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Redo thread 2 internally disabled at seq 1 (CKPT) ARC1: Archiving disabled thread 2 sequence 1 Archived Log entry 9956 added for thread 2 sequence 1 ID 0x0 dest 1: ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc: ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], [] Errors in file /home/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_13563.trc: ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [700869927], [0], [700860464], [], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 13563): terminating the instance due to error 600 Fri Nov 06 19:26:42 2015 Instance terminated by USER, pid = 13563 ORA-1092 signalled during: alter database open resetlogs... opiodr aborting process unknown ospid (13563) as a result of ORA-1092 Fri Nov 06 19:26:42 2015 ORA-1092 : opitsk aborting process
这里是比较熟悉的ora-600[kclchkblk_4]错误,和ora-600[2662]错误类似,需要调整scn,由于数据库版本为11.2.0.4,无法使用常规方法调整scn,在修改控制文件,oradebug,bbed方法可供选择
使用oradebug方法处理
因为是asm环境,其他方法处理起来都相对麻烦
[oracle@wisetop1 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Fri Nov 6 19:30:59 2015 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to an idle instance. SQL> startup pfile='/tmp/pfile.ora' mount; ORACLE instance started. Total System Global Area 7315603456 bytes Fixed Size 2267384 bytes Variable Size 2566915848 bytes Database Buffers 4731174912 bytes Redo Buffers 15245312 bytes Database mounted. SQL> oradebug setmypid Statement processed. SQL> oradebug poke 0x06001AE70 4 0x2FAF0800 BEFORE: [06001AE70, 06001AE74) = 00000000 AFTER: [06001AE70, 06001AE74) = 2FAF0800 SQL> alter database open; Database altered.
至此数据库open成功,后续就是处理一些坏块的工作,并建议客户逻辑重建库.
oradebug poke 0x06001AE70 4 0x2FAF0800
问下,这个里边的两个地址是哪里的?
0x06001AE70 是通过ordebug 定位scn的起点,0x2FAF0800是根据实际情况估算出来的