Oracle断电故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle断电故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电导致数据库异常恢复文件报ORA-00283 ORA-00742 ORA-00312

 
D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on 星期二 5月 31 00:38:42 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 %d 序列 %d 块 %d 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1:
'D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG'

这个错误比较明显是由于异常断电引起的写丢失导致.而且这种故障在没有备份的情况下,没有什么好处理方法,只能屏蔽一致性强制拉库,尝试强制拉库报错如下

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
数据库装载完毕。
SQL> recover database until cancel;
ORA-00279: 更改 18755939194213 (在  生成) 对于线程 1 是必需的


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG
ORA-00600: internal error code, arguments: [3020], [2], [78824], [8467432], [],
[], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 78824, file
offset is 645726208 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: 'D:\ORADATA\ORCL\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 80834


ORA-01112: 未启动介质恢复


SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [krsi_al_hdr_update.15], [4294967295], [], [],[], [], [], [], [], [], [], []

ORA-600 krsi_al_hdr_update.15错误,主要是由于redo异常导致无法resetlogs成功,具体参考:Alter Database Open Resetlogs returns error ORA-00600: [krsi_al_hdr_update.15], (Doc ID 2026541.1)描述,处理这个问题之后,再次resetlogs库,报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [4366], [4112122046],
[4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [4366], [4112122045],
[4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [4366], [4112122040],
[4366], [4112228996], [12583040], [], [], [], [], [], []
进程 ID: 4644
会话 ID: 1701 序列号: 3

这个问题比较简单,通过修改scn即可绕过去,之后数据库open报ORA-600 4194等错误

SQL> alter database open ;
alter database open 
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [4194], [
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322982):
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322982\orcl_smon_5112_i322982.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3340.trc  (incident=323030):
ORA-00600: 内部错误代码, 参数: [4194], [
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_323030\orcl_ora_3340_i323030.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:04 2022
Sweep [inc][322982]: completed
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (10, 33).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Checker run found 1 new persistent data failures
Tue May 31 09:05:05 2022
Sweep [inc][323030]: completed
Sweep [inc2][322982]: completed
Tue May 31 09:05:14 2022
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322983):
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [0], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322983\orcl_smon_5112_i322983.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:14 2022
ORA-600 signalled during: alter database open...
Block recovery stopped at EOT rba 2.61.16
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Block recovery from logseq 2, block 60 to scn 18755939643393
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_2_K9BSVC11_.LOG
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Dumping diagnostic data in directory=[cdmp_2022053],requested by(instance=1,osid=5112(SMON)),summary=[incident=322983].
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [3], [], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库正常open成功

SQL> shutdown immediate;
ORA-00600: 内部错误代码, 参数: [4193], [


SQL> shutdown abort;
ORACLE 例程已经关闭。
SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

hcheck检测有一些字典不一致,建议客户逻辑导出,然后导入到新库中

HCheck Version 07MAY18 on 31-5月 -2022 09:12:22
----------------------------------------------
Catalog Version 11.2.0.4.0 (1102000400)
db_name: ORCL

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Resul
t
------------------------------ ... ---------- -- ---------- -------------- -----
-
.- LobNotInObj                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingOIDOnObjCol          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- SourceNotInObj              ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- OversizedFiles              ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- PoorDefaultStorage          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- PoorStorage                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- TabPartCountMismatch        ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- OrphanedTabComPart          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingSum$                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingDir$                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- DuplicateDataobj            ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- ObjSynMissing               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjSeqMissing               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedUndo                ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndex               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexPartition      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexSubPartition   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTable               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTablePartition      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTableSubPartition   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- MissingPartCol              ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedSeg$                ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndPartObj#         ... 1102000400 <=  *All Rel* 05/31 09:12:23 FAIL

HCKE-0024: Orphaned Index Partition Obj# (no OBJ$) (Doc ID 1360935.1)
ORPHAN INDPART$: OBJ#=149167 BO#=6378 - no OBJ$ row
ORPHAN INDPART$: OBJ#=149168 BO#=6378 - no OBJ$ row

.- DuplicateBlockUse           ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- FetUet                      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- Uet0Check                   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- SeglessUET                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadInd$                     ... 1102000400 <=  *All Rel* 05/31 09:12:23 FAIL

HCKE-0030: OBJ$ INDEX entry has no IND$ or INDPART$/INDSUBPART$ entry (Doc ID 13
60528.1)
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148278 SYS Name=WRH$_FILESTATXS
_PK PARTITION=WRH$_FILEST_1572571104_16462
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148920 SYS Name=WRH$_FILESTATXS
_PK PARTITION=WRH$_FILEST_1572571104_16678

.- BadTab$                     ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadIcolDepCnt               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjIndDobj                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- TrgAfterUpgrade             ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjType0                    ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadOwner                    ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- StmtAuditOnCommit           ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadPublicObjects            ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadSegFreelist              ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadDepends                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 WARN

HCKW-0016: Dependency$ p_timestamp mismatch for VALID objects (Doc ID 1361045.1)

[E] - P_OBJ#=6376 D_OBJ#=6765

.- CheckDual                   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjectNames                 ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadCboHiLo                  ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- ChkIotTs                    ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- NoSegmentIndex              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- BadNextObject               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- DroppedROTS                 ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- FilBlkZero                  ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- DbmsSchemaCopy              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- OrphanedObjError            ... 1102000400 >  1102000000 05/31 09:12:24 PASS
.- ObjNotLob                   ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- MaxControlfSeq              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- SegNotInDeferredStg         ... 1102000400 >  1102000000 05/31 09:12:24 PASS
.- SystemNotRfile1             ... 1102000400 >   902000000 05/31 09:12:24 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- OrphanTrigger               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- ObjNotTrigger               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
---------------------------------------
31-5月 -2022 09:12:24  Elapsed: 2 secs
---------------------------------------
Found 4 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL 过程已成功完成。

ORA-600 kccpb_sanity_check_2故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 kccpb_sanity_check_2故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天是有人在淘宝旺旺上找我,需要oracle数据库恢复支持
wangwang


远程登录上去一看发现数据库mount的时候报ORA-600[kccpb_sanity_check_2]错误

C:\Documents and Settings\Administrator>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Wed Jul 29 16:23:18 2015
Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [14169],
[14160], [0x0], [], [], [], []

尝试重建控制文件

SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup pfile='D:\database\m104\pfile\init.ora' nomount
ORACLE instance started.
Total System Global Area  444596224 bytes
Fixed Size                  1291072 bytes
Variable Size             155192512 bytes
Database Buffers          281018368 bytes
Redo Buffers                7094272 bytes
SQL> SHOW PARAMETER CONT;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      D:\DATABASE\M104\CTRL\CONTROL0
                                                 2.CTL
global_context_pool_size             string
SQL> ALTER DATABASE MOUNT;
ALTER DATABASE MOUNT
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [14169],
[14160], [0x0], [], [], [], []
SQL>
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" NORESETLOGS  FORCE LOGGING NOAR
CHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;
CREATE CONTROLFILE REUSE DATABASE "m104_db" NORESETLOGS  FORCE LOGGING NOARCHIVE
LOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [1], [8],
[], [], [], [], []
SQL>
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCH
IVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;
CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCHIVELO
G
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-00600: internal error code, arguments: [kccsga_update_ckpt_4], [1], [8],
[], [], [], [], []

无论是使用noresetlogs还是resetlogs,重建控制文件都报ORA-600[kccsga_update_ckpt_4]错误.比较奇怪,无解指定控制文件新名称重建试试看

修改控制文件路径

SQL> SHUTDOWN ABORT
ORACLE instance shut down.
SQL> startup pfile='D:\database\m104\pfile\init.ora' nomount
ORACLE instance started.
Total System Global Area  444596224 bytes
Fixed Size                  1291072 bytes
Variable Size             155192512 bytes
Database Buffers          281018368 bytes
Redo Buffers                7094272 bytes
SQL> SHOW PARAMETER CONT;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      D:\DATABASE\M104\CTRL\CONTROL0
                                                 4.CTL
global_context_pool_size             string
SQL> CREATE CONTROLFILE REUSE DATABASE "m104_db" RESETLOGS  FORCE LOGGING NOARCH
IVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'D:\database\m104\log\redo01.log'  SIZE 51200K,
  9    GROUP 2 'D:\database\m104\log\redo02.log'  SIZE 51200K,
 10    GROUP 3 'D:\database\m104\log\redo03.log'  SIZE 51200K
 11  DATAFILE
 12    'd:\database\m104\data\system01.dbf',
 13    'd:\database\m104\data\sysaux01.dbf',
 14    'd:\database\m104\data\USERS01.DBF',
 15    'd:\database\m104\data\UNDOTBS01.DBF',
 16    'd:\database\m104\data\INDX01.DBF'
 17  CHARACTER SET WE8ISO8859P1
 18  ;
Control file created.

使用新的控制文件位置,这次终于数据库重建控制文件成功
尝试指定redo进行恢复,数据库正常打开

SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
ORA-00279: change 3643108240801 generated at 07/26/2015 20:15:22 needed for
thread 1
ORA-00289: suggestion :
D:\ORACLE\PRODUCT\10.2.0\DB_1\RDBMS\ARC00567_0866390669.001
ORA-00280: change 3643108240801 for thread 1 is in sequence #567
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\database\m104\log\redo01.log
ORA-00310: archived log contains sequence 566; sequence 567 required
ORA-00334: archived log: 'D:\DATABASE\M104\LOG\REDO01.LOG'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'D:\DATABASE\M104\DATA\SYSTEM01.DBF'
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
ORA-00279: change 3643108240801 generated at 07/26/2015 20:15:22 needed for
thread 1
ORA-00289: suggestion :
D:\ORACLE\PRODUCT\10.2.0\DB_1\RDBMS\ARC00567_0866390669.001
ORA-00280: change 3643108240801 for thread 1 is in sequence #567
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\database\m104\log\redo02.log
Log applied.
Media recovery complete.
SQL> ALTER DATABASE OPEN RESETLOGS;
Database altered.

数据库恢复完成。整个数据库恢复比较简单,但是注意这里的ORA-600[kccsga_update_ckpt_4]通过修改控制文件路径规避,具体原因待查。

知识点补充:ORA-600 [kccpb_sanity_check_2] [a] [b] {c}

VERSIONS:
  Versions 10.2 to 11.2
DESCRIPTION:
  This internal error is raised when the sequence number (seq#) of the
  current block of the controlfile is greater than the seq# in the controlfile header.
  The header value should always be equal to, or greater than the value
  held in the control file block(s).
  This extra check was introduced in Oracle 10gR2 to detect lost writes
  or stale reads to the header.
ARGUMENTS:
  Arg [a] seq# in control block header.
  Arg [b] seq# in the control file header.
  Arg {c}

Oracle异常恢复前备份保护现场建议—ASM环境

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle异常恢复前备份保护现场建议—ASM环境

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在上一篇中写道了文件系统的库,在进行异常恢复前的备份方法(Oracle异常恢复前备份保护现场建议—FileSystem环境),对于asm库,因为asm 里面的数据文件无法直接dd文件头,因此备份方式也有所改变.对于asm是mount,但是数据库不能打开,使用rman或者asm的cp命令全部备份数据文件也来不及或者空间不足,这样的情况下,你可以考虑使用rman或者cp命令备份控制文件和system表空间文件,cp命令备份redo,dd命令备份文件头,来完成asm情况下数据库异常恢复前备份

控制文件备份
11.2及其以后版本使用asmcmd cp命令处理

select 'asmcmd cp '||name||' &&backup_dir/' from v$datafile where ts#=0
union all
select 'asmcmd cp '||name||' &&backup_dir/crontrofile_'||rownum||'.ctl' from v$controlfile
union all
select 'asmcmd cp '||member||' &&backup_dir/'||thread#||'_'||a.group#||'_'||sequence#||'_'||substr(member,
instr(member,'/',-1)+1)  FROM v$log a, v$logfile b WHERE a.group# = B.GROUP#;

其他版本使用rman命令处理

--rman备份控制文件(/tmp目录自己修改)
copy current controlfile to '/tmp/ctl.ctl';
--rman备份system表空间
select 'copy datafile '||file#||' to ''&backup_dir/system_'||file#||'.dbf'';'
from v$datafile where ts#=0;
--redo无法直接备份

备份文件头

[grid@xifenfei ~]$ ss
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 1 04:15:18 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option
SQL> set lines 150
SQL> select 'dd if='||c.PATH_KFDSK||' of=&&backup_path/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||'_'||
  2  b.NUMBER_KFFIL||'.asm count=1 bs='|| d.AUSIZE_KFGRP||' skip='||a.au_kffxp backup_dd_cmd
  3   FROM x$kffxp a, X$KFFIL  b,X$KFDSK c,X$KFGRP d  WHERE
  4  a.GROUP_KFFXP=b.GROUP_KFFIL
  5  and a.NUMBER_KFFXP=b.NUMBER_KFFIL
  6  and b.FTYPE_KFFIL in(2,12)
  7  and b.NUMBER_KFFIL>255
  8  and a.xnum_kffxp=0
  9  and a.GROUP_KFFXP=c.GRPNUM_KFDSK
 10  and a.disk_kffxp=c.NUMBER_KFDSK
 11  and a.GROUP_KFFXP=d.NUMBER_KFGRP;
Enter value for backup_path: /tmp
old   1: select 'dd if='||c.PATH_KFDSK||' of=&&backup_path/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||'_'||
new   1: select 'dd if='||c.PATH_KFDSK||' of=/tmp/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||'_'||
BACKUP_DD_CMD
------------------------------------------------------------------------------------------------------------------
dd if=/dev/asm-disk1 of=/tmp/1_0_256.asm count=1 bs=1048576 skip=29
dd if=/dev/asm-disk2 of=/tmp/1_1_257.asm count=1 bs=1048576 skip=404
dd if=/dev/asm-disk2 of=/tmp/1_1_258.asm count=1 bs=1048576 skip=641
dd if=/dev/asm-disk1 of=/tmp/1_0_259.asm count=1 bs=1048576 skip=648
dd if=/dev/asm-disk3 of=/tmp/2_0_256.asm count=1 bs=1048576 skip=51

还原文件头

SQL> set lines 150
SQL> select 'dd of='||c.PATH_KFDSK||' if=&&backup_path/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||
  2  '_'||b.NUMBER_KFFIL||'.asm count=1 conv=notrunc bs='|| d.AUSIZE_KFGRP||' seek='||a.au_kffxp restore_dd_cmd
  3   FROM x$kffxp a, X$KFFIL  b,X$KFDSK c,X$KFGRP d  WHERE
  4  a.GROUP_KFFXP=b.GROUP_KFFIL
  5  and a.NUMBER_KFFXP=b.NUMBER_KFFIL
  6  and b.FTYPE_KFFIL in(2,12)
  7  and b.NUMBER_KFFIL>255
  8  and a.xnum_kffxp=0
  9  and a.GROUP_KFFXP=c.GRPNUM_KFDSK
 10  and a.disk_kffxp=c.NUMBER_KFDSK
 11  and a.GROUP_KFFXP=d.NUMBER_KFGRP;
old   1: select 'dd of='||c.PATH_KFDSK||' if=&&backup_path/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||
new   1: select 'dd of='||c.PATH_KFDSK||' if=/tmp/'||a.GROUP_KFFXP||'_'||a.disk_kffxp||
RESTORE_DD_CMD
-----------------------------------------------------------------------------------------------------------------
dd of=/dev/asm-disk1 if=/tmp/1_0_256.asm count=1 conv=notrunc bs=1048576 seek=29
dd of=/dev/asm-disk2 if=/tmp/1_1_257.asm count=1 conv=notrunc bs=1048576 seek=404
dd of=/dev/asm-disk2 if=/tmp/1_1_258.asm count=1 conv=notrunc bs=1048576 seek=641
dd of=/dev/asm-disk1 if=/tmp/1_0_259.asm count=1 conv=notrunc bs=1048576 seek=648
dd of=/dev/asm-disk3 if=/tmp/2_0_256.asm count=1 conv=notrunc bs=1048576 seek=51
SQL>

备份还原文件头测试–通过测试证明该方法备份文件头是ok的
关闭数据库,使用dd备份文件头

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 1 04:21:49 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

dul查看当前dbname值为XIFENFEI

[oracle@xifenfei dul]$ ./dul
Data UnLoader: 10.2.0.6.5 - Internal Only - on Fri May  1 04:37:43 2015
with 64-bit io functions
Copyright (c) 1994 2015 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Disk group DATA, dul group_cid 0
Discovered disk /dev/asm-disk1 as diskgroup DATA, disk number 0 size 3922 Mb File1 starts at 2, dul_disk_cid 0
Discovered disk /dev/asm-disk2 as diskgroup DATA, disk number 1 size 3922 Mb without File1 meta data, dul_disk_cid 1
Disk group XIFENFEI, dul group_cid 1
Discovered disk /dev/asm-disk3 as diskgroup XIFENFEI, disk number 0 size 4439 Mb File1 starts at 2, dul_disk_cid 2
DUL: Warning: Dictionary cache DC_ASM_EXTENTS is empty
Probing for attributes in File9, the attribute directory, for disk group DATA
attribute name "_extent_sizes", value "1 4 16"
attribute name "_extent_counts", value "20000 20000 2147483647"
Oracle data file size 775954432 bytes, block size 8192
Found db_id = 1495013434
Found db_name = XIFENFEI   <-----db name
DUL: Error: Filedir block not allocated, file does not exist
DUL: Error: Could not load asm meta data for group XIFENFEI file 9
Probing for filenames in File6, the alias directory, for disk group XIFENFEI
+XIFENFEI/XIFENFEI/DATAFILE/XIFENFEI.256.878397315
Probing for database datafiles in File1, the file directory,  for disk group XIFENFEI
File 256 datafile size 104865792, block size 8192
Disk group XIFENFEI has one file of type datafile

使用dd备份1文件头

[oracle@xifenfei tmp]$ dd if=/dev/asm-disk1 of=/tmp/1_0_256.asm count=1 bs=1048576 skip=29
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0168209 seconds, 62.3 MB/s

尝试把dbname从XIFENFEI修改为ORCL

SQL> select dump('XIFENFEI',16) from dual;
DUMP('XIFENFEI',16)
-------------------------------------
Typ=96 Len=8: 58,49,46,45,4e,46,45,49
SQL> SELECT DUMP('ORCL',16) FROM DUAL;
DUMP('ORCL',16)
-------------------------
Typ=96 Len=4: 4f,52,43,4c
SQL>

bbed修改XIFENFEI为ORCL

[oracle@xifenfei tmp]$ bbed filename='/tmp/1_0_256.asm' mode=edit
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Fri May 1 04:24:06 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set block 1
        BLOCK#          1
BBED> map
 File: /tmp/1_0_256.asm (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header
 struct kcvfh, 860 bytes                    @0
 ub4 tailchk                                @8188
BBED> p kcvfhhdr
struct kcvfhhdr, 76 bytes                   @20
   ub4 kccfhswv                             @20       0x00000000
   ub4 kccfhcvn                             @24       0x0b200400
   ub4 kccfhdbi                             @28       0x591c183a
   text kccfhdbn[0]                         @32      X
   text kccfhdbn[1]                         @33      I
   text kccfhdbn[2]                         @34      F
   text kccfhdbn[3]                         @35      E
   text kccfhdbn[4]                         @36      N
   text kccfhdbn[5]                         @37      F
   text kccfhdbn[6]                         @38      E
   text kccfhdbn[7]                         @39      I
BBED> d seek 32
 File: /tmp/1_0_256.asm (0)
 Block: 1                seeks:   32 to   63           Dba:0x00000000
------------------------------------------------------------------------
 58494645 4e464549 12040000 00720100 00200000 01000300 00000000 00000000
 <32 bytes per line>

dd把修改的block还原到asm中

[oracle@xifenfei dul]$ dd of=/dev/asm-disk1 if=/tmp/1_0_256.asm count=1 conv=notrunc bs=1048576 seek=29
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00253244 seconds, 414 MB/s

dul验证dbname 修改为ORCL成功

[oracle@xifenfei dul]$ ./dul
Data UnLoader: 10.2.0.6.5 - Internal Only - on Fri May  1 04:41:33 2015
with 64-bit io functions
Copyright (c) 1994 2015 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
Disk group DATA, dul group_cid 0
Discovered disk /dev/asm-disk1 as diskgroup DATA, disk number 0 size 3922 Mb File1 starts at 2, dul_disk_cid 0
Discovered disk /dev/asm-disk2 as diskgroup DATA, disk number 1 size 3922 Mb without File1 meta data, dul_disk_cid 1
Disk group XIFENFEI, dul group_cid 1
Discovered disk /dev/asm-disk3 as diskgroup XIFENFEI, disk number 0 size 4439 Mb File1 starts at 2, dul_disk_cid 2
DUL: Warning: Dictionary cache DC_ASM_EXTENTS is empty
Probing for attributes in File9, the attribute directory, for disk group DATA
attribute name "_extent_sizes", value "1 4 16"
attribute name "_extent_counts", value "20000 20000 2147483647"
Oracle data file size 775954432 bytes, block size 8192
Found db_id = 1495013434
Found db_name = ORCL   <----修改后的dbname
DUL: Error: Filedir block not allocated, file does not exist
DUL: Error: Could not load asm meta data for group XIFENFEI file 9
Probing for filenames in File6, the alias directory, for disk group XIFENFEI
+XIFENFEI/XIFENFEI/DATAFILE/XIFENFEI.256.878397315
Probing for database datafiles in File1, the file directory,  for disk group XIFENFEI
File 256 datafile size 104865792, block size 8192
Disk group XIFENFEI has one file of type datafile

对于asm无法mount情况下备份asm disk header
asm磁盘的备份主要是备份磁盘头100M空间,使用dd命令直接备份

set lines 150
set pages 1000
select 'dd if='||path||' of=&asmbackup_dir/'||group_number||'_'||disk_number||'.asm bs=1048576
count=100' from v$asm_disk;
set lines 150
set pages 1000
select 'dd of='||path||' if=&asmbackup_dir/'||group_number||'_'||disk_number||'.asm bs=1048576
count=100 conv=notrunc' from v$asm_disk;

asmlib需要注意把ORCL:替换为/dev/oracleasm/disks/对应目录.

Oracle异常恢复前备份保护现场建议—FileSystem环境

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle异常恢复前备份保护现场建议—FileSystem环境

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

无论是在各种会议上,还是在朋友/网友私下请教Oracle数据库恢复的问题之时,我都强调,如果你没有十足的把握,请你对您的现场进行备份,确保别对现场进行二次损坏。你不能恢复数据库,但绝对不能再次破坏数据库,给二次恢复增加难度.这里对恢复前备份提供一些指导思想和简单脚本,希望对大家有帮助.

哪些文件需要备份
熟悉数据库恢复的朋友可能都情况,Oracle在异常恢复的过程中主要修改的是system表空间里面数据,其他数据文件,redo数据,控制文件(当然由于redo,undo导致其他数据文件内部的block也可能发生改变)。在备份时间,备份空间允许的情况下,是对这些文件全部备份为好

完整备份文件

set lines 150
set pages 10000
select name from v$datafile
union all
select name from v$controlfile
union all
select member from v$logfile;

有些情况下:比如如果全部备份时间过长,备份空间不足等情况下,我们该如何备份,尽量减少因为异常恢复导致对原环境的损坏.备份最核心的system表空间,数据文件头,redo file,control file等数据,由于这个不是简单的拷贝操作,因此在生成备份语句同时,也生成还原语句,切不可生成了备份语句后,无恢复语句,导致后面还原故障现场难度增大.

无法全备情况下linux/unix数据库恢复前备份

set lines 150
set pages 10000
select 'dd if='||name||' of=&&back_dir/'||ts#||'_'||file#||'.dbf bs=1048576 count=10'
from v$datafile where ts#<>0
union all
select 'dd if='||name||' of=&&back_dir/'||ts#||'_'||file#||'.dbf' from v$datafile where ts#=0
union all
select 'dd if='||name||' of=&&back_dir/control0'||rownum||'.ctl' from v$controlfile
union all
select 'dd if='||member||' of=&&back_dir/'||thread#||'_'||a.group#||'_'||sequence#||'_'||substr(member,
instr(member,'/',-1)+1)  FROM v$log a, v$logfile b WHERE a.group# = B.GROUP#;

无法全备情况下linux/unix使用备份还原

set lines 150
set pages 1000
select 'dd of='||name||' if=&&back_dir/'||ts#||'_'||file#||'.dbf bs=1048576 count=10 conv=notrunc'
from v$datafile where ts#<>0
union all
select 'dd if='||name||' if=&&back_dir/'||ts#||'_'||file#||'.dbf' from v$datafile where ts#=0
union all
select 'dd of='||name||' if=&&back_dir/control0'||rownum||'.ctl' from v$controlfile
union all
select 'dd of='||member||' if=&&back_dir/'||thread#||'_'||a.group#||'_'||sequence#||'_'||substr(member,
instr(member,'/',-1)+1)    FROM v$log a, v$logfile b WHERE a.group# = B.GROUP#;

由于win路径斜杠不一样(/和\的区别),因此在无法全备情况下win备份语句

set lines 150
set pages 10000
select 'dd if='||name||' of=&&back_dir\'||ts#||'_'||file#||'.dbf bs=1048576 count=10'
from v$datafile where ts#<>0
union all
select 'dd if='||name||' of=&&back_dir\'||ts#||'_'||file#||'.dbf' from v$datafile where ts#=0
union all
select 'dd if='||name||' of=&&back_dir\control0'||rownum||'.ctl' from v$controlfile
union all
select 'dd if='||member||' of=&&back_dir\'||thread#||'_'||a.group#||'_'||sequence#||'_'||substr(member,
instr(member,'\',-1)+1)   FROM v$log a, v$logfile b WHERE a.group# = B.GROUP#;

在无法全备情况下win还原语句

set lines 150
set pages 1000
select 'dd of='||name||' if=&&back_dir\'||ts#||'_'||file#||'.dbf bs=1048576 count=10 conv=notrunc'
from v$datafile where ts#<>0
union all
select 'dd if='||name||' if=&&back_dir\'||ts#||'_'||file#||'.dbf' from v$datafile where ts#=0
union all
select 'dd of='||name||' if=&&back_dir\control0'||rownum||'.ctl' from v$controlfile
union all
select 'dd of='||member||' if=&&back_dir\'||thread#||'_'||a.group#||'_'||sequence#||'_'||substr(member,
instr(member,'\',-1)+1)    FROM v$log a, v$logfile b WHERE a.group# = B.GROUP#;

这里提供win环境下dd命令程序win环境dd命令工具

备注:对于asm情况异常情况恢复,备份情况请不要参考该文章,具体请见后续文章,具体见Oracle异常恢复前备份保护现场建议—ASM环境

Oracle 异常恢复案例汇总

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 异常恢复案例汇总

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在2014年11月11日来临之际,我整理Blog中和异常恢复案例相关的部分文章,供大家参考:
dul处理分区表
误删除dual表恢复
dul恢复drop表测试
跳过obj$坏块方法
bbed解决ORA-01190
exp dmp文件损坏恢复
当前联机日志损坏恢复
ORA-01578坏块解决(1)
ORA-01578坏块解决(2)
undo异常处理步骤(9i)
DUL挖ORACLE 8.0数据库
undo异常处理步骤(10g)
ORA-600 2663 故障恢复
dul恢复truncate表测试
bbed处理ORA-01200故障
dul 10支持oracle 11g r2
使用 dul 挖数据文件初试
sysaux数据文件异常恢复
ORA-01244/ORA-01110解决
恢复被rm意外删除数据文件
ORA-00600[4194]故障解决
ORA-01207/ORA-00338恢复
DUL10直接支持ORACLE 8.0
bbed修改undo$(回滚段)状态
记录8.0.5数据库恢复过程
ORA-600[4194]/[4193]解决
使用rman找回被误删除表空间
记录一次系统回滚段坏块恢复
使用bbed解决ORA-00600[2662]
ORA-00600[kcfrbd_3]故障解决
数据库启动ORA-08103故障恢复
asm disk header 彻底损坏恢复
数据库恢复遭遇ORA-00600[3705]
Oracle 11g丢失access$恢复方法
undo segment header坏块异常恢复
ORA-600 kghstack_free2异常恢复
ORA-00600[kccpb_sanity_check_2]
重建控制文件引发ORA-00218故障
dul支持ORACLE 12C CDB数据库恢复
ORACLE 8.0.5 ORA-01207故障恢复
dul 10 export_mode=true功能增强
完美解决dul处理clob字段乱码问题
手工修复ASM DISK HEADER 异常
undo坏块导致数据库异常终止案
bbed 恢复 GLOBAL_NAME 为空故障
通过bbed解决ORA-00600[4000]案例
重建控制文件丢失数据文件导致悲剧
异常断电导致current redo损坏处理
创建控制文件遭遇ORA-600 kccscf_1
使用bbed修复损坏datafile header
通过修改控制文件scn推进数据库scn
bbed打开丢失部分system数据文件库
某集团ebs数据库redo undo丢失导致悲剧
obj$坏块exp/expdp导出不能正常执行
处理fast_recovery_area无剩余空间案例
obj$坏块情况下exp导出单个表解决方案
ORA-00600 [ktbdchk1: bad dscn] 解决
记录因磁盘头被重写,抢救redo恢复经历
rac redo log file被意外覆盖数据库恢复
使用bbed让rac中的sysaux数据文件online
通过bbed修改回滚段状态解决ORA-00704故障
处理smon清理临时段导致数据库异常案例
使用DUL挖数据文件恢复非数据外对象方法
一次侥幸的OSD-04016 O/S-Error异常恢复
记录一次ORA-600 4000数据库故障恢复
一起ORA-600 3020故障恢复的大体思路
redo异常 ORA-600 kclchkblk_4 故障恢复
Oracle 12C的第一次异常恢复—文件头坏块
数据库启动报ORA-00704 ORA-39714错误解决
ORACLE 8.1.7 数据库ORA-600 4000故障恢复
数据库报ORA-00607/ORA-00600[4194]错误
创建控制文件遭遇ORA-00600[3753]故障解决
ORA-00600[kcbshlc_1]导致数据库 down 案例
ORACLE 8.1.7 数据库ORA-600 4194故障恢复
ORA-00600[kcrf_resilver_log_1]异常恢复
控制文件异常导致ORA-00600[kccsbck_first]
分享一次ORA-01113 ORA-01110故障处理过程
记录一次ORA-600 3004 恢复过程和处理思路
Oracle安全警示录:加错裸设备导致redo异常
ORA-600 kcratr_nab_less_than_odr故障解决
双机mount数据库出现ORA-00600[kccsbck_first]
又一起存储故障导致ORA-00333 ORA-00312恢复
通过bbed模拟ORA-00607/ORA-00600 4194 故障
记录一次ORA-00316 ORA-00312 redo异常恢复
asmlib异常报ORA-00600[kfklLibFetchNext00]
某个pdb可以在root pdb open状态下进行恢复
使用bbed解决ORA-00607/ORA-00600[4194]故障
乱用_allow_resetlogs_corruption参数导致悲剧
遭遇ORA-07445[kkdliac()+346]使用odu抢救数据
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
创建控制文件出现ORA-01565 ORA-27041 OSD-04002
记录一次system表空间坏块(ORA-01578)数据库恢复
模拟基表事务未提交数据库crash,undo丢失恢复异常恢复
重建控制文件丢失undo异常恢复—ORA-01173模拟与恢复
修改props$.NLS_CHARACTERSET导致ORA-00900异常恢复
ORA-600[2037]与ORA-07445[kcbs_dump_adv_state]错误
误drop tablespace后使用flashback database闪回异常处理
数据库恢复历史再次刷新到Oracle 7.3.2版本—redo异常恢复
ORA-27086: skgfglk: unable to lock file – already in use
ORACLE 12C ORA-07445[ktuHistRecUsegCrtMain()+1173]恢复
ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复
表空间online出现ORA-00600[kcbz_check_objd_typ]处理过程
_allow_resetlogs_corruption和adjust_scn解决ORA-01190
spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]
重建控制文件丢失undo异常恢复—ORA-600 25025模拟与恢复
使用_allow_resetlogs_corruption打开无归档日志rman备份库
使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
记录一次ORA-600 kccpb_sanity_check_2和ORA-600 kcbgtcr_13 错误恢复
ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]故障处理
因RAC的undo_management参数不一致导致数据库mount报ORA-01105 ORA-01606
使用bbed解决ORA-01178 file N created before last CREATE CONTROLFILE, cannot recreate
通过多次resetlogs规避类似ORA-01248: file N was created in the future of incomplete recovery错误
bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决
记录一次ORA-00600[kdxlin:psno out of range]/ORA-00600[3020]/ORA-00600[4000]/ORA-00600[4193]的数据库恢复

当你的数据库因为异常断电,强制关机,硬盘故障,drop表,truncate表,delete表,dmp文件异常,asm无法正常mount等故障无法解决导致数据丢失,且无法自行解决,请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com

通过多次resetlogs规避类似ORA-01248: file N was created in the future of incomplete recovery错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:通过多次resetlogs规避类似ORA-01248: file N was created in the future of incomplete recovery错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库现状
控制文件
recover_xifenfei0
控制文件中数据文件信息
recover_xifenfei1
数据文件头信息
recover_xifenfei2
redo信息
recover_xifenfei3
根据当前数据库恢复检查脚本(Oracle Database Recovery Check)收集的信息,数据库的是非归档状态,而且redo已经覆盖,数据库datafile 5 无法直接online.遇到这样情况,可以使用bbed修改文件头scn实现online(使用bbed让rac中的sysaux数据文件online),也可以通过使用_allow_resetlogs_corruption等隐含参数实现online.本恢复案例中有180个数据文件,160个offline,然后open数据库,所以大量数据文件无法正常online,bbed工作量太大.在恢复过程中不幸遇到ORA-01248

数据库resetlogs出现ORA-01248错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 5 was created in the future of incomplete recovery
ORA-01110: data file 5: 'F:\TTDATA\PUBRTS.DAT'

alert日志记录

Fri Oct 10 15:09:26 2014
alter database open resetlogs
Fri Oct 10 15:09:26 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database open resetlogs...
Fri Oct 10 15:15:22 2014
alter database open
Fri Oct 10 15:15:22 2014
ORA-1589 signalled during: alter database open...
Fri Oct 10 15:15:30 2014
alter database  open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database  open resetlogs...

尝试offline文件然后resetlogs

SQL>ALTER DATABASE DATAFILE 5  OFFLINE;
Database altered.
sql>ALTER DATABASE OPEN RESETLOGS;
ERROR at line 1:
ORA-01245: ffline file 5 will be lost if resetlogs is done
ORA-01110: data file 5: 'F:\TTDATA\PUBRTS.DAT'

alert日志

Fri Oct 10 15:19:37 2014
ALTER DATABASE DATAFILE 5 offline
Fri Oct 10 15:19:37 2014
Completed: ALTER DATABASE DATAFILE 5 offline
Fri Oct 10 15:19:40 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...

出现该错误原因是由于数据库是非归档模式,offline数据文件需要使用offline drop

Fri Oct 10 15:22:16 2014
alter database datafile 5 offline drop
Fri Oct 10 15:22:17 2014
Completed: alter database datafile 5 offline drop
Fri Oct 10 15:23:13 2014
alter database open resetlogs
Fri Oct 10 15:23:14 2014
Fri Oct 10 15:23:49 2014
RESETLOGS after complete recovery through change 1422423346
Resetting resetlogs activation ID 3503292347 (0xd0cfffbb)
Fri Oct 10 15:24:01 2014
Setting recovery target incarnation to 3
Fri Oct 10 15:24:04 2014
Assigning activation ID 3649065262 (0xd980512e)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=3772
Fri Oct 10 15:24:04 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=24, OS id=3668
Fri Oct 10 15:24:05 2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\CLTTDB\REDO01.LOG
Successful open of redo thread 1
Fri Oct 10 15:24:05 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Oct 10 15:24:05 2014
ARC0: STARTING ARCH PROCESSES
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the 'no FAL' ARCH
ARC2 started with pid=25, OS id=636
Fri Oct 10 15:24:06 2014
ARC0: Becoming the 'no SRL' ARCH
Fri Oct 10 15:24:06 2014
ARC1: Becoming the heartbeat ARCH
Fri Oct 10 15:24:06 2014
SMON: enabling cache recovery
Fri Oct 10 15:24:07 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
File #5 is offline, but is part of an online tablespace.
data file 5: 'F:\TTDATA\PUBRTS.DAT'
Dictionary check complete
Fri Oct 10 15:24:19 2014
SMON: enabling tx recovery
Fri Oct 10 15:24:19 2014
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=26, OS id=868
Fri Oct 10 15:24:21 2014
LOGSTDBY: Validating controlfile with logical metadata
Fri Oct 10 15:24:21 2014
LOGSTDBY: Validation complete
Completed: alter database open resetlogs

open成功后,再次resetlogs库,实现数据文件online

Fri Oct 10 15:28:44 2014
ALTER DATABASE DATAFILE 5 online
Fri Oct 10 15:28:44 2014
Completed: ALTER DATABASE DATAFILE 5 online
Fri Oct 10 15:31:46 2014
alter database open resetlogs
Fri Oct 10 15:31:46 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Setting recovery target incarnation to 4
Fri Oct 10 15:32:00 2014
Assigning activation ID 3649091231 (0xd980b69f)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=23, OS id=700
Fri Oct 10 15:32:00 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=24, OS id=3360
Fri Oct 10 15:32:01 2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\CLTTDB\REDO01.LOG
Successful open of redo thread 1
Fri Oct 10 15:32:01 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Oct 10 15:32:01 2014
ARC0: STARTING ARCH PROCESSES
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the 'no FAL' ARCH
ARC2 started with pid=25, OS id=2016
Fri Oct 10 15:32:02 2014
ARC0: Becoming the 'no SRL' ARCH
Fri Oct 10 15:32:02 2014
ARC1: Becoming the heartbeat ARCH
Fri Oct 10 15:32:02 2014
SMON: enabling cache recovery
Fri Oct 10 15:32:03 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Fri Oct 10 15:32:15 2014
Dictionary check complete
Fri Oct 10 15:32:15 2014
SMON: enabling tx recovery
Fri Oct 10 15:32:15 2014
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=26, OS id=256
Fri Oct 10 15:32:17 2014
LOGSTDBY: Validating controlfile with logical metadata
Fri Oct 10 15:32:17 2014
LOGSTDBY: Validation complete
Completed: alter database open resetlogs

ORA-600 2663 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600 2663 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

朋友数据库启动遭遇ORA-00600[2663]

Mon Sep 22 19:24:20 2014
Thread 1 advanced to log sequence 17 (thread open)
Thread 1 opened at log sequence 17
  Current log# 17 seq# 17 mem# 0: /u02/orayali2/redo17.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Sep 22 19:24:20 2014
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orayali2/orayali2/trace/orayali2_ora_20722.trc  (incident=336180):
ORA-00600: internal error code, arguments: [2663], [13], [3140023138], [13], [3141216403], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orayali2/orayali2/incident/incdir_336180/orayali2_ora_20722_i336180.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orayali2/orayali2/trace/orayali2_ora_20722.trc:
ORA-00600: internal error code, arguments: [2663], [13], [3140023138], [13], [3141216403], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orayali2/orayali2/trace/orayali2_ora_20722.trc:
ORA-00600: internal error code, arguments: [2663], [13], [3140023138], [13], [3141216403], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 20722): terminating the instance due to error 600
Instance terminated by USER, pid = 20722
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (20722) as a result of ORA-1092
Mon Sep 22 19:24:24 2014
ORA-1092 : opitsk aborting process

ORA-600[2663]与常见的ORA-600[2662]类似,都是由于block的scn大于文件头的scn导致,只不过错误的对象不一样而已.对于该类问题,我们的处理方法一般就是简单的推scn,但是这个库比较特殊11.2.0.3.5版本,一般方法无法推scn,因为收集操作日志有限,贴出核心操作步骤

[oracle@orayali2 OPatch]$ uname -a
Linux orayali2 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[oracle@orayali2 OPatch]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 22 19:09:18 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.6535E+10 bytes
Fixed Size                  2244792 bytes
Variable Size            9898561352 bytes
Database Buffers         6610223104 bytes
Redo Buffers               24256512 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL>  oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [060019598, 0600195C8) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 60019278 00000000
SQL> oradebug poke 0x060019598 8 0x0000000000000040
BEFORE: [060019598, 0600195A0) = 00000000 00000000
AFTER:  [060019598, 0600195A0) = 00000040 00000000
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [060019598, 0600195C8) = 00000040 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 60019278 00000000
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30012: undo tablespace 'SYSTEM' does not exist or of wrong type
Process ID: 21174
Session ID: 1563 Serial number: 3

现在错误已经改变,而是出现了ORA-30012的错误

alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 31 processes
Started redo scan
Completed redo scan
 read 4 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 17, block 2, scn 58974597984
Recovery of Online Redo Log: Thread 1 Group 17 Seq 17 Reading mem 0
  Mem# 0: /u02/orayali2/redo17.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 17, block 3, scn 58974617986
 0 data blocks read, 0 data blocks written, 4 redo k-bytes read
Mon Sep 22 19:30:05 2014
Thread 1 advanced to log sequence 18 (thread open)
Thread 1 opened at log sequence 18
  Current log# 18 seq# 18 mem# 0: /u02/orayali2/redo18.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Sep 22 19:30:05 2014
SMON: enabling cache recovery
Undo initialization errored: err:30012 serial:0 start:1143146928 end:1143147338 diff:410 (4 seconds)
Errors in file /u01/app/oracle/diag/rdbms/orayali2/orayali2/trace/orayali2_ora_21174.trc:
ORA-30012: undo tablespace 'SYSTEM' does not exist or of wrong type
Errors in file /u01/app/oracle/diag/rdbms/orayali2/orayali2/trace/orayali2_ora_21174.trc:
ORA-30012: undo tablespace 'SYSTEM' does not exist or of wrong type
Error 30012 happened during db open, shutting down database
USER (ospid: 21174): terminating the instance due to error 30012
Instance terminated by USER, pid = 21174
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (21174) as a result of ORA-1092
Mon Sep 22 19:30:08 2014
ORA-1092 : opitsk aborting process

猜测原因是undo设置有问题导致,检查果然发现undo_management=auto,而undo_tablespace=SYSTEM

SQL> startup mount
ORACLE instance started.
Total System Global Area 1.6535E+10 bytes
Fixed Size                  2244792 bytes
Variable Size            9898561352 bytes
Database Buffers         6610223104 bytes
Redo Buffers               24256512 bytes
Database mounted.
SQL> show parameter undo;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     10800
undo_tablespace                      string      SYSTEM
SQL> alter system set undo_management=manual scope=spfile;
System altered.
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1.6535E+10 bytes
Fixed Size                  2244792 bytes
Variable Size            9898561352 bytes
Database Buffers         6610223104 bytes
Redo Buffers               24256512 bytes
Database mounted.
Database opened.

解决该问题修改undo_management=manual即可

某集团ebs数据库redo undo丢失导致悲剧

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:某集团ebs数据库redo undo丢失导致悲剧

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上,而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到,因此出现该错误,解决方法是通过修改数据scn,让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法:
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

undo异常总结和恢复思路

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:undo异常总结和恢复思路

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

UNDO异常报错千奇百怪,针对本人遇到的比较常见的undo异常报错进行汇总,仅供参考,数据库恢复过程是千奇百怪的,不能照搬硬套.
ORA-00704/ORA-00376
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: ‘/u01/oracle/oradata/ora11g/undotbs01.dbf’
Error 704 happened during db open, shutting down database
USER (ospid: 17864): terminating the instance due to error 704
Instance terminated by USER, pid = 17864
ORA-1092 signalled during: alter database open…
opiodr aborting process unknown ospid (17864) as a result of ORA-1092

ORA-00600[4097]
Fri Aug 31 23:14:10 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Aug 31 23:14:12 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.

ORA-01595/ORA-00600[4194]
Fri Aug 31 23:14:14 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-01595: error freeing extent (2) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []

0RA-00600[4193]
Tue Feb 14 09:35:34 2012
Errors in file d:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_2824.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4193], [2005], [2008], [], [], [], [], []

ORA-00600[kcfrbd_3]
Wed Dec 05 10:26:35 2012
SMON: enabling tx recovery
Wed Dec 05 10:26:35 2012
Database Characterset is ZHS16GBK
Wed Dec 05 10:26:35 2012
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_548.trc:
ORA-00600: internal error code, arguments: [kcfrbd_3], [2], [2279045], [1], [2277120], [2277120], [], []
SMON: terminating instance due to error 474

ORA-00600[4137]
Fri Jul 6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul 6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) – Error 600 encountered while recovering transaction (3, 17).

ORA-01595/ORA-01594
Sat May 12 21:54:17 2012
Errors in file /oracle/app/admin/prmdb/bdump/prmdb2_smon_483522.trc:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

ORA-00704/ORA-01555
Fri May 4 21:04:21 2012
select ctime, mtime, stime from obj$ where obj# = :1
Fri May 4 21:04:21 2012
Errors in file /oracle/admin/standdb/udump/perfdb_ora_1286288.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 40 with name “_SYSSMU40$” too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1286288
ORA-1092 signalled during: alter database open resetlogs…

ORA-00607/ORA-00600[4194]
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN…

ORA-00704/ORA-00600[4000]
Thu Feb 28 19:29:13 2013
Errors in file /u1/PROD/prodora/db/tech_st/10.2.0/admin/PROD_oracle/udump/prod_ora_20989.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [50], [], [], [], [], [], []
Thu Feb 28 19:29:13 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 20989
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS…

undo异常恢复处理思路
除了极少数undo坏块,undo文件丢失外,大部分undo异常是因为redo未被正常进行前滚,从而导致undo回滚异常数据库无法open,解决此类问题,需要结合一般需要结合redo异常处理技巧在其中,一般undo异常处理思路
1.切换undo_management= MANUAL尝试启动数据库,如果不成功进入2
2.设置10513 等event尝试启动数据库,如果不成功进入3
3.使用_offline_rollback_segments/_corrupted_rollback_segments屏蔽回滚段
4.如果依然不能open数据库,考虑使用bbed工具提交事务,修改回滚段状态等操作
5.如果依然还不能open数据库,考虑使用dul

如果您按照上述步骤还不能解决,请联系我们,将为您提供专业数据库技术支持
Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com

姊妹篇
ORACLE REDO各种异常恢复
ORACLE丢失各种文件导致数据库不能OPEN恢复