Handling ORA-00310 / ORA-00334 caused by a virtual machine failure

Contact: mobile/WeChat (+86 17813235971), QQ (107644445)

Title: Handling ORA-00310 / ORA-00334 caused by a virtual machine failure

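Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue further legal action.]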

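A customer had an underlying hardware problem that caused an Oracle database running in a virtual machine to suddenly throw a large number of errors: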

Reread (file 5, block 2371528) found same corrupt data (no logical check)
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc  (incident=397049):
ORA-01578: ORACLE data block corrupted (file # 5, block # 2371528)
ORA-01110: data file 5: '/home/oracle/app/oradata/users01.dbf'

Wed Apr 02 23:10:24 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc  (incident=397050):
ORA-00600: internal error code, arguments: [5400], [], [], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:15:29 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11605.trc  (incident=397075):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:20:32 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11530.trc  (incident=397034):
ORA-00600: internal error code, arguments: [25027], [6], [196610], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:20:52 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11528.trc  (incident=397027):
ORA-00600: internal error code, arguments: [ktspfpblk:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []

Wed Apr 02 23:22:53 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_11609.trc  (incident=397082):
ORA-00600: internal error code, arguments: [6002], [6], [189], [1], [0], [], [], [], [], [], [], []

Wed Apr 02 23:26:41 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_m000_11966.trc  (incident=397035):
ORA-00600: internal error code, arguments: [dbgrmblur_update_range_1], [11], [6], [], [], [], [], [], [], [], [], []

Wed Apr 02 23:31:47 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_j000_10927.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_49685"
ORA-08102: index key not found, obj# 39, file 1, block 55190 (2)

Thu Apr 03 00:15:18 2025
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x8] [PC:0xB9EC41, ksuloget()+421] [flags: 0x0, count: 1]
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_m000_12633.trc  (incident=400879):
ORA-07445: exception encountered:core dump [ksuloget()+421][SIGSEGV][ADDR:0x8][PC:0xB9EC41][Address not mapped to object]

Thu Apr 03 00:15:23 2025
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_pmon_4097.trc  (incident=396817):
ORA-00600: internal error code, arguments: [1100], [0x2E3947E78], [0x2E3947E78], [], [], [], [], [], [], [], [], []
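With this many different errors in a short window, it can help to tally them before opening individual trace files. A minimal Python sketch, assuming the alert log excerpt is available as plain text; the helper name `summarize_alert` is hypothetical. It counts every ORA-nnnnn code and, for ORA-00600, the first argument, which is what usually identifies the internal error.

```python
from __future__ import annotations

import re
from collections import Counter

# Any five-digit ORA- error code on a line.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")
# First bracketed argument of an ORA-00600 line, e.g. "5400" or "ktbdchk1: bad dscn".
ORA600_ARG1 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")

# A few lines copied from the alert log above.
SAMPLE = """\
ORA-01578: ORACLE data block corrupted (file # 5, block # 2371528)
ORA-01110: data file 5: '/home/oracle/app/oradata/users01.dbf'
ORA-00600: internal error code, arguments: [5400], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
ORA-08102: index key not found, obj# 39, file 1, block 55190 (2)
"""


def summarize_alert(text: str) -> tuple[Counter, Counter]:
    """Hypothetical helper: return (ORA code counts, ORA-00600 first-argument counts)."""
    codes: Counter = Counter()
    ora600: Counter = Counter()
    for line in text.splitlines():
        codes.update(ORA_CODE.findall(line))
        match = ORA600_ARG1.search(line)
        if match:
            ora600[match.group(1)] += 1
    return codes, ora600


if __name__ == "__main__":
    codes, ora600 = summarize_alert(SAMPLE)
    print(codes.most_common())
    print(ora600.most_common())
```

A spread of unrelated internal errors (block corruption, index key not found, several distinct ORA-00600 arguments) within minutes is itself a hint that the storage layer, not a single object, is at fault.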

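After the database crashed, the hardware environment was repaired and the virtual machine restarted, but the database then failed to start, reporting ORA-01172 and ORA-01151: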

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 29239 KB redo, 4020 data blocks need recovery
Started redo application at
 Thread 1: logseq 211603, block 9107
Recovery of Online Redo Log: Thread 1 Group 4 Seq 211603 Reading mem 0
  Mem# 0: /home/oracle/app/oradata/orcl/redo04.log
  Mem# 1: /home/oracle/app/oradata/orcl/redo041.log
Hex dump of (file 2, block 4835) in trace file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc
Reading datafile '/home/oracle/app/oradata/orcl/sysaux01.dbf' for corruption at rdba: 0x008012e3 (file 2, block 4835)
Reread (file 2, block 4835) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 4835 OF FILE 2
Aborting crash recovery due to error 1172
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc:
ORA-01172: recovery of thread 1 stuck at block 4835 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_19174.trc:
ORA-01172: recovery of thread 1 stuck at block 4835 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN...
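The alert log reports the stuck block both as rdba 0x008012e3 and as (file 2, block 4835). A minimal Python sketch of how the two forms relate, assuming the standard smallfile encoding (upper 10 bits = relative file number, lower 22 bits = block number); bigfile tablespaces use all 32 bits for the block number, so this does not apply to them.

```python
from __future__ import annotations

# Assumed smallfile RDBA layout: 10-bit relative file number, 22-bit block number.
BLOCK_BITS = 22
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def decode_rdba(rdba: int) -> tuple[int, int]:
    """Split a smallfile relative data block address into (file#, block#)."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


def encode_rdba(rfile: int, block: int) -> int:
    """Inverse of decode_rdba: pack (file#, block#) into an RDBA."""
    return (rfile << BLOCK_BITS) | block


if __name__ == "__main__":
    rfile, block = decode_rdba(0x008012E3)
    print(f"rdba 0x008012e3 -> file {rfile}, block {block}")
```

Here the relative and absolute file numbers coincide, as the "(file 2, block 4835)" text in the same log line confirms; file 2 is sysaux01.dbf.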

A further restart attempt then reported ORA-01113 and ORA-01110:

Fri Apr 04 09:34:36 2025
ALTER DATABASE OPEN
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_4076.trc:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/home/oracle/app/oradata/users01.dbf'
ORA-1113 signalled during: ALTER DATABASE OPEN...

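The customer had already tried various recovery methods on their own, such as RECOVER ... USING BACKUP CONTROLFILE, UNTIL CANCEL, and recreating the control file, but none of them got the database open. They basically all got stuck on ORA-00310/ORA-00334 errors like the following: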

Sat Apr 05 10:17:34 2025
ALTER DATABASE RECOVER  database using backup controlfile   
Media Recovery Start
 started logmerger process
Sat Apr 05 10:17:34 2025
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 8 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 9 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 10 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 11 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 12 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 13 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 14 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 15 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
Parallel Media Recovery started with 28 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile   ...
Sat Apr 05 10:17:59 2025
ALTER DATABASE RECOVER    LOGFILE '/home/oradata/redo02.log'  
Media Recovery Log /home/oradata/redo02.log
Sat Apr 05 10:17:59 2025
Errors with log /home/oradata/redo02.log
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_pr00_12141.trc:
ORA-00310: archived log contains sequence 211550; sequence 211603 required
ORA-00334: archived log: '/home/oradata/redo02.log'
ORA-310 signalled during: ALTER DATABASE RECOVER    LOGFILE '/home/oradata/redo02.log'  ...
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
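ORA-00310 states both the sequence actually found in the supplied file (211550) and the sequence recovery needs (211603, which crash recovery had earlier been reading from redo04.log in group 4). A small Python sketch that extracts and compares the two numbers; the helper name `check_ora_00310` is hypothetical.

```python
import re
from typing import Optional, Tuple

# "ORA-00310: archived log contains sequence N; sequence M required"
ORA_00310 = re.compile(
    r"ORA-00310: archived log contains sequence (\d+); sequence (\d+) required"
)


def check_ora_00310(line: str) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: return (seq in file, seq required, sequences behind) or None."""
    match = ORA_00310.search(line)
    if match is None:
        return None
    contained, required = int(match.group(1)), int(match.group(2))
    return contained, required, required - contained


if __name__ == "__main__":
    msg = "ORA-00310: archived log contains sequence 211550; sequence 211603 required"
    contained, required, behind = check_ora_00310(msg)
    print(f"file holds seq {contained}, recovery needs {required} ({behind} sequences behind)")
```

A file that is dozens of sequences behind, as here, is an old log rather than the one recovery is asking for, so supplying it again cannot move recovery forward.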

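Given the above, the underlying failure left the redo that recovery required (sequence 211603) out of step with the contents of the redo files actually present (the supplied file held sequence 211550). The only remaining option was to bypass the consistency checks and force the database open: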

SQL> alter database open resetlogs ;
alter database open resetlogs 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [1685409503], [0], [1685415469], [12583040], []
ORA-00600: internal error code, arguments: [2662], [0], [1685409502], [0], [1685415469], [12583040], []
ORA-01092: ORACLE instance terminated. Disconnection force
ORA-00600: internal error code, arguments: [2662], [0], [1685409498], [0], [1685415469], [12583040], []
Process ID: 10637
Session ID: 645 Serial number: 7
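The ORA-600 [2662] arguments indicate how far the SCN is behind. Assuming the commonly documented layout (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the DBA the dependent SCN came from), a small Python sketch that decodes the first ORA-600 [2662] line above; the function names are hypothetical.

```python
from __future__ import annotations


def scn(wrap: int, base: int) -> int:
    """Combine an SCN wrap/base pair into one integer (wrap is the high part)."""
    return (wrap << 32) | base


def decode_2662(args: list[int]) -> dict[str, int]:
    """Decode the numeric arguments that follow [2662] (assumed layout, see lead-in)."""
    cur_wrap, cur_base, dep_wrap, dep_base, dba = args[:5]
    current = scn(cur_wrap, cur_base)
    dependent = scn(dep_wrap, dep_base)
    return {
        "current_scn": current,
        "dependent_scn": dependent,
        "gap": dependent - current,          # how far the dependent SCN is ahead
        "dba_file": dba >> 22,               # smallfile RDBA: upper 10 bits
        "dba_block": dba & ((1 << 22) - 1),  # smallfile RDBA: lower 22 bits
    }


if __name__ == "__main__":
    print(decode_2662([0, 1685409503, 0, 1685415469, 12583040]))
```

Here the dependent SCN is 5966 ahead of the current SCN and the DBA points to file 3, block 128. After the SCN was adjusted, the block recovery lines in the next alert log excerpt report scn 0.2147483683 in the same wrap.base form, well above 1685415469.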

ORA-600 [2662] is a fairly common error. I worked around it by advancing the database SCN and then tried opening the database again:

Sat Apr 05 10:31:45 2025
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1685409495
Resetting resetlogs activation ID 1725417463 (0x66d7c7f7)
Sat Apr 05 10:31:46 2025
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 1685409498, threshold SCN value is 0
Sat Apr 05 10:31:46 2025
Assigning activation ID 1725412798 (0x66d7b5be)
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: /home/oradata/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Apr 05 10:31:46 2025
SMON: enabling cache recovery
Undo initialization finished serial:0 start:61632504 end:61632514 diff:10 (0 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
SMON: enabling tx recovery
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8145):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8145/oorcl_smon_22927_i8145.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Stopping background process MMNL
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (9, 1).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc:
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Sat Apr 05 10:31:46 2025
Sweep [inc][8145]: completed
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8146):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8146/oorcl_smon_22927_i8146.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:46 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p054_2643.trc  (incident=8625):
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8625/oorcl_p054_2643_i8625.trc
Sat Apr 05 10:31:46 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p034_2603.trc  (incident=8465):
ORA-00600: internal error code, arguments: [kturbleurec1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8465/oorcl_p034_2603_i8465.trc
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Sat Apr 05 10:31:48 2025
Starting background process CJQ0
Sat Apr 05 10:31:48 2025
CJQ0 started with pid=80, OS id=2852 
SMON: Restarting fast_start parallel rollback
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8147):
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8147/oorcl_smon_22927_i8147.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:50 2025
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_p000_2535.trc  (incident=8169):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8169/oorcl_p000_2535_i8169.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (9, 1).
Block recovery from logseq 1, block 19 to scn 2147483682
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1 Reading mem 0
  Mem# 0: /home/oradata/redo02.log
Block recovery completed at rba 1.734.16, scn 0.2147483683
Block recovery from logseq 1, block 404 to scn 2147483682
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1 Reading mem 0
  Mem# 0: /home/oradata/redo02.log
Block recovery completed at rba 1.734.16, scn 0.2147483683
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8148):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8148/oorcl_smon_22927_i8148.trc
Sat Apr 05 10:31:50 2025
Sweep [inc][8147]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8149):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8149/oorcl_smon_22927_i8149.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc:
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Sat Apr 05 10:31:50 2025
Sweep [inc][8149]: completed
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8150):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oorcl/oorcl/incident/incdir_8150/oorcl_smon_22927_i8150.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).
Errors in file /u01/app/oracle/diag/rdbms/oorcl/oorcl/trace/oorcl_smon_22927.trc  (incident=8151):
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Apr 05 10:31:51 2025
Sweep [inc][8150]: completed
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).

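Although the database opened, it reported ORA-600 [4137], ORA-600 [kturbleurec1], ORA-600 [4198] and similar errors, which clearly point to damaged undo. After dealing with the problematic undo, the data was exported logically and imported into a new database, completing the recovery.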
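The ORA-600 [4137] arguments and the SMON messages name the transactions SMON could not roll back, and the first component of each transaction ID is the undo segment number. A minimal Python sketch that collects them from the alert log text, assuming the [4137] argument is formatted as usn.slot.sqn; it only reads the log and says nothing about how the undo itself was repaired.

```python
from __future__ import annotations

import re

# Assumed: ORA-600 [4137] carries the transaction id as usn.slot.sqn, e.g. [9.1.436887].
XID_4137 = re.compile(r"arguments: \[4137\], \[(\d+)\.(\d+)\.(\d+)\]")
# SMON names the transaction it failed on as (usn, slot).
SMON_TXN = re.compile(r"recovering transaction \((\d+), (\d+)\)")

# Lines copied from the alert log above.
SAMPLE = """\
ORA-00600: internal error code, arguments: [4137], [9.1.436887], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (9, 1).
ORA-00600: internal error code, arguments: [4137], [10.28.1201778], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance oorcl (pid = 17) - Error 600 encountered while recovering transaction (10, 28).
"""


def failed_transactions(text: str) -> tuple[set, set]:
    """Return (transaction ids from ORA-600 [4137], undo segment numbers involved)."""
    xids = set()
    usns = set()
    for line in text.splitlines():
        m = XID_4137.search(line)
        if m:
            usn, slot, sqn = (int(g) for g in m.groups())
            xids.add((usn, slot, sqn))
            usns.add(usn)
        m = SMON_TXN.search(line)
        if m:
            usns.add(int(m.group(1)))
    return xids, usns


if __name__ == "__main__":
    xids, usns = failed_transactions(SAMPLE)
    print(sorted(xids), sorted(usns))
```

In this log the failing transactions sit in undo segments 9 and 10.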

Handling AWR problems after an abnormal Oracle recovery

Title: Handling AWR problems after an abnormal Oracle recovery

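A customer database that had been recovered by unconventional methods several months earlier (and never rebuilt afterwards) developed a problem: AWR could no longer collect statistics, which made it hard to track database performance, so I was asked to help analyze the problem.
Taking a snapshot manually failed with ORA-00001: unique constraint (SYS.WRM$_SNAPSHOT_PK) violated: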

SQL> execute dbms_workload_repository.create_snapshot();
BEGIN dbms_workload_repository.create_snapshot(); END;
*
ERROR at line 1:
ORA-13509: error encountered during updates to a AWR table
ORA-00001: unique constraint (ORA-00001: unique constraint (SYS.WRM$_SNAPSHOT_PK) violated
.) violated
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 99
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 122
ORA-06512: at line 1

Analyzing the trace file shows the following:

Trace file D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_m000_1628.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security,
OLAP, Data Mining, Oracle Database Vault and Real Application Testing option
Windows NT Version V6.1 Service Pack 1
CPU                 : 32 - type 8664, 32 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:83326M/131035M, Ph+PgF:214386M/262068M
Instance name: rac2
Redo thread mounted by this instance: 2
Oracle process number: 61
Windows thread id: 1628, image: ORACLE.EXE (M000)
*** 2015-08-28 11:39:51.967
*** SESSION ID:(2062.93) 2015-08-28 11:39:51.968
*** CLIENT ID:() 2015-08-28 11:39:51.968
*** SERVICE NAME:(SYS$BACKGROUND) 2015-08-28 11:39:51.968
*** MODULE NAME:(MMON_SLAVE) 2015-08-28 11:39:51.968
*** ACTION NAME:(Auto-Flush Slave Action) 2015-08-28 11:39:51.968
*** KEWROCISTMTEXEC - encountered error: (ORA-00001: unique constraint (SYS.WRM$_SNAPSHOT_PK) violated
)
  *** SQLSTR: total-len=342, dump-len=240,
      STR={insert into   WRM$_SNAPSHOT  (snap_id, dbid, instance_number, startup_time,begin_interval_time,
end_interval_time, snap_level,    status, error_count, bl_moved, snap_flag, snap_timezone)
values   (:snap_id, :dbid, :instance_number, :sta}
*** KEWRAFS: Error=13509 encountered by Auto Flush Slave.

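This clearly shows that AWR cannot collect statistics because the INSERT into WRM$_SNAPSHOT hits a primary key conflict. Since AWR data is purely historical and can all be discarded, I decided to delete the AWR data and see whether that would solve the problem.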

A 10046 trace of the snapshot collection shows the following:

SQL> oradebug setmypid
Statement processed.
SQL> alter session set events '10046 trace name context forever, level 12';
Session altered.
SQL> oradebug tracefile_name
D:\APP\ADMINISTRATOR\diag\rdbms\rac\rac2\trace\rac2_ora_5944.trc
SQL> execute dbms_workload_repository.create_snapshot();
BEGIN dbms_workload_repository.create_snapshot(); END;
*
ERROR at line 1:
ORA-13509: error encountered during updates to a AWR table
ORA-00001: unique constraint (ORA-00001: unique constraint (SYS.WRM$_SNAPSHOT_PK) violated
.) violated
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 99
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 122
ORA-06512: at line 1
-- trace file analysis
PARSING IN CURSOR #1362260992  lid=0 tim=22781405124 hv=438921370 ad='148fd90590' sqlid='15rbgh4d2ku4u'
insert into WRM$_SNAPSHOT(snap_id, dbid, instance_number, startup_time,begin_interval_time, end_interval_time,
snap_level,status, error_count, bl_moved,snap_flag, snap_timezone)values(:snap_id, :dbid, :instance_number,
:startup_time, :begin_interval_time, :end_interval_time, :snap_level,    :status, 0, 0, :bind1, :bind2)
END OF STMT
PARSE #1362260992:c=0,e=474,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=22781405122
BINDS #1362260992:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=208 off=0
  kxsbbbfp=513a6bf8  bln=22  avl=03  flg=05
  value=9277
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=513a6c10  bln=22  avl=06  flg=01
  value=2429481020
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=513a6c28  bln=22  avl=02  flg=01
  value=2
 Bind#3
  oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00
  oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=72
  kxsbbbfp=513a6c40  bln=11  avl=07  flg=01
  value=28-AUG-15 10.06.53 AM
 Bind#4
  oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00
  oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=88
  kxsbbbfp=513a6c50  bln=11  avl=07  flg=01
  value=28-AUG-15 10.06.53 AM
 Bind#5
  oacdty=180 mxl=11(11) mxlc=00 mal=00 scl=09 pre=00
  oacflg=00 fl2=8000000 frm=00 csi=00 siz=0 off=104
  kxsbbbfp=513a6c60  bln=11  avl=11  flg=01
  value=28-AUG-15 04.11.40.017000000 PM
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=120
  kxsbbbfp=513a6c70  bln=22  avl=02  flg=01
  value=1
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=144
  kxsbbbfp=513a6c88  bln=22  avl=02  flg=01
  value=1
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=168
  kxsbbbfp=513a6ca0  bln=22  avl=02  flg=01
  value=1
 Bind#9
  oacdty=183 mxl=11(11) mxlc=00 mal=00 scl=09 pre=09
  oacflg=01 fl2=8000000 frm=00 csi=00 siz=0 off=192
  kxsbbbfp=513a6cb8  bln=11  avl=11  flg=01
  value=Unhandled datatype (183) found in kxsbndinf

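This pinpoints the cause: the key AWR inserts while taking the snapshot (snap_id=9277, dbid=2429481020, instance_number=2 in the binds above) collides with a record that already exists in the table.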
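Reading the conflicting key out of a 10046 level 12 trace can be scripted. A minimal Python sketch, using a condensed copy of the BINDS section above; the helper names are hypothetical. The bind order follows the INSERT column list (:snap_id, :dbid, :instance_number), and the primary key column order (DBID, SNAP_ID, INSTANCE_NUMBER) is taken from the index DDL retrieved further down.

```python
from __future__ import annotations

import re

BIND_HEADER = re.compile(r"^\s*Bind#(\d+)")
BIND_VALUE = re.compile(r"^\s*value=(.*)$")

# Condensed copy of the BINDS section above (first three binds only).
SAMPLE = """\
BINDS #1362260992:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  value=9277
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  value=2429481020
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  value=2
"""


def parse_binds(trace_text: str) -> dict[int, str]:
    """Map bind position -> value string from a 10046 BINDS section."""
    binds: dict[int, str] = {}
    position = None
    for line in trace_text.splitlines():
        header = BIND_HEADER.match(line)
        if header:
            position = int(header.group(1))
            continue
        value = BIND_VALUE.match(line)
        if value and position is not None:
            binds[position] = value.group(1).strip()
            position = None
    return binds


def snapshot_pk(binds: dict[int, str]) -> tuple[int, int, int]:
    """Build the WRM$_SNAPSHOT_PK key (dbid, snap_id, instance_number) from binds 0-2."""
    return int(binds[1]), int(binds[0]), int(binds[2])


if __name__ == "__main__":
    binds = parse_binds(SAMPLE)
    print(snapshot_pk(binds))
```

The resulting key can then be compared against the existing rows in WRM$_SNAPSHOT.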

Purging the AWR data

SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT;
MAX(SNAP_ID) MIN(SNAP_ID)
------------ ------------
        9277         9081
SQL> exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(9081,9277);
PL/SQL procedure successfully completed.
SQL>
SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT;
MAX(SNAP_ID) MIN(SNAP_ID)
------------ ------------
        9277         9277
SQL> exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(9080,9278);
PL/SQL procedure successfully completed.
SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT;
MAX(SNAP_ID) MIN(SNAP_ID)
------------ ------------
        9277         9277
SQL> delete from  WRM$_SNAPSHOT where snap_id=9277;
delete from  WRM$_SNAPSHOT where snap_id=9277
             *
ERROR at line 1:
ORA-00600: internal error code, arguments: [13011], [6653], [8456911], [2], [8456911], [3],
[], [], [], [], [], []
SQL> delete /*+ RULE */ from  WRM$_SNAPSHOT where snap_id=9277;
0 rows deleted.

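This is somewhat strange: the snap_id=9277 record cannot be purged, a normal DELETE raises ORA-00600 [13011], and the same DELETE with the RULE hint deletes 0 rows. In my experience, this most likely means the table and its index disagree about which rows exist.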

Trying to rebuild the index

SQL> analyze table WRM$_SNAPSHOT validate structure cascade;
analyze table WRM$_SNAPSHOT validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
SQL> select index_name from dba_indexes where table_name='WRM$_SNAPSHOT';
INDEX_NAME
------------------------------
WRM$_SNAPSHOT_PK
SQL> alter index WRM$_SNAPSHOT_PK rebuild;
Index altered.
SQL> select /*+ full(t) */ max(snap_id),min(snap_id) from WRM$_SNAPSHOT t;
MAX(SNAP_ID) MIN(SNAP_ID)
------------ ------------
SQL> select max(snap_id),min(snap_id) from WRM$_SNAPSHOT;
MAX(SNAP_ID) MIN(SNAP_ID)
------------ ------------
        9277         9277

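This clearly confirms that the table and the index are out of sync: the full table scan finds no rows, while the index path still returns snap_id 9277. The rebuild did not help because ALTER INDEX ... REBUILD uses the existing index as its data source, so the bad entry was carried over into the rebuilt index.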
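Conceptually, the ORA-01499 from ANALYZE ... VALIDATE STRUCTURE CASCADE and the two MAX/MIN queries express the same comparison: the keys reachable through the table versus the keys reachable through the index. A toy Python sketch of that comparison, using only the snap_id values observed above as a simplification of the real three-column key; it models the symptom and is not how Oracle performs the check internally. The function name is hypothetical.

```python
from __future__ import annotations


def cross_check(table_keys: set, index_keys: set) -> dict[str, set]:
    """Compare keys seen by a full table scan with keys seen via the index."""
    return {
        "index_only": index_keys - table_keys,  # index entries with no matching table row
        "table_only": table_keys - index_keys,  # table rows the index does not cover
    }


if __name__ == "__main__":
    # Observed above: /*+ full(t) */ returns no rows, the index path returns 9277.
    print(cross_check(table_keys=set(), index_keys={9277}))
```

An entry that exists only in the index is exactly what a rebuild from that same index preserves and what recreating the index from the table removes.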

Recreating the index

SQL> set linesize 180
SQL> set pages 999
SQL> set long 90000
SQL> select dbms_metadata.get_ddl('INDEX','WRM$_SNAPSHOT_PK','SYS') from dual;
DBMS_METADATA.GET_DDL('INDEX','WRM$_SNAPSHOT_PK','SYS')
--------------------------------------------------------------------------------
  CREATE UNIQUE INDEX "SYS"."WRM$_SNAPSHOT_PK" ON "SYS"."WRM$_SNAPSHOT" ("DBID", "SNAP_ID", "INSTANCE_NUMBER")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSAUX"
SQL> DROP INDEX "SYS"."WRM$_SNAPSHOT_PK" ;
DROP INDEX "SYS"."WRM$_SNAPSHOT_PK"
                 *
ERROR at line 1:
ORA-02429: cannot drop index used for enforcement of unique/primary key
SQL> alter table "SYS"."WRM$_SNAPSHOT" drop  constraint "SYS"."WRM$_SNAPSHOT_PK";
alter table "SYS"."WRM$_SNAPSHOT" drop  constraint "SYS"."WRM$_SNAPSHOT_PK"
                                                        *
ERROR at line 1:
ORA-01735: invalid ALTER TABLE option
SQL> alter table "WRM$_SNAPSHOT" drop  constraint "WRM$_SNAPSHOT_PK";
Table altered.
SQL> alter table "WRM$_SNAPSHOT" add constraint "WRM$_SNAPSHOT_PK" primary key("DBID", "SNAP_ID", "INSTANCE_NUMBER");
Table altered.

Trying the snapshot again

SQL> execute dbms_workload_repository.create_snapshot();
BEGIN dbms_workload_repository.create_snapshot(); END;
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kewrose_1], [600], [ORA-00600: internal error code, arguments: [6002], [6], [104],
[4], [0], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 99
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 122
ORA-06512: at line 1

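Unfortunately, it failed again: taking the snapshot now hit ORA-00600 [kewrose_1] / ORA-600 [6002]. Since none of the earlier AWR data was needed anyway, I took the most drastic approach: identify the table involved and deal with it directly.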

Continuing with a 10046 trace

PARSING IN CURSOR #1495840456 tim=24328721585 hv=4050667988 ad='146f9948f8'sqlid='84qubbrsr0kfn'
insert into wrh$_latch(snap_id, dbid, instance_number, latch_hash, level#, gets, misses, sleeps,
 immediate_gets,immediate_misses, spin_gets, sleep1, sleep2, sleep3, sleep4, wait_time)
select :snap_id, :dbid, :instance_number, hash, level#, gets,    misses, sleeps,
immediate_gets, immediate_misses, spin_gets,    sleep1, sleep2, sleep3,
sleep4, wait_time  from    v$latch  order by    hash
END OF STMT
PARSE #1495840456:c=0,e=376,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=24328721584
BINDS #1495840456:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=72 off=0
  kxsbbbfp=60471350  bln=22  avl=03  flg=05
  value=9280
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=60471368  bln=22  avl=06  flg=01
  value=2429481020
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=60471380  bln=22  avl=02  flg=01
  value=2
ORA-00600: internal error code, arguments: [6002], [6], [104], [4], [0], [], [], [], [], [], [], []

This shows that the failure occurs in the INSERT into the wrh$_latch table.
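The step used here, taking the last statement the session parsed before the ORA-00600 appeared, can be scripted for long trace files. A simplified Python sketch with a hypothetical helper name: it ignores cursor numbers, so if a session interleaves several open cursors it may point at the wrong statement.

```python
from __future__ import annotations

from typing import Optional

# Condensed copy of the trace excerpt above.
SAMPLE = """\
PARSING IN CURSOR #1495840456 tim=24328721585 hv=4050667988 ad='146f9948f8'sqlid='84qubbrsr0kfn'
insert into wrh$_latch(snap_id, dbid, instance_number, latch_hash, level#, gets, misses, sleeps,
 immediate_gets,immediate_misses, spin_gets, sleep1, sleep2, sleep3, sleep4, wait_time)
select :snap_id, :dbid, :instance_number, hash, level#, gets,    misses, sleeps,
immediate_gets, immediate_misses, spin_gets,    sleep1, sleep2, sleep3,
sleep4, wait_time  from    v$latch  order by    hash
END OF STMT
PARSE #1495840456:c=0,e=376,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=24328721584
ORA-00600: internal error code, arguments: [6002], [6], [104], [4], [0], [], [], [], [], [], [], []
"""


def failing_statement(trace_text: str) -> Optional[str]:
    """Return the last statement parsed before the first ORA- line (simplified)."""
    last_sql: Optional[str] = None
    buffer: list = []
    in_statement = False
    for line in trace_text.splitlines():
        if line.startswith("PARSING IN CURSOR"):
            in_statement, buffer = True, []
        elif in_statement and line.startswith("END OF STMT"):
            in_statement = False
            last_sql = " ".join(part.strip() for part in buffer)
        elif in_statement:
            buffer.append(line)  # collect the multi-line SQL text
        elif line.startswith("ORA-") and last_sql is not None:
            return last_sql
    return None


if __name__ == "__main__":
    print(failing_statement(SAMPLE))
```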

Checking and truncating the table

SQL> SELECT COUNT(*) FROM wrh$_latch;
COUNT(*)
--------
     0
SQL> truncate table wrh$_latch;
Table truncated.

Taking a snapshot again

SQL> execute dbms_workload_repository.create_snapshot();
PL/SQL procedure successfully completed.
SQL> @?/rdbms/admin/awrrpt.sql
-- works normally

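After this series of fixes, AWR finally works normally again. Especially after an abnormal recovery, AWR data can have all sorts of problems that keep it from working; you can consider rebuilding AWR, or, as I did here, purge the AWR data completely and then deal with whatever remains. Of course, for an Oracle database recovered by unconventional methods, I recommend rebuilding the database logically when circumstances allow: problems such as data dictionary inconsistencies, logical block corruption, and table/index mismatches tend to surface gradually during later use and cause a lot of trouble, and rebuilding the database solves them once and for all.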