数据库报ORA-00607/ORA-00600[4194]错误

昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误

Thu Jul 26 13:21:11 2012
SMON: enabling cache recovery
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Thu Jul 26 13:21:11 2012
Doing block recovery for file 1 block 18
Block recovery from logseq 3994, block 3 to scn 89979535
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery stopped at EOT rba 3994.5.16
Block recovery completed at rba 3994.5.16, scn 0.89979533
Doing block recovery for file 1 block 9
Block recovery from logseq 3994, block 3 to scn 89979532
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].

trace文件分析

--dump redo
DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log'
 Opcodes *.*
 DBAs (file#, block#):
      (1, 18)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=658120234=0x273a1e2a, Db Name='MISPDATA'
	Activation ID=658142762=0x273a762a
	Control Seq=16668=0x411c, File size=102400=0x19000
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62"
 thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0
 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x0 scn: 0x0000.00000000
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42
 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19
 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10
 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42
 Disk cksum: 0x3088 Calc cksum: 0x3088
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
--ORA-00600错误提示
*** 2012-07-26 13:21:11.566
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
--ora-607
Error 607 in redo application callback
TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ:  1 OP:5.1
ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02
            xid:  0x0000.026.00000035
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00400012.003d.01
prev ctl max cmt scn:  0x0000.0550709b  prev tx cmt scn:  0x0000.0550709c
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194318  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0000.01e.00000035 uba: 0x00400012.003d.01
                      flg: C---    lkc:  0     scn: 0x0000.05511296
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 31 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 2]  c1 0a
col  5: [ 5]  c4 5a 12 5a 14
col  6: [ 1]  80
col  7: [ 4]  c3 08 5f 3d
col  8: [ 4]  c3 02 38 52
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
Block after image is corrupt:
buffer tsn: 0 rdba: 0x00400012 (1/18)
scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201
frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK

这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 106
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
UNDO$
TABLE              SYSTEM

4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 18
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.

DUL挖ORACLE 8.0数据库

老古董的东西,在你一不小心的时候就可能遇到,测试了dul成功挖ORACLE 8.0数据库
创建模拟环境

SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for 32-bit Windows: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR> create table t_xifenfei
     2> as
     3> select * from dba_tables;
Statement processed.
SVRMGR> select count(*) from t_xifenfei;
COUNT(*)
----------
       183
1 row selected.
SVRMGR> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

dul参数配置

    osd_big_endian_flag=false
    osd_dba_file_bits=10
    osd_c_struct_alignment=32
    osd_file_leader_size=1
    dc_columns=2000000
    dc_tables=10000
    dc_objects=1000000
    dc_users=400
    dc_segments=100000
    control_file = control.txt
    db_block_size=2048
    export_mode=true
    compatible=8
    file = dump

dul挖数据

C:\dul>dul8.exe dictv8.ddl
Data UnLoader 8.0.6.8 - Internal Use Only - on Thu Jul 26 20:08:33 2012
with 64-bit io functions
Copyright (c) 1994/2000 Bernard van Duijnen All rights reserved.
DUL: Warning: Recreating file "dul.log"
Parameter altered
Parameter altered
Parameter altered
Parameter altered
. unloading table                      OBJ$    2681 rows unloaded
. unloading table                      TAB$     187 rows unloaded
. unloading table                      COL$   11181 rows unloaded
. unloading table                     USER$      23 rows unloaded
. unloading table                  TABPART$       0 rows unloaded
. unloading table                      IND$     221 rows unloaded
. unloading table                     ICOL$     419 rows unloaded
. unloading table                      LOB$      13 rows unloaded
Life is DUL without it
C:\dul>dul8.exe
Data UnLoader 8.0.6.8 - Internal Use Only - on Thu Jul 26 20:13:44 2012
with 64-bit io functions
Copyright (c) 1994/2000 Bernard van Duijnen All rights reserved.
DUL: Warning: Recreating file "dul.log"
Loaded 23 entries from USER.dat
Loaded 2681 entries from OBJ.dat
Loaded 188 entries from TAB.dat
Loaded 11218 entries from COL.dat
Loaded 0 entries from TABPART.dat
Loaded 221 entries from IND.dat
Loaded 13 entries from LOB.dat
Loaded 419 entries from ICOL.dat
DUL> unload table chf.t_xifenfei;
. unloading table                T_XIFENFEI     183 rows unloaded

启动8.0数据库

C:\oracle\ora80\BIN>SVRMGR30.EXE
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect internal/xifenfei
Connected.
SVRMGR> startup
ORACLE instance started.
Total System Global Area                         15077376 bytes
Fixed Size                                          49152 bytes
Variable Size                                    12906496 bytes
Database Buffers                                  2048000 bytes
Redo Buffers                                        73728 bytes
Database mounted.
Database opened.

imp导入数据

C:\dul>imp chf/xifenfei file=dump001.dmp full=y
Import: Release 8.0.5.0.0 - Production on 星期二 11月 1 23:34:36 2011
(c) Copyright 2000 Oracle Corporation.  All rights reserved.
连接到: Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
经由常规路径导出由EXPORT:V07.00.07创建的文件
警告: 此对象由 Bernard's DUL 导出, 而不是当前用户
. 正在将Bernard's DUL的对象导入到 CHF
. . 正在导入表                    "T_XIFENFEI"        183行被导入
成功终止导入,但出现警告。

测试恢复数据

SVRMGR> connect chf/xifenfei
Connected.
SVRMGR> select count(*) from t_xifenfei;
COUNT(*)
----------
       183
1 row selected.

再次证明了dul确实异常的强大,第三方的工具在某些方面确实不是它的对手

inactive transaction branch等待事件

分析一份awr,发现不太熟悉的等待事件”inactive transaction branch”,awr相关信息如下



分析top 1 sql中的对象

SQL> select * from v$version;
BANNER
-----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> select owner,object_type from dba_objects where object_name =upper('wCommonShortMsg');
OWNER                          OBJECT_TYPE
------------------------------ -------------------
PUBLIC                         SYNONYM
OFFONQUERY                     SYNONYM
SQL> COL DB_LINK FOR A12
SQL> select TABLE_OWNER,TABLE_NAME,DB_LINK from dba_SYNONYMS WHERE
   2 SYNONYM_NAME=upper('wCommonShortMsg');
TABLE_OWNER                    TABLE_NAME                     DB_LINK
------------------------------ ------------------------------ ------------
                               WCOMMONSHORTMSG                CRMDB_LINK
                               WCOMMONSHORTMSG                DB_LINK_CRM

通过这里查询,可以确定引起dblink相关等待严重的是关于wCommonShortMsg同义词查询导致(使用dblink连接到其他库),结合数据库版本,大致可以确定inactive transaction branch等待和MOS中的bug 10413418相符

记录一次AIX 4.3.0+ORACLE 8.0.5恢复过程

最近接手一个AIX下面ORACLE 8.0.5恢复需求.这个需求要从2个月前说起.2个月前有客户提出需求,帮他们恢复8.0.5的库(该库被9.2.0.4的ORACLE强制拉过,但是未成功).当我准备去恢复的时候,客户和我说数据库的存储找不到了,暂时不用处理.本来这个事情到此也就算结束了,最多算一个笑话(找人数据库恢复,发现数据库丢了).前几天又接到关于这个恢复的请求,说他们还有磁带的备份(备份方式:begin tablespace backup+cp+archivelog),让我去协助恢复.按照客户的描述,他们购买了国外的读磁带机器可以读取磁带到硬盘,他们解压好数据文件,然后我去恢复,而且数据文件,控制文件,归档日志都存在.我当时认为这个是一个简单的恢复,顺利的话,一个小时就可以搞定.这里告诉我们:哪怕是N久的备份,也可能是最后的救命数据(2004年的磁带备份),一定要做好备份
今天到达现场,客户第一句话:文件没有解压,第二句话:归档日志没有恢复出来.听到这里,我有点感觉情况很不妙.
详细一问:文件没有解压是因为AIX 4.3.0默认的是文件系统格式是jfs,最大支持lv的大小为20g(估计和客户参数有关系),现在如果解压需要分到多个目录中(数据库原始运行就是按照该模式进行的,如:oradata001,oradata002等分别放几个数据文件),需要我去给他们做规划,如果存放这些文件.大脑一晕,dba原来真的什么活都要干啊.本身就对AIX不熟悉,还要去想办法处理这些问题,而且是从来没有见过的AIX 4.3.0.最后通过我和客户的一起努力终于解决了这个问题:使用其他技巧在AIX 4.3中建立jfs2文件系统(先划分小的LV,使用jfs系统,然后修改系统为jfs2,然后增加lv大小),解决以前jfs文件系统限制,解压的时候需要规划文件目录的体力活.这个问题告诉我们:有时候解决问题需要学会变通
归档日志没有恢复出来的原因:因为连续几天的恢复,加上客户本身工作繁忙,可能实在是太累,在最后一盘磁带的恢复的时候(一盘磁带15小时,一共4盘),客户敲错了命令tar -xvf输入成了-cvf,使得磁盘头被覆盖,磁带原则上报废,从而使得归档日志无法恢复出来.这一点点的事故告诉我们:越疲劳越容易出错,越到最后越容易出错,一定要小心谨慎
到这一步,没有解压(已经解决文件系统问题,接下来的解压问题不大),没有归档(修改scn原则上可以解决),这些东西总的来说问题都不大,当我安装好AIX FOR ORACLE 8.0.5,启动数据库到mount状态,核对恢复出来的数据文件和控制文件中的数据文件的时候,发现少了好几个,这下不能容忍了(能够open库,但是可能丢失需要数据,这个太不划算[因为用户还有该备份的前几天的备份]),寻找出现数据文件从磁带中丢失原因:1.因为jfs文件系统限制,不停的mv到其他目录导致丢失.2.uncompress解压丢失.3.最后一盘磁带损坏导致丢失.一切原因都是浮云,解决了jfs2文件系统,客户根据当前的情况,决定使用其他的备份再次从磁带中导出,然后进行恢复
AIX 4.3安装ORACLE 8.0.5

--检查内存
 lsattr -El sys0 -a realmem
--检查交换分区
 lsps -a
--检查临时目录
 df -k /tmp
--检查操作系统位数
getconf HARDWARE_BITMODE
bootinfo -y
--检查操作系统版本号
 oslevel -r
--检查软件包
 lslpp -l bos.adt.base, bos.adt.libm
--检查补丁包
 instfix -i | grep IX71948
--升级aix
smit install_latest
smit update_all
增加用户:
useradd   oracle
增加组:
mkgrp   dba
更改用户所属组:
usermod   -g  dba oracle
更改用户密码:
passwd   oracle
pwdadm   oracle
--关于用户
smit mkuser	建立用户
smit lsuser	列出所有用户的属性
lsuser ALL
smit chuser	改变用户属性
rmuser -p *	删除用户*
smit rmuser	只删除用户,但是不删除所属目录,等于rmuser
smit passwd	修改密码
smit lockuser	给用户加锁
--关于用户组
smit mkgroup	建立新组
smit lsgroup	显示所有组的属性
smit chgroup	修改组的属性
smit rmgroup	删除*组
rmgroup *
配置shell limits( smit chuser)
  soft FILE size -1
  soft CPU time -1
  soft DATA segment -1
  soft STACK size -1
新建目录:
mkdir   /u01
更改目录属主:
chown   oracle   /u01
更改目录所属组:
chgrp   dba   /u01
vi   /home/oracle/.profile
export   LINK_CNTRL=L_PTHREADS_D7
export   NLS_LANG=american_america.zhs16cgb231280
export	 ORACLE_OWNER=oracle
export   ORACLE_TERM=vt100
#export   ORACLE_TERM=xterm
export   ORACLE_BASE=/oracle
export   ORACLE_HOME=$ORACLE_BASE/product/8.0.5
export   ORACLE_SID=ora8
export   LD_LIBRARY_PATH=$/ORACLE_HOME/lib:$LD_LIBRARY_PATH
export   LIBPATH=$ORACLE_HOME/lib:$LIBPATH
export   ORA_NLS32=$ORACLE_HOME/ocommon/nls/admin/data
export   PATH=$ORACLE_HOME/bin:$PATH
export   TMPDIR=/tmp
export   DISPLAY=172.100.1.2:0.0
set -o vi
umask 022
--Mount产品光盘
$   su   root
-查看光驱
$ lsdev -Cc cdrom
#   mkdir   /cdrom
#   chmod   777   /cdrom
#   /etc/mount   -rv   cdrfs   /dev/cd0   /cdrom
#   exit
--运行rootpre.sh脚本。
$   su   root
#   cd   /cdrom/orainst
#   ./rootpre.sh
#   exit
--运行安装程序
$   cd   /cdrom/orainst
-图形
$   ./orainst   /m
-字符
$   ./orainst   /c
# cd $ORACLE_HOME/orainst
# ./root.sh

建立密码文件

cd $ORACLE_HOME/dbs
orapwd file=orapw$ORACLE_SID password=oracle

创建pfile文件

vi $ORACLE_HOME/dbs/init.ora
db_name=ORCL
db_files = 5000
control_files = /oradata/ctl1ORCL.ora
db_file_multiblock_read_count =  8
db_block_buffers =  100000
shared_pool_size =  115343360
log_checkpoint_interval = 10000
processes =  590
parallel_max_servers = 5
log_buffer =  163840
sequence_cache_entries =  100
sequence_cache_hash_buckets =  90
max_dump_file_size = 102400
global_names = TRUE
background_dump_dest=/oracle/trace
user_dump_dest=/oracle/trace
db_block_size = 4196
remote_login_passwordfile = shared
text_enable = TRUE
job_queue_processes = 2
job_queue_interval = 10
job_queue_keep_connections = false
distributed_lock_timeout = 300
distributed_transactions = 5
open_links = 4

操作8.0.5数据库

--结果测试与win,linux/unix使用svrmgrl命令
C:\oracle\ora80\BIN>SVRMGR30.EXE
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect system/manager
Connected.
SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for 32-bit Windows: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR> connect internal/oracle
Connected.
SVRMGR> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            %RDBMS80%\
Oldest online log sequence     3
Current log sequence           6
SVRMGR> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup mount;
ORACLE instance started.
Total System Global Area                         15077376 bytes
Fixed Size                                          49152 bytes
Variable Size                                    12906496 bytes
Database Buffers                                  2048000 bytes
Redo Buffers                                        73728 bytes
Database mounted.
SVRMGR> alter database archivelog;
Statement processed.
SVRMGR> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            C:\oracle\ora80\RDBMS80\
Oldest online log sequence     3
Next log sequence to archive   6
Current log sequence           6

通过awr指标评估会话建立是否频繁

有朋友问我,通过awr怎么来判断系统这个时间段的会话建立情况,也就是说如果中间件或者客户端程序发生异常,过多的连接数据库.
我这里有个例子,用户的数据库大概在每秒钟建立10个连接左右(相对而言比较频繁)

[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40:03" listener.log |wc -l
9
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40:04" listener.log |wc -l
7
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:41:04" listener.log |wc -l
12
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40" listener.log |wc -l
554
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:50" listener.log |wc -l
598
[oracle@xifenfei tmp]$ grep "18-JUL-2012 19:00" listener.log |wc -l
597

我们分析这个时间段的awr,看那些地方是可以表明用户会话建立频繁
awr汇总信息
通过这个信息我们可以发现awr报告时间段(120分钟),和数据库在起点和终点的会话数
说明:该数据库问题很多,出现负载高,不全是会话建立频繁导致,这里只分析建立会话相关情况

Load Profile信息
通过这里的Logons为36对于这样的系统来说,明显异常

Top 5 Timed Events
这里的latch: session allocation等待就是比较明显的建立会话时候出现的等待

Time Model Statistics
Time Model Statistics中的connection management call elapsed time大家都明白的,建立会话花费时间

Dictionary Cache Stats
dc_usernames和dc_users请求值偏大

总结说明
在实际工作中:遇到过因为session建立太频繁导致监听繁忙,tnsping延迟比较严重案例,也遇到因为会话建立频繁导致系统内存被消耗完的案例.
在遇到会话建立过于频繁的案例,最有力的说明证据是监听日志,因为awr中的相关数据没有绝对标准(而且awr本身也是一个相对性的东西),而且一般客户对awr中我刚刚列举的数据概念性不强,所以一般只能作为分析的辅助工具,或者为进一步分析监听日志提供理由依据.

CURSOR_SHARING=SIMILAR引起的悲剧

一个客户反馈说有一系统经常性负载比较高,让我帮忙分析原因
系统负载情况

[oracle@zwq-kfdialdb ~]$ top -c
top - 17:11:06 up 78 days,  1:12,  5 users,  load average: 124.83, 125.90, 112.13
Tasks: 836 total, 152 running, 684 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.1%us,  0.1%sy,  0.0%ni,  1.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132070908k total, 90494280k used, 41576628k free,  1147384k buffers
Swap: 67108856k total,        0k used, 67108856k free, 79109904k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12179 oracle    15   0 32.5g 142m 139m R 39.0  0.1   6:09.20 oracleahdial (LOCAL=NO)
11530 oracle    16   0 32.5g 469m 466m R 36.1  0.4  11:14.23 oracleahdial (LOCAL=NO)
11816 oracle    15   0 32.5g 467m 463m R 36.1  0.4   6:33.86 oracleahdial (LOCAL=NO)
11577 oracle    15   0 32.5g 480m 477m R 34.7  0.4   7:15.98 oracleahdial (LOCAL=NO)
12136 oracle    16   0 32.5g 455m 452m R 31.9  0.4   9:07.88 oracleahdial (LOCAL=NO)
11237 oracle    16   0 32.5g 997m 992m R 31.2  0.8  20:53.50 oracleahdial (LOCAL=NO)
11427 oracle    16   0 32.5g 137m 135m R 31.2  0.1  11:50.16 oracleahdial (LOCAL=NO)
12051 oracle    16   0 32.5g 459m 456m R 31.2  0.4   6:12.67 oracleahdial (LOCAL=NO)
[oracle@zwq-kfdialdb ~]$ vmstat 3 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
137  0      0 41566600 1147384 79109888    0    0     0     3    0    0  0  0 99  0  0
146  0      0 41567548 1147384 79109888    0    0     0    47 1058 32190 98  0  2  0  0
150  0      0 41568056 1147384 79109888    0    0     0    39 1081 31543 98  0  2  0  0
12  0      0 41568464 1147384 79109888    0    0     0    23 1056 32111 98  0  2  0  0
62  0      0 41568776 1147384 79109888    0    0     0    11 1067 31474 98  0  2  0  0
108  0      0 41568304 1147384 79109888    0    0     0    39 1059 31193 98  0  2  0  0
140  0      0 41569280 1147384 79109888    0    0     0    48 1063 31171 98  0  2  0  0
140  0      0 41569444 1147384 79109888    0    0     0    40 1075 30508 98  0  2  0  0

通过top和vmstat看出系统现在负载很高,主要都是用户进程导致.

查询等待事件

SQL> select event from v$session where wait_class#<>6;
EVENT
----------------------------------------------------------------
cursor: mutex S
SQL> /
EVENT
----------------------------------------------------------------
cursor: mutex S
SQL> /
EVENT
----------------------------------------------------------------
cursor: mutex S
SQL> /
EVENT
----------------------------------------------------------------
cursor: mutex S
SQL> SELECT a.*, s.sql_text
  2    FROM v$sql s,
  3         (SELECT sid,
  4                 event,
  5                 wait_class,
  6                 p1 cursor_hash_value,
  7                 p2raw Mutex_value,
  8                 TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid
  9            FROM v$session_wait
 10           WHERE event LIKE 'cursor%') a
 11   WHERE s.HASH_VALUE = a.cursor_hash_value
 12  /
no rows selected
SQL>  select event from v$session where wait_class#<>6;
EVENT
----------------------------------------------------------------
SQL*Net message to client

数据库开始的等待事件只有cursor: mutex S,等该等待事件消失后系统负载也恢复正常

再次查看系统负载

[oracle@zwq-kfdialdb ~]$ top -c -i10
top - 17:13:51 up 78 days,  1:15,  6 users,  load average: 12.57, 78.21, 96.45
Tasks: 702 total,   2 running, 700 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.2%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132070908k total, 86477808k used, 45593100k free,  1147500k buffers
Swap: 67108856k total,        0k used, 67108856k free, 79116036k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15045 oracle    16   0 13136 1476  724 R 10.6  0.0   0:00.12 top -c -i10
10010 root      10  -5     0    0    0 D  0.0  0.0   0:00.68 [kondemand/4]
10019 root      10  -5     0    0    0 D  0.0  0.0   1:41.58 [kondemand/13]
10020 root      10  -5     0    0    0 D  0.0  0.0   1:52.28 [kondemand/14]
10021 root      10  -5     0    0    0 R  0.0  0.0   2:01.54 [kondemand/15]
12166 root      24   0 10084  300  216 D  0.0  0.0   0:00.00 /opt/VRTSgab/gablogd
[oracle@zwq-kfdialdb ~]$  vmstat 3 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 45484264 1147528 79117360    0    0     0     3    0    0  0  0 99  0  0
 2  0      0 45468452 1147528 79117456    0    0     4   161 1372 5369  2  2 97  0  0
 3  0      0 45463712 1147528 79117584    0    0     4   187 1602 7253  6  0 93  0  0
 1  0      0 45458220 1147528 79117648    0    0     1    99 1358 5821  2  0 98  0  0
 0  0      0 45475168 1147528 79117712    0    0     0    41 1321 5321  2  0 98  0  0
 0  0      0 45473624 1147528 79117744    0    0     3   104 1378 5455  2  0 98  0  0
 2  0      0 45474656 1147528 79117776    0    0     0    55 1196 4872  1  0 99  0  0
 0  0      0 45474376 1147532 79117824    0    0     8   113 1170 4990  2  0 98  0  0
 1  0      0 45475440 1147532 79117872    0    0     1    56 1187 5301  3  0 97  0  0
 1  0      0 45475824 1147532 79117888    0    0     3    99 1083 4643  3  0 97  0  0

结合上面的等待事件查询,我们可以大概评估出来,当cursor: mutex S等待消失后,系统负载也恢复正常,现在已经不存在环境,如果要找出问题只能够是借助AWR和ASH

分析ASH
Top User Events

Top SQL with Top Events

Activity Over Time

通过对ASH分析,可以大概确定,在这段时间内,引起系统负载高主要是cursor: mutex S导致

分析AWR
awr整体信息(从这里看数据库是相当的繁忙)

Load Profile(从这里看数据库业务比较小)

Top 5 Timed Foreground Events(主要等待事件cursor: mutex S,和前面分析相符)

OS LOAD(虽然和系统看到有一定出入,但是整体还是展示系统负载较高)

SQL ordered by Version Count(出现cursor: mutex S,因为load profile中解析不多,所以想到高版本问题,这里确实非常高)

通过这里的一些列分析,我们已经基本上可以确定,该数据库因为高版本问题导致cursor: mutex S以及library cache 相关等待严重,从而出现系统负载过高.

找出高版本原因
高版本相关信息和查询请见:关于High Versions Count总结

SQL> select * from table(version_rpt('f8b9tba7sfsb5'));
COLUMN_VALUE
--------------------------------------------------------------------------------
Version Count Report Version 3.2.1 -- Today's Date 19-jul-12 18:13
RDBMS Version :11.2.0.1.0 Host: zwq-kfdialdb Instance 1 : ahdial
==================================================================
Addr: 000000080FA4CEA0  Hash_Value: 2408014181  SQL_ID f8b9tba7sfsb5
Sharable_Mem: 206315729 bytes   Parses: 48689
Stmt:
0 select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales
1 0_ where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and tblsps
2 ales0_.MOBILE_TELE_NO=:"SYS_B_1"
3
COLUMN_VALUE
--------------------------------------------------------------------------------
Versions Summary
----------------
AUTH_CHECK_MISMATCH :4
TRANSLATION_MISMATCH :4
ROLL_INVALID_MISMATCH :10219
PURGED_CURSOR :9
Total Versions:10219
Plan Hash Value Summary
-----------------------
COLUMN_VALUE
--------------------------------------------------------------------------------
Plan Hash Value Count
=============== =====
      791727930 920
     2820478500 9300
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for AUTH_CHECK_MISMATCH :
  # of Ver PARSING_USER_ID PARSING_SCHEMA_ID PARSING_SCHEMA_NAME
========== =============== ================= ===================
     10218               75                75 HOLLYSP
COLUMN_VALUE
--------------------------------------------------------------------------------
         2              107               107 HOLLYSP_TEST
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for TRANSLATION_MISMATCH :
Summary of objects probably causing TRANSLATION_MISMATCH
  Object# Owner.Object_Name
========= =================
    76737 HOLLYSP.TBL_SP_SALES_RECORDS
 HOLLYSP.TBL_SP_SALES_RECORDS
 HOLLYSP_TEST.TBL_SP_SALES_RECORDS
COLUMN_VALUE
--------------------------------------------------------------------------------
   107043 HOLLYSP_TEST.TBL_SP_SALES_RECORDS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for ROLL_INVALID_MISMATCH :
No details available
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for PURGED_CURSOR :
No details available
####
To further debug Ask Oracle Support for the appropiate level LLL.
COLUMN_VALUE
--------------------------------------------------------------------------------
alter session set events
 'immediate trace name cursortrace address 2408014181, level LLL';
To turn it off do use address 1, level 2147483648
================================================================
59 rows selected.
SQL> select * from table(version_rpt('6zhjf3qh8gyp9'));
COLUMN_VALUE
--------------------------------------------------------------------------------
Version Count Report Version 3.2.1 -- Today's Date 19-jul-12 18:15
RDBMS Version :11.2.0.1.0 Host: zwq-kfdialdb Instance 1 : ahdial
==================================================================
Addr: 00000007FD46D2A8  Hash_Value: 2693266089  SQL_ID 6zhjf3qh8gyp9
Sharable_Mem: 111904227 bytes   Parses: 4880
Stmt:
0 select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales
1 0_ where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and tblsps
2 ales0_.CAMPAIGN_ID=:"SYS_B_1" and tblspsales0_.IS_CONN=:"SYS_B_2
3 " and tblspsales0_.SALES_TIME>=:"SYS_B_3" and tblspsales0_.SALES
4 _PERSON=:"SYS_B_4"
COLUMN_VALUE
--------------------------------------------------------------------------------
5
Versions Summary
----------------
BIND_MISMATCH :4804
INCOMP_LTRL_MISMATCH :372
HASH_MATCH_FAILED :4936
Total Versions:4935
Plan Hash Value Summary
COLUMN_VALUE
--------------------------------------------------------------------------------
-----------------------
Plan Hash Value Count
=============== =====
     1645985080 3
     2040125427 4933
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for BIND_MISMATCH :
Consolidated details for :
BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,BIND_UACS_DIFF and
COLUMN_VALUE
--------------------------------------------------------------------------------
BIND_EQUIV_FAILURE (Mislabled as ROW_LEVEL_SEC_MISMATCH BY bug 6964441 in 11gR1)
from v$sql_bind_capture
COUNT(*) POSITION MIN(MAX_LENGTH) MAX(MAX_LENGTH) DATATYPE (PRECISION,SCALE)
======== ======== =============== =============== ======== ================
    4936        1              32              32        1 (,)
    4936        2              32              32        1 (,)
    4936        3              32              32        1 (,)
    4936        4              32              32        1 (,)
    4936        5              32              32        1 (,)
COLUMN_VALUE
--------------------------------------------------------------------------------
SUM(DECODE(column,Y, 1, 0) FROM V$SQL
IS_OBSOLETE IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE
=========== ================= ============= ============
          0              4309             0         4309
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for INCOMP_LTRL_MISMATCH :
No details available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for HASH_MATCH_FAILED :
COLUMN_VALUE
--------------------------------------------------------------------------------
No details available
####
To further debug Ask Oracle Support for the appropiate level LLL.
alter session set events
 'immediate trace name cursortrace address 2693266089, level LLL';
To turn it off do use address 1, level 2147483648
================================================================
62 rows selected.

根据经验,出现这么多的高版本情况,很可能是cursor_sharing参数设置问题

--对应sql语句
select count(*) as col_0_0_ from TBL_SP_SALES_RECORDS tblspsales0_
where tblspsales0_.SALES_RECORDS_STATUS=:"SYS_B_0" and
tblspsales0_.CAMPAIGN_ID=:"SYS_B_1" and tblspsales0_.IS_CONN=:"SYS_B_2"
and tblspsales0_.SALES_TIME>=:"SYS_B_3" and tblspsales0_.SALES_PERSON=:"SYS_B_4"
--cursor_sharing参数
SQL> show parameter cursor_sharing;
NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ----------------
cursor_sharing                       string                           SIMILAR

根据oracle官方建议在11g中不推荐使用cursor_sharing=SIMILAR,其实在所有版本中都不推荐,设置为该值很容易导致高版本问题.而且该值会出现莫名其妙的,无法解释的高版本问题.而且根据oracle相关文档,在即将发布的12c版本中,将除掉SIMILAR值.对于客户库的该问题,因为很多sql未绑定参数,为了减少硬解析,建议在业务低谷时设置cursor_sharing=FORCE,并刷新shared pool.

ALERT_QUE表重建方法

最近关注了下网络上,出现了很多AQ$_ALERT_QT_N的index SYS_IOT_TOP_NNNN坏块引起的数据库异常(主要是SYSAUX表空间),因为他们是IOT表和主键的关系,不能简单的rebuild.查询了一些资料,得到一些信息

ALERT_QUE表用途
The ALERT_QUE is used by the Grid Control and DB Control Management Agents to monitor server-generated alerts

ALERT_QUE表重建方法

--方法1
SQL> connect / as sysdba
SQL>alter system enable restricted session;
To drop server alert schema.
SQL>@$ORACLE_HOME/rdbms/admin/catnoalr.sql
To re-create tables, sequence, type and queue for server alert
SQL>@$ORACLE_HOME/rdbms/admin/dbmsslrt.sql
SQL>@$ORACLE_HOME/rdbms/admin/catalrt.sql
To recompile the invalid objects
SQL>@$ORACLE_HOME/rdbms/admin/utlrp.sql
SQL> alter system disable restricted session;
--方法2
SQL> connect / as sysdba
SQL>alter system enable restricted session;
To drop server alert schema.
SQL>@$ORACLE_HOME/rdbms/admin/catnoalr.sql
Rerun catproc.sql
SQL>@$ORACLE_HOME/rdbms/admin/catproc.sql
SQL> alter system disable restricted session;

补充说明

By running the script up, the queue tables will be recreated and the messages
in the queue will be lost.
For 11g you can use catmwin.sql which has the steps to recreate the ALERT_QT.
Alternatively, for 11g you can use the catproc.sql to recreate.
If this option may leave DBSNMP.MGMT_BSLN_INTERNAL invalid.
To validate the same run catsnmp.sql [NOTE:603289.1]

客户端版本导致ORA-00600[kssadd_stage: null parent]

有客户一台应用不能正常工作,报ORA-00600[kssadd_stage: null parent]错误,重启中间件后工作正常.
alert日志

ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:37 2012
Trace dumping is performing id=[cdmp_20120717145742]
Tue Jul 17 14:57:39 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_307720.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:45 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_357344.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []

trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/10g/db
System name:    AIX
Node name:      ora2
Release:        1
Version:        6
Machine:        00CCFD354C00
Instance name: fdjdb2
Redo thread mounted by this instance: 2
Oracle process number: 89
Unix process pid: 111068, image: oracle@ora2
*** ACTION NAME:() 2012-07-17 15:08:42.043
*** MODULE NAME:(gsrvr.exe) 2012-07-17 15:08:42.043
*** SERVICE NAME:(fdjdb) 2012-07-17 15:08:42.043
*** SESSION ID:(991.44140) 2012-07-17 15:08:42.043
*** 2012-07-17 15:08:42.043
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
No current SQL statement being executed.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              40D1A9663F9E7ABB ?
                                                   6ED89E14D59386B5 ?
ksedmp+0290          bl       ksedst               104A2CDB0 ?
ksfdmp+0018          bl       03F2735C
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              11041D938 ? 700000220FFBFC8 ?
                                                   110000770 ? 7000004FDF0C700 ?
                                                   FFFFFFFFFFF89C0 ?
kssadd_stage+0080    bl       kgeasnmierr          110195490 ? 110450040 ?
                                                   104AC46B8 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 7000004F0FA9208 ?
kqreqa+0058          bl       kssadd_stage         105670038 ? 104CF7BA0 ?
                                                   000000000 ? 000000000 ?
kqrpre1+0850         bl       kqreqa               100203514 ? 1101A2B20 ?
kqrpre+001c          bl       kqrpre1              710000770 ? 000000009 ?
                                                   FFFFFFFFFFF9088 ?
                                                   28A4202200000000 ?
                                                   10012AEE4 ? FFFFFFFFFFF9080 ?
                                                   000000000 ? 11022A3E0 ?
opiosq0+009c         bl       kqrpre               000000000 ? 000000000 ?
                                                   000000000 ? 1101A2B20 ?
                                                   FFFFFFFFFFF9198 ? 1104B7C60 ?
                                                   FFFFFFFFFFF9458 ?
kpooprx+0168         bl       opiosq0              4A00000001 ? 000000001 ?
                                                   000000000 ? A40000000000FF ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB964 ?
                                                   FFFFFFFFFFFB680 ?
                                                   5000000050 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103A1AD8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F971E8
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F96034
sou2o+0090           bl       opidrv               3C02D9A29C ? 4A006E298 ?
                                                   FFFFFFFFFFFF8A0 ?
opimai_real+01bc     bl       01F939B4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0070         bl       main                 000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

咨询客户得知访问该数据库的是通过中间件(OCI)+10g Release 1 (10.1) for Windows访问数据库,然后查询MOS[ID 752149.1]发现stack trace
kssadd_stage <- kqreqa <- kqrpre1 <- kqrpre <- opiosq0 <- kpooprx <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- start 和客户端版本和访问环境都和unpublished Bug 4937225相似
处理建议
客户端升级到10.2.0.3或者更高版本

8i升级到9i出现ORA-07445[pevm_MOVC_i()+18]

一个朋友数据库从8i升级到9i后,出现ORA-07445[pevm_MOVC_i()+18]错误
alert日志ORA-07445[pevm_MOVC_i()+18]

Mon Jul 16 12:21:54 2012
Errors in file /oracle/admin/ora8/udump/ora8_ora_8938.trc:
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []

trace文件

--版本平台信息
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /oracle/product/9.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.18-194.el5PAE
Version:	#1 SMP Tue Mar 16 22:00:21 EDT 2010
Machine:	i686
Instance name: ora8
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)
--trace信息
*** 2012-07-16 12:21:54.399
*** SESSION ID:(12.6) 2012-07-16 12:21:54.399
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x7, PC: [0x9bfac06, pevm_MOVC_i()+18]
Registers:
%eax: 0x00000000 %ebx: 0x00000025 %ecx: 0x00000000
%edx: 0xbf93bf50 %edi: 0x00000000 %esi: 0x002ff1d8
%esp: 0xbf93bc28 %ebp: 0xbf93bc60 %eip: 0x09bfac06
%efl: 0x00010296
  pevm_MOVC_i()+6 (0x9bfabfa) mov %edi,0xffffffcc(%ebp)
  pevm_MOVC_i()+9 (0x9bfabfd) mov %esi,0xffffffd0(%ebp)
  pevm_MOVC_i()+12 (0x9bfac00) mov %ebx,0xffffffc8(%ebp)
  pevm_MOVC_i()+15 (0x9bfac03) mov 0x14(%ebp),%eax
> pevm_MOVC_i()+18 (0x9bfac06) movb 0x7(%eax),%dl
  pevm_MOVC_i()+21 (0x9bfac09) mov $0x0,0xfffffff0(%ebp)
  pevm_MOVC_i()+28 (0x9bfac10) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+31 (0x9bfac13) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+34 (0x9bfac16) cmpb $0x1,%dl
*** 2012-07-16 12:21:54.407
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+269         call     ksedst()+0           1 ? 0 ? 0 ? 1 ? 64252C31 ?
                                                   6666006C ?
ssexhd()+1108        call     ksedmp()+0           3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
pevm_MOVC_i()+18     signal   ssexhd()+0           B ? BF93B8BC ? BF93B93C ?
pfrrun()+8458        call     pevm_MOVC_i()+0      2FF19C ? 16 ? BE14650 ? 0 ?
pricar()+1277        call     pfrrun()+0           2FF19C ? 1 ? BF93CCFC ?
                                                   AD638A0 ? 2DFBAC ? 0 ?
pricbr()+427         call     pricar()+0           BF93DA88 ? BF93D084 ?
                                                   9BEAE0C ? 1 ? 0 ? 98C93728 ?
prient2()+598        call     pricbr()+0           BF93DA88 ? BF93D084 ? 0 ?
prient()+1438        call     prient2()+0          BF93DA88 ? BF93D084 ? 1 ?
                                                   BF93E4E0 ? 0 ?
kkxrpc()+347         call     prient()+0           BF93DA88 ? AD638A0 ?
                                                   BF93E534 ? 38 ? 1C8C997 ? 0 ?
kporpc()+138         call     kkxrpc()+0           4C ? F ? BF93E63C ?
opiodr()+5238        call     kjushutdown()+2671   4C ? F ? BF93E63C ?
ttcpip()+2124        call     opiodr()+0           4C ? F ? BF93E63C ? 0 ?
Cannot find symbol in /lib/libc.so.6.
opitsk()+1635        call     ttcpip()+0           AD638A0 ? 4C ? BF93E63C ? 0 ?
                                                   BF93EF14 ? BF93EF10 ?
opiino()+602         call     opitsk()+0           0 ? 0 ? AD638A0 ? BE01DE0 ?
                                                   103 ? 0 ?
opiodr()+5238        call     kjushutdown()+2671   3C ? 4 ? BF9402E0 ?
opidrv()+517         call     opiodr()+0           3C ? 4 ? BF9402E0 ? 0 ?
sou2o()+25           call     opidrv()+0           3C ? 4 ? BF9402E0 ?
main()+182           call     sou2o()+0            BF9402C4 ? 3C ? 4 ?
                                                   BF9402E0 ? 0 ? 0 ?
00125E9C             call     main()+0             2 ? BF940384 ? BF940390 ?
                                                   88A810 ? 0 ? 1 ?
---------------------Binary Stack Dump ---------------------
--进程信息
Process global information:
     process: 0x962ba0b8, call: 0x96342cd8, xact: (nil), curses: 0x962e4070, usrses: 0x962e4070
  ----------------------------------------
  SO: 0x962ba0b8, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=15, calls cur/top: 0x96342cd8/0x96342cd8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 196 0 4
              last post received-location: kslpsr
              last process to post me: 962b7828 1 6
              last post sent: 0 0 15
              last post sent-location: ksasnd
              last process posted by me: 962b7828 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x962d9444
    O/S info: user: oracle, term: UNKNOWN, ospid: 8938
    OSD pid info: Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)
    ----------------------------------------
    SO: 0x962e4070, type: 4, owner: 0x962ba0b8, flag: INIT/-/-/0x00
    (session) trans: (nil), creator: 0x962ba0b8, flag: (8000041) USR/- BSY/-/-/-/-/-
              DID: 0001-000F-00000004, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 0, prv: 0, sql: (nil), psql: 0x98c3b858, user: 95/DDDD
    O/S info: user: mis, term: LANDERSVR3, ospid: 7904:3012, machine: XANDER\LANDERSVR3
              program: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx
    application name: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx, hash value=0
    last wait for 'db file sequential read' blocking sess=0x0 seq=1277 wait_time=11
                file#=1, block#=21b, blocks=1

1.该用户程序是从8i升级到9i之后产生该错误
2.报错的访问程序是FORM 5

解决方案
查询MOS[ID 273411.1]发现是因为FORM 5和9i不兼容导致该错误,ORACLE未给出解决方案,言外之意,如果FORM不能升级,那就只能把ORACLE重新降级到8i.

温馨提示
在做oracle数据库升级前,需要实现进行评估,测试,如果是oracle相关软件和oracle数据库结合紧密,升级前最好需要和ORACLE技术人员确认是否兼容.

using backup controlfile 两种使用情况区别

使用备份控制文件和重建控制文件恢复,都需要使用到using backup controlfile命令,但是两种情况下却有着本质的区别
试验准备条件

SQL> select * from v$version;
BANNER
-------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd')"www.xifenfei.com" from dual;
www.xifenfei.com
--------------------
2012-07-13
SQL> alter database open resetlogs;
Database altered.
SQL> select name from v$controlfile;
NAME
----------------------------------------------------
/u01/oracle/oradata/ora11g/control01.ctl
SQL> !cp /u01/oracle/oradata/ora11g/control01.ctl /tmp/xff.ctl
SQL> alter system checkpoint;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> alter system checkpoint;
System altered.
SQL> shutdown abort;
ORACLE instance shut down.
SQL> !rm /u01/oracle/oradata/ora11g/control01.ctl
SQL> ! cp /tmp/xff.ctl /u01/oracle/oradata/ora11g/control01.ctl
SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
ORA-01207: file is more recent than control file - old control file

相关说明:
1.通过resetlogs使得试验更加清晰
2.通过多次的checkpoint实现增加scn,switch logfile实现日志组切换
3.通过模拟备份控制文件恢复

查询相关SCN

SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;
     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118580
         2          2118580
         3          2118580
         4          2118580
         6          2118580
SQL> select CONTROLFILE_CHANGE#   from v$database;
CONTROLFILE_CHANGE#
-------------------
            2118713

做关于控制文件和数据文件dump

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump file_hdrs 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_593.trc
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 03:05:48 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> oradebug setmypid;
Statement processed.
SQL>  oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_775.trc

分析file_hdrs 3 dump文件

--datafile 1的datafile header信息
Tablespace #0 - SYSTEM  rel_fn:1
Creation   at   scn: 0x0000.00000007 09/18/2011 17:33:47
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x2eff82e3 scn: 0x0000.002053b1
 prev reset logs count:0x2e9e8451 scn: 0x0000.0016eaab
 recovered at 07/13/2012 02:56:18
 status:0x2004 root dba:0x00400208 chkpt cnt: 760 ctl cnt:759
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00205545 07/13/2012 03:01:42
--datafile 1的控制文件中信息
DATA FILE #1:
  name #7: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:752 scn: 0x0000.002053b4 07/13/2012 02:59:18
 Stop scn: 0xffff.ffffffff 07/13/2012 02:58:43
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

通过这里可以看出来:
datafile header的Checkpointed scn(00205545)>controfile datafile的Checkpoint scn(002053b4)
datafile header的checkpiont count(760)>controfile datafile的checkpiont count(752)
所以在数据库open的时候会报ORA-01207错误

尝试恢复数据库

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL>
SQL> select to_char(2118713,'xxxxxxx') from dual;
TO_CHAR(2118713,
----------------
  205439

分析controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 03/12/2012 22:17:06
 DB Name "ORA11G"
 Database flags = 0x00404000 0x00001000
 Controlfile Creation Timestamp  03/12/2012 22:17:07
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.002053b4
 Threads: #Enabled=1, #Open=1, Head=1, Tail=1
 Max log members = 3, Max data members = 1
 Arch list: Head=0, Tail=0, Force scn: 0x0000.00000000scn: 0x0000.00000000
 Activation ID: 4184707968
 Controlfile Checkpointed at scn:  0x0000.00205439 07/13/2012 02:59:25 <==控制文件checkpiont,控制文件每3秒的一次checkpiont
 thread:0 rba:(0x0.0.0)

通过recover的提示和对于controlf 3 dump文件的分析,可以确定数据库使用备份控制文件恢复,需要改控制文件备份之时开始的所有归档日志

查询数据库当前redo情况

SQL> select member from v$logfile;
MEMBER
-------------------------------------------------------------
/u01/oracle/oradata/ora11g/redo03.log
/u01/oracle/oradata/ora11g/redo02.log
/u01/oracle/oradata/ora11g/redo01.log
SQL> select SEQUENCE#,STATUS,FIRST_CHANGE# ,NEXT_CHANGE# from v$log;
 SEQUENCE# STATUS                           FIRST_CHANGE# NEXT_CHANGE#
---------- -------------------------------- ------------- ------------
         1 CURRENT                                2118577   2.8147E+14
         0 UNUSED                                       0            0
         0 UNUSED                                       0            0
SQL> archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /u01/oracle/oradata/archivelog/ora11g
Oldest online log sequence     1
Current log sequence           1
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo03.log';
System altered.
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo02.log';
System altered.
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo01.log';
System altered.

因为数据库处于非归档模式,而得到的redo信息主要都是来自控制文件,所以只能通过dump redo来分析当前redo的情况

分析redo log dump

DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo03.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7660=0x1dec, File size=30720=0x7800
	File Number=3, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000003, SCN 0x00000020553d-0x000000205540"
 thread: 1 nab: 0x2 seq: 0x00000003 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Next scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Disk cksum: 0xa716 Calc cksum: 0xa716
DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo02.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000005, SCN 0x000000205543-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000005 hws: 0x1 eot: 1 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Disk cksum: 0xc3f9 Calc cksum: 0xc3f9
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000004, SCN 0x000000205540-0x000000205543"
 thread: 1 nab: 0x2 seq: 0x00000004 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Next scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Disk cksum: 0xaa26 Calc cksum: 0xaa26
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000

通过对redo dump的分析可以得到:
1.最小的sequence#=3是redo03.log
2.current redo为redo02.log

继续尝试恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
ORA-00326: log begins at change 2118979, need earlier change 2118713
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo02.log'

关于备份控制文件使用using backup controlfile总结:恢复的启动控制文件备份的scn,需要该控制文件备份后的所有归档日志.对于当前这个非归档,而且redo被覆盖的库,该方法无法正常恢复

重建控制文件并做controlf 3 dump

SQL> alter database backup controlfile to trace as '/tmp/ctl.trace';
Database altered.
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS  NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/u01/oracle/oradata/ora11g/redo01.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 2 '/u01/oracle/oradata/ora11g/redo02.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 3 '/u01/oracle/oradata/ora11g/redo03.log'  SIZE 15M BLOCKSIZE 512
DATAFILE
  '/u01/oracle/oradata/ora11g/system01.dbf',
  '/u01/oracle/oradata/ora11g/sysaux01.dbf',
  '/u01/oracle/oradata/ora11g/undotbs01.dbf',
  '/u01/oracle/oradata/ora11g/users01.dbf',
  '/u01/oracle/oradata/ora11g/xifenfei02.dbf'
CHARACTER SET AL32UTF8
;
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
Control file created.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_1867.trc

尝试数据库恢复

SQL>  recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL>
SQL> select to_char(2118981,'xxxxxxx') from dual;
TO_CHAR(2118981,
----------------
  205545
SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;
     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118981
         2          2118981
         3          2118981
         4          2118981
         6          2118981

分析 controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 07/13/2012 03:24:51
 DB Name "ORA11G"
 Database flags = 0x00400102 0x00001000
 Controlfile Creation Timestamp  07/13/2012 03:24:51
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.00205543
***************************************************************************
DATA FILE RECORDS
***************************************************************************
 (size = 520, compat size = 520, section max = 100, section in-use = 6,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
  name #8: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0x12 head=8 tail=8 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:760 scn: 0x0000.00205545 07/13/2012 03:01:42
 Stop scn: 0xffff.ffffffff 07/13/2012 03:24:51
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

完成恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
Log applied.
Media recovery complete.

关于重建控制文件后使用using backup总结:重建控制文件后,恢复的起点是datafile header scn 最小值,需要改scn之后的所有日志