db_block_checksum实质是通过flg_kcbh来控制block checksum

db_block_checksum 用于DBWn和direct loader数据块写入到磁盘时,基于块内的所有字节计算得出一个校验值并将其写入块头。在该参数设置为typical和full时,当读入时候重新计算校验和写出时候的校验对比,如果不同则认为是块损坏。如果设置为FULL模式,则基于update/delete应用程序语句级别的改变发生后,校验值会被重新计算并写入。同时对于日志块,在写入之前,同样会生产校验值并写入到块头。该参数主要是防止IO硬件和IO子系统的错误。
这里提示我们是在系统参数级别使用db_block_checksum来控制block是否进行验证,那在block本身级别,是否有类似的值来控制,实现对block值的checksum?通过dump结合bbed给出相关答案
db_block_checksum为TYPICAL测试

CDB_CDB$ROOT@SYS> show parameter db_block_checksum;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_checksum                    string      TYPICAL
CDB_CDB$ROOT@SYS> conn / as sysdba
已连接。
CDB_CDB$ROOT@SYS> alter session set container=pdb;
会话已更改。
CDB_CDB$ROOT@SYS> alter database open;
数据库已更改。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS> create table t_xifenfei(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS> insert into t_xifenfei values(1,'www.xifenfei.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19229
CDB_CDB$ROOT@SYS> SELECT NAME FROM V$DATAFILE WHERE FILE#=9;
NAME
---------------------------------------------------------------------
E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF

dump 该block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xe08b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

这里可以看到因为db_block_checksum=TYPICAL,因此插入记录的时候,dump block发现flg: 0x06 和 chkval: 0xe08b

使用bbed查看相关记录
补充:在win系统中,bbed查看block和block num相差1

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b1d
   ub4 bas_kcbh                             @8        0x3c232935
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe08b
   ub2 spare3_kcbh                          @18       0x0000

通过对比bbed和dump出来数据,可以得出flg: 0x06(flg_kcbh),chkval: 0xe08b(chkval_kcbh)

使用bbed修改block验证chkval_kcbh
提问:在使用bbed修改block的时候,一般都需要使用sum apply 处理下,为什么呢?

BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 77772e78 6966656e l ,...?.www.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--没有修改任何值,sum的current和required值相同,而且和chkval_kcbh也相同
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xe08b
--尝试修改值
BBED> m /x 78 offset 8173
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230            Offsets: 8173 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 78772e78 6966656e 6665692e 636f6d01 063529
 <32 bytes per line>
BBED> d /v offset 8165
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 78772e78 6966656e l ,...?.wxw.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--发现current<>required
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xef8b
--apply把current修改为required值
BBED> sum apply
Check value for File 0, Block 19230:
current = 0xef8b, required = 0xef8b
--发现chkval_kcbh也修改为了required值
BBED> p kcbh.chkval_kcbh
ub2 chkval_kcbh                             @16       0xef8b

通过这里可以发现,在flg_kcbh含(KCBHFCKV)的情况下,如果block发生改变,则运行sum apply之后chkval_kcbh也发生改变

继续dump block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xef8b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 78 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

证明上述结论正确:dump.flg=bbed.flg_kcbh,dump.chkval=bbed.chkval_kcbh

db_block_checksum为FALSE测试

CDB_CDB$ROOT@SYS> alter system set db_block_checksum=false;
系统已更改。
CDB_CDB$ROOT@SYS> drop table t_xifenfei_new purge;
表已删除。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS>  create table t_xifenfei_new(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS>  insert into t_xifenfei_new values(1,'www.orasos.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> commit;
提交完成。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
 2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei_new;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19237
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS> alter system dump datafile 9 block 19237;
系统已更改。

dump block分析

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  77 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

这里可以发现当设置db_block_checksum=false之时,插入数据,显示flg: 0x02,chkval: 0x0000,由此猜测无对block写入进行部分验证(tailchk依然验证)

使用bbed修改block

BBED> set filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF'
        FILENAME        E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set block 8192
        BLOCK#          8192
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set block 19238
        BLOCK#          19238
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x0000
   ub2 spare3_kcbh                          @18       0x0000
--这里看到flg_kcbh与chkval_kcbh和dump结果一致
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
--required为0,表示不验证
BBED> d /v offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7777772e 6f726173 6f732e63 6f6d0106 l www.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
BBED> set mode edit
        MODE            Edit
BBED> m /x 78
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 b2c8
 <32 bytes per line>
BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 l xww.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
--修改了block,但是sum依然提示required为0
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000

dump block核对

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

修改flg_kcbh测试

BBED> m /x 06 offset 15
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06000000 00010000 00d86b01 00aec823
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x04b3
--修改flg_kcbh=x06后,sum中的required出现了非0值,表示已经启动了block完整性检测
BBED> m /x 79 offset 1876
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 1876 to 1891           Dba:0x00000000
------------------------------------------------------------------------
 79391b01 0000c24e 07000205 c406573f
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x049d
--修改了chkval_kcbh值
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x049d, required = 0x049d

再次检查dump block

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x06 tail: 0xc8b20601
frmt: 0x02 chkval: 0x049d type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

证明修改flg_kcbh后,block的sum验证起效

完整测试

--flg_kcbh=2,chkval_kcbh为04b3,修改block观察变化
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x04b3
   ub2 spare3_kcbh                          @18       0x0000
BBED> m /x 11 offset 7184
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 110000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> undo
BBED> modify /x 00 filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF' block 19238. offset 7184.
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 000000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
--证明当flg_kcbh=2修改block之后chkval_kcbh依然为04b3,证明flg_kcbh不含(KCBHFCKV),
--修改block不会导致chkval_kcbh改变,也就是说,该block为启用db_block_checksum
--flg_kcbh=6的时候验证修改block导致的chkval_kcbh变化
BBED> m /x 06 offset 15
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06b30400 00010000 00d86b01 00aec823
 <32 bytes per line>
--刚刚修改flg_kcbh=6,马上看到required非0
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--尝试修改block
BBED> m /x 22 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 2277772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
--required发生改变
BBED> sum
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04e9
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04e9
--修改为以前值,验证required
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--发现当block修改回来后,required值和以前一致(也就是说chkval_kcbh值还原)
--进步一说明chkval_kcbh取决于block内部值

通过相关测试db_block_checksum是在实例级别启动block checksum,但是具体到每个block是通过flg_kcbh来控制,而具体体现是在chkval_kcbh值上

使用UltraEdit修改oracle二进制文件

对数据库OPEN的过程做10046,因为数据库没有open,那写着语句是写在什么地方的呢?通过使用UltraEdit打开$ORACLE_HOME/bin/oracle文件,可以发现很多sql语句,而且与open过程的10046很多一致.

044acdc0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044acdd0h: 32 2E 6E 61 6D 65 2C 20 6F 32 2E 6F 62 6A 23 20 ; 2.name, o2.obj#
044acde0h: 66 72 6F 6D 20 69 6E 64 24 20 69 2C 20 6F 62 6A ; from ind$ i, obj
044acdf0h: 24 20 6F 31 2C 20 6F 62 6A 24 20 6F 32 2C 20 75 ; $ o1, obj$ o2, u
044ace00h: 73 65 72 24 20 75 20 77 68 65 72 65 20 6F 31 2E ; ser$ u where o1.
044ace10h: 6F 77 6E 65 72 23 20 3D 20 3A 31 20 20 61 6E 64 ; owner# = :1  and
044ace20h: 20 6F 31 2E 74 79 70 65 23 20 3D 20 32 20 61 6E ;  o1.type# = 2 an
044ace30h: 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 61 6E ; d i.type# = 9 an
044ace40h: 64 20 69 2E 62 6F 23 20 3D 20 6F 31 2E 6F 62 6A ; d i.bo# = o1.obj
044ace50h: 23 20 20 61 6E 64 20 69 2E 6F 62 6A 23 20 3D 20 ; #  and i.obj# =
044ace60h: 6F 32 2E 6F 62 6A 23 20 61 6E 64 20 6F 32 2E 6F ; o2.obj# and o2.o
044ace70h: 77 6E 65 72 23 20 3D 20 75 2E 75 73 65 72 23 20 ; wner# = u.user#
044ace80h: 61 6E 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 ; and i.type# = 9
044ace90h: 61 6E 64 20 6F 31 2E 6F 77 6E 65 72 23 20 21 3D ; and o1.owner# !=
044acea0h: 20 6F 32 2E 6F 77 6E 65 72 23 00 00 00 00 00 00 ;  o2.owner#......
044aceb0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acec0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044aced0h: 2E 6E 61 6D 65 2C 20 6F 2E 6F 62 6A 23 20 66 72 ; .name, o.obj# fr
044acee0h: 6F 6D 20 6F 62 6A 24 20 6F 2C 20 75 73 65 72 24 ; om obj$ o, user$
044acef0h: 20 75 2C 20 69 6E 64 24 20 69 20 77 68 65 72 65 ;  u, ind$ i where
044acf00h: 20 6F 2E 6F 77 6E 65 72 23 3D 3A 31 20 61 6E 64 ;  o.owner#=:1 and
044acf10h: 20 6F 2E 6F 77 6E 65 72 23 3D 75 2E 75 73 65 72 ;  o.owner#=u.user
044acf20h: 23 20 61 6E 64 20 6F 2E 6F 62 6A 23 3D 69 2E 6F ; # and o.obj#=i.o
044acf30h: 62 6A 23 20 61 6E 64 20 69 2E 74 79 70 65 23 3D ; bj# and i.type#=
044acf40h: 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; 9...............
044acf50h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acf60h: 73 65 6C 65 63 74 20 76 6E 61 6D 65 20 66 72 6F ; select vname fro
044acf70h: 6D 20 73 79 73 2E 73 6E 61 70 24 20 77 68 65 72 ; m sys.snap$ wher
044acf80h: 65 20 73 6F 77 6E 65 72 20 3D 20 3A 31 20 61 6E ; e sowner = :1 an
044acf90h: 64 20 69 6E 73 74 73 69 74 65 20 3D 20 30 20 20 ; d instsite = 0
044acfa0h: 61 6E 64 20 28 62 69 74 61 6E 64 28 66 6C 61 67 ; and (bitand(flag
044acfb0h: 2C 20 32 36 38 34 33 35 34 35 36 29 20 3D 20 30 ; , 268435456) = 0
044acfc0h: 20 6F 72 20 62 69 74 61 6E 64 28 6F 62 6A 66 6C ;  or bitand(objfl
044acfd0h: 61 67 2C 20 33 32 29 20 3D 20 30 29 00 00 00 00 ; ag, 32) = 0)....

既然数据库OPEN的过程很多sql是在oracle二进制文件中写的,那是否可以考虑通过修改二进制文件中的sql语句来改变执行计划,甚至可以尝试修改这些语句使得异常的数据库能够正常open.这里演示通过修改sql语句来展示改变执行计划(不使用index,而改用全表扫描)
10046 捕获语句和执行计划

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475346332403 hv=854877822 ad='2f2be060'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
END OF STMT
PARSE #2:c=1000,e=1258,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346332394
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b69ef000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2000,e=1382,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346333978
WAIT #2: nam='db file sequential read' ela= 17820 file#=1 block#=220 blocks=1 obj#=-1 tim=1345475346351927
FETCH #2:c=1000,e=18054,p=1,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475346352100
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=1 pw=0 time=18047 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=1 pw=0 time=18014 us)'

使用UE查看该sql在二进制文件内容

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 2E 6F 62 6A ; j$ o where o.obj
0459c6d0h: 23 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; #=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

修改二进制文件中sql语句

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 62 6A 23 2B ; j$ o where obj#+
0459c6d0h: 30 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; 0=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

这里通过对obj#加上一个常量0,使得该sql在执行之时不会使用obj$.obj#上的index,从而使用全表扫描,来实现我们修改执行计划,屏蔽该index的目的
修改后的10046观察

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475781593851 hv=493726595 ad='2f2ba76c'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where obj#+0=:1
END OF STMT
PARSE #2:c=1000,e=1095,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781593840
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a86000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2999,e=1603,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781595754
WAIT #2: nam='db file sequential read' ela= 18417 file#=1 block#=121 blocks=1 obj#=-1 tim=1345475781614344
WAIT #2: nam='db file sequential read' ela= 1000 file#=1 block#=123 blocks=1 obj#=-1 tim=1345475781615726
WAIT #2: nam='db file sequential read' ela= 1462 file#=1 block#=124 blocks=1 obj#=-1 tim=1345475781617720
WAIT #2: nam='db file sequential read' ela= 492 file#=1 block#=125 blocks=1 obj#=-1 tim=1345475781618452
WAIT #2: nam='db file sequential read' ela= 1358 file#=1 block#=126 blocks=1 obj#=-1 tim=1345475781620361
WAIT #2: nam='db file sequential read' ela= 500 file#=1 block#=127 blocks=1 obj#=-1 tim=1345475781621039
WAIT #2: nam='db file sequential read' ela= 63132 file#=1 block#=128 blocks=1 obj#=-1 tim=1345475781684316
WAIT #2: nam='db file sequential read' ela= 989 file#=1 block#=5233 blocks=1 obj#=-1 tim=1345475781685559
WAIT #2: nam='db file sequential read' ela= 792 file#=1 block#=5234 blocks=1 obj#=-1 tim=1345475781686583
FETCH #2:c=7999,e=90994,p=9,cr=13,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475781686865
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS FULL OBJ$ (cr=13 pr=9 pw=0 time=90977 us)'

很明显,到这里我们通过UE修改oracle二进制文件,实现了sql语句执行计划的更改.
免责申明:本方法仅供在非常特殊或者测试环境下使用,一般生产环境请勿模仿,否则后果自负

误杀进程导致rac hang住

有客户反馈系统hang住,不能归档,需要我们紧急介入分析
节点1日志
出现redo不能归档,redo日志都已经被写满,人工执行了ALTER SYSTEM ARCHIVE LOG CURRENT,数据库就开始把redo全部归档,但是后面产生的redo又不能归档,当redo全部写满之后,数据库有出现大量log file switch (archiving needed)等待

Tue Sep 24 22:05:37 2013
Thread 1 advanced to log sequence 47282 (LGWR switch)
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:07:31 2013
ORACLE Instance q9db1 - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 47283
All online logs needed archiving
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:28:17 2013
ALTER SYSTEM ARCHIVE LOG
Archived Log entry 259646 added for thread 1 sequence 47266 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:18 2013
Thread 1 advanced to log sequence 47283 (LGWR switch)
  Current log# 7 seq# 47283 mem# 0: +DATA/q9db/onlinelog/group_7.1243.818697415
Archived Log entry 259647 added for thread 1 sequence 47267 ID 0x354620c2 dest 1:
Archived Log entry 259648 added for thread 1 sequence 47268 ID 0x354620c2 dest 1:
Archived Log entry 259649 added for thread 1 sequence 47269 ID 0x354620c2 dest 1:
Archived Log entry 259650 added for thread 1 sequence 47270 ID 0x354620c2 dest 1:
Archived Log entry 259651 added for thread 1 sequence 47271 ID 0x354620c2 dest 1:
Archived Log entry 259652 added for thread 1 sequence 47272 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:28 2013
Archived Log entry 259653 added for thread 1 sequence 47273 ID 0x354620c2 dest 1:
Archived Log entry 259654 added for thread 1 sequence 47274 ID 0x354620c2 dest 1:
Archived Log entry 259655 added for thread 1 sequence 47275 ID 0x354620c2 dest 1:
Archived Log entry 259656 added for thread 1 sequence 47276 ID 0x354620c2 dest 1:
Archived Log entry 259657 added for thread 1 sequence 47277 ID 0x354620c2 dest 1:
Archived Log entry 259658 added for thread 1 sequence 47278 ID 0x354620c2 dest 1:
Archived Log entry 259659 added for thread 1 sequence 47279 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:39 2013
Archived Log entry 259660 added for thread 1 sequence 47280 ID 0x354620c2 dest 1:
Archived Log entry 259661 added for thread 1 sequence 47281 ID 0x354620c2 dest 1:
Archived Log entry 259662 added for thread 1 sequence 47282 ID 0x354620c2 dest 1:
Tue Sep 24 22:29:39 2013
Thread 1 advanced to log sequence 47284 (LGWR switch)
  Current log# 8 seq# 47284 mem# 0: +DATA/q9db/onlinelog/group_8.1242.818697417
Tue Sep 24 22:31:18 2013
Thread 1 advanced to log sequence 47285 (LGWR switch)
  Current log# 16 seq# 47285 mem# 0: +DATA/q9db/onlinelog/group_16.1884.827003545
Thread 1 advanced to log sequence 47286 (LGWR switch)
  Current log# 17 seq# 47286 mem# 0: +DATA/q9db/onlinelog/group_17.1885.827003587

节点2日志
节点2中出现大量的IPC Send timeout

Tue Sep 24 15:22:19 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 18:51:55 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Tue Sep 24 18:57:54 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:03:57 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:09:53 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 20:22:00 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]

节点1因为不能归档hang住,节点2紧接着也就hang住。对节点1hang住之时对两个节点分别做systemstate dump,使用ass进行分析得到节点1和节点2的记录大体如下:
节点1

393:waiting for 'log file switch (archiving needed)'
394:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
395:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
397:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
398:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
451:waiting for 'SQL*Net message from client'
469:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
470:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
471:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
618:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
626:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
NO BLOCKING PROCESSES FOUND

节点2

515:waiting for 'gc buffer busy acquire'
     Cmd: Insert
516:waiting for 'gc buffer busy acquire'
     Cmd: Insert
517:waiting for 'gc buffer busy acquire'
     Cmd: Insert
518:waiting for 'gc buffer busy acquire'
     Cmd: Insert
519:waiting for 'gc buffer busy acquire'
     Cmd: Insert
520:waiting for 'gc buffer busy acquire'
     Cmd: Select
521:waiting for 'gc current request'
     Cmd: Insert
522:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
523:waiting for 'gc buffer busy acquire'
     Cmd: Insert
524:waiting for 'SQL*Net message from client'
525:waiting for 'gc buffer busy acquire'
     Cmd: Insert
526:waiting for 'gc buffer busy acquire'
     Cmd: Insert
527:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
528:waiting for 'SQL*Net message from client'
529:waiting for 'gc buffer busy acquire'
     Cmd: Select
                    Resource Holder State
    Enq TX-0005001E-0022374F   223: waiting for 'gc current request'
    Enq TX-0047001B-002BCEB2   247: waiting for 'gc current request'
    Enq TX-015B001E-000041FF   330: waiting for 'gc current request'
    Enq TX-00010010-002EA7CD   179: waiting for 'gc current request'
    Enq TX-00BA0020-001E3E3C    ??? Blocker
Object Names
~~~~~~~~~~~~
Enq TX-0005001E-0022374F
Enq TX-0047001B-002BCEB2
Enq TX-015B001E-000041FF
Enq TX-00010010-002EA7CD
Enq TX-00BA0020-001E3E3C

通过这里,我们可以明白,节点2的很多事务hang住是因为请求gc current request,而该等待是因为节点1无法归档,有些block无法正常传输到节点2,导致节点2一直hang在这里,然后就出现IPC Send timeout;节点1上的事务阻塞甚至hang住是因为无法归档导致.到此需要定位的问题是为什么节点1不能归档
继续分析节点1 alert日志

Tue Sep 24 15:18:20 2013
opidrv aborting process O000 ospid (7332) as a result of ORA-28
Immediate Kill Session#: 1904, Serial#: 1065
Immediate Kill Session: sess: 0x24a2522a38  OS pid: 7338
Immediate Kill Session#: 3597, Serial#: 11107
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 7320
Tue Sep 24 15:18:23 2013
opidrv aborting process W000 ospid (7980) as a result of ORA-28
Tue Sep 24 15:18:23 2013
opidrv aborting process W001 ospid (8560) as a result of ORA-28
Tue Sep 24 15:18:35 2013
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES
Tue Sep 24 15:18:35 2013
ARC0 started with pid=66, OS id=10793
Tue Sep 24 15:18:35 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db1/trace/q9db1_nsa2_12635.trc:
ORA-00028: your session has been killed
LNS: Failed to archive log 8 thread 1 sequence 47156 (28)
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 47157 (LGWR switch)
ARC0: STARTING ARCH PROCESSES
  Current log# 9 seq# 47157 mem# 0: +DATA/q9db/onlinelog/group_9.1241.818697421
Tue Sep 24 15:18:36 2013
ARC1 started with pid=81, OS id=10805
Tue Sep 24 15:18:36 2013
ARC2 started with pid=84, OS id=10807
Tue Sep 24 15:18:36 2013
ARC3 started with pid=87, OS id=10809
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Error 1031 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'q9adgdg'. Error is 1031.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 259135 added for thread 1 sequence 47156 ID 0x354620c2 dest 1:
Error 1031 received logging on to the standby
FAL[server, ARC3]: Error 1031 creating remote archivelog file 'q9adgdg'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance q9db1 - Archival Error. Archiver continuing.
Tue Sep 24 15:18:46 2013
opidrv aborting process O001 ospid (9605) as a result of ORA-28
Tue Sep 24 15:18:46 2013
opidrv aborting process O000 ospid (10813) as a result of ORA-28
Tue Sep 24 15:18:46 2013
Immediate Kill Session#: 2909, Serial#: 369
Immediate Kill Session: sess: 0x24226c7200  OS pid: 9091
Immediate Kill Session#: 3380, Serial#: 30271
Immediate Kill Session: sess: 0x2422782c58  OS pid: 10265
Immediate Kill Session#: 3597, Serial#: 11109
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 10267
Tue Sep 24 15:20:14 2013
Restarting dead background process DIAG
Tue Sep 24 15:20:14 2013
DIAG started with pid=64, OS id=20568
Restarting dead background process PING
Tue Sep 24 15:20:14 2013
PING started with pid=68, OS id=20570
Restarting dead background process LMHB
Tue Sep 24 15:20:14 2013
LMHB started with pid=70, OS id=20572
Restarting dead background process SMCO
…………
Tue Sep 24 15:23:13 2013
ARC0: Detected ARCH process failure
Tue Sep 24 15:23:13 2013
Thread 1 advanced to log sequence 47158 (LGWR switch)
  Current log# 10 seq# 47158 mem# 0: +DATA/q9db/onlinelog/group_10.1240.818697423
ARC0: STARTING ARCH PROCESSES
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance q9db1 - Archival Error
ORA-00028: your session has been killed

查看ARCn进程

[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   20718 12870  0 22:07 pts/14   00:00:00 grep ora_ar
[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   25998 12870  0 22:07 pts/14   00:00:00 grep ora_ar

这里基本上明白了,因为客户的系统从15:15开始由于中间件程序异常,导致大量会话连接数据库,然后dba为了防止其他业务不受影响,然后开始大量通过alter system kill session,误杀了不少系统进程,包括ARCn(0,1,2,3)进程,在后面ARCn进程因为某种原因无法正常启动,导致redo无法归档,所有的redo组写满系统即hang住,该系统由于大量kill session已经导致了实例本身异常(正常情况ARCn进程kill之后会自动重启),处理方案:先增加redo组配合定时人工归档,等待业务低峰重启节点1,解决问题。温馨提醒:kill进程请小心

ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels

随着去ioe的潮流越来越汹涌,随着内存越来越便宜,使用Liunx大内存跑核心业务的系统越来越多。因为大内存,多会话,使得配置HugePages也成了必然的选择,在以前的文章中介绍过怎么去配置HugePages(Liunx系统中Oracle使用HugePages配置)以及因为没有配置Hugepage导致的故障(因未配置Hugepage会话数添增悲剧案例).但是随着Linux 6(RedHat 6, OEL 6, SLES 11 and UEK2 kernels等)的广泛应用,他们所在HugePages方面所具有的新特性(Transparent HugePages)很多人可能还没有注意到,Oracle强烈建议无论是rac还是单实例都关闭该特性.
检查是否启用Transparent HugePages方法

To check if the Transparent HugePages are enabled in your server execute the following:
Default/Enabled setting is  [always]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never
 Disabled setting is [never]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]
If "enabled" is NOT set to "[never]", the Transparent HugePages are being used.
You can also issue:
# grep AnonHugePages /proc/meminfo
If the output contains a line like "AnonHugepages: xxxx kB", with a value > 0kB the kernel is using Transparent HugePages.

禁用Transparent HugePages方法

Add the following to the kernel boot line in /etc/grub.conf (this is the preferred method) and reboot the server:
transparent_hugepage=never
Once modified the line will read similar to the following example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
OR
Add the following lines in /etc/rc.local and reboot the server:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

具体见文档:ALERT Disable Transparent HugePages on SLES11 RHEL6 OEL6 and UEK2 Kernels(Doc ID 1557478.1)

oerr ora code本质探讨

oerr ora code 命令

[oracle@q9-gg02 ~]$ oerr ora 10047
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:

unix/linux系统下$ORACLE_HOME/rdbms/mesg/oraus.msg包含ora-记录

[oracle@q9-gg02 ~]$ vi $ORACLE_HOME/rdbms/mesg/oraus.msg
…………
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
10048, 00000, "Undo segment shrink"
// *Cause:
// *Action:
10049, 00000, "protect library cache memory heaps"
// *Cause:
// *Action: Use the OS memory protection (if available) to protect library
//          cache memory heaps that are pinned.

oerr ora code本质

[oracle@q9-gg02 ~]$ strace oerr ora 10047
execve("/u01/oracle/app/product/10.2/db_1/bin/oerr", ["oerr", "ora", "10047"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1c239000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3fe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3ff000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib", {st_mode=S_IFDIR|0750, st_size=12288, ...}) = 0
open("tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libtermcap.so.2", O_RDONLY)   = -1 ENOENT (No such file or directory)
open("x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("libtermcap.so.2", O_RDONLY)       = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=154281, ...}) = 0
mmap(NULL, 154281, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db400000
close(3)                                = 0
open("/lib64/libtermcap.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\20 \214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15840, ...}) = 0
mmap(0x3e8c200000, 2108944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c200000
mprotect(0x3e8c203000, 2093056, PROT_NONE) = 0
mmap(0x3e8c402000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c402000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libdl.so.2", O_RDONLY)        = -1 ENOENT (No such file or directory)
open("x86_64/libdl.so.2", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("libdl.so.2", O_RDONLY)            = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0
mmap(0x3e8c600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c600000
mprotect(0x3e8c602000, 2097152, PROT_NONE) = 0
mmap(0x3e8c802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c802000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libc.so.6", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("tls/libc.so.6", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("x86_64/libc.so.6", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("libc.so.6", O_RDONLY)             = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\213>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722328, ...}) = 0
mmap(0x3e8be00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8be00000
mprotect(0x3e8bf4f000, 2093056, PROT_NONE) = 0
mmap(0x3e8c14e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x3e8c14e000
mmap(0x3e8c153000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e8c153000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db426000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db427000
arch_prctl(ARCH_SET_FS, 0x2b60db426fe0) = 0
mprotect(0x3e8c802000, 4096, PROT_READ) = 0
mprotect(0x3e8c14e000, 16384, PROT_READ) = 0
mprotect(0x3e8bc1c000, 4096, PROT_READ) = 0
munmap(0x2b60db400000, 154281)          = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK)     = 3
close(3)                                = 0
brk(0)                                  = 0x1c239000
brk(0x1c25a000)                         = 0x1c25a000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=56458208, ...}) = 0
mmap(NULL, 56458208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db428000
close(3)                                = 0
getuid()                                = 1000
getgid()                                = 1000
geteuid()                               = 1000
getegid()                               = 1000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/proc/meminfo", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60dea00000
read(3, "MemTotal:     132051308 kB\nMemFr"..., 1024) = 781
close(3)                                = 0
munmap(0x2b60dea00000, 4096)            = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
uname({sys="Linux", node="q9-gg02.800best.com", ...}) = 0
stat("/home/oracle", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
getpid()                                = 25928
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25464, ...}) = 0
mmap(NULL, 25464, PROT_READ, MAP_SHARED, 3, 0) = 0x2b60dea00000
close(3)                                = 0
getppid()                               = 25927
getpgrp()                               = 25927
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/u01/oracle/app/product/10.2/db_1/bin/oerr", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7a3cc060) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 80) = 80
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(255, {st_mode=S_IFREG|0755, st_size=2365, ...}) = 0
lseek(255, 0, SEEK_CUR)                 = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 2365) = 2365
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -973, SEEK_CUR)              = 1392
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25929
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "ora:rdbms:*:\n", 128)          = 13
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25929
wait4(-1, 0x7fff7a3cb804, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ $? -ne 0 ]\nthen\n\techo \"oerr"..., 2365) = 973
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -645, SEEK_CUR)              = 1720
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25931
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "Facility=ora\nComponent=rdbms\n", 128) = 29
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25931
wait4(-1, 0x7fff7a3cb904, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ -z \"$Facility\" -o -z \"$Comp"..., 2365) = 645
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
--检查oraus.msg文件是否存在并且有权限访问
stat("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", {st_mode=S_IFREG|0644, st_size=3789794, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
--访问oraus.msg文件
access("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", R_OK) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
--查找awk命令位置
stat("/u01/oracle/app/product/10.2/db_1/jre/1.4.2/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/qt-3.3/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/kerberos/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
--找到awk命令
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
lseek(255, -8, SEEK_CUR)                = 2357
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
--通过awk命令获取oraus.msg中相关命令
wait4(-1, 10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff7a3cbb54, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "\nexit 0\n", 2365)           = 8
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0)                           = ?

通过这里可以知道oerr其实就是通过awk对$ORACLE_HOME/rdbms/mesg/oraus.msg文件进行筛选,显示出来对应ORA-错误号的提示

记录一次数据库异常导致ipc未释放案例

很多时候,linux/unix中的ipc未正常释放,表现的现象是再次启动实例的时候表现出来内存不足等类似现象,但是今天在数据库启动过程中因为数据库后台报ora-600[6002]错误,导致数据库启动失败,但是后续的一些列奇怪现象让我吃惊:
1.启动数据库异常的session一直处于卡主状态,任何办法无法终止
2.开打新会话,登录数据库提示进行操作提示not logging,但是abort会出现和1中一样的hang住
数据库open过程卡主 会话无法退出,crtl+c依然无效

SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
^C
^C^C^C^C^C

alert报错

Tue Aug 27 20:33:57 2013
SMON: enabling cache recovery
Archived Log entry 29 added for thread 1 sequence 29 ID 0xfb920b77 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc  (incident=201905):
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_201905/ora11g_ora_4992_i201905.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Undo initialization errored: err:600 serial:0 start:1524754 end:1528104 diff:3350 (33 seconds)
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 4992): terminating the instance due to error 600
Instance terminated by USER, pid = 4992
ORA-1092 signalled during: alter database Open...

新开启会话登录

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:27:38 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected.
SQL> startup mount
ORA-01012: not logged on
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount
ORA-01012: not logged on

alert日志提示

Tue Aug 27 20:53:18 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:53:44 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:54:21 2013

shutdown abort 依然卡主

SQL> shutdown abort
^C
^C^C

查看ipc
根据alert日志提示,数据库已经异常关闭了,但是sqlplus中显示异常,怀疑是ipc未被正常释放

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ps -ef|grep pmon
oracle    4843  4785  0 20:29 pts/3    00:00:00 grep pmon

释放ipc

[oracle@xifenfei ~]$ ipcrm -m 2260993
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2228224
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2293762
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest

重新启动数据库

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:33:35 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.

再次查看ipc

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest
0x00000000 2490374    oracle    640        8388608    29
0x00000000 2523143    oracle    640        167772160  29
0xee5c29c4 2555912    oracle    640        2097152    29

模拟基表事务未提交数据库crash,undo丢失恢复异常恢复

对于基表特别是bootstrap$中的表,如果进行了dml操作,但是没有提交,这个时候数据库crash,而且undo异常.对于类似的场景,都是很多数据库恢复的人都有畏惧,因为_corrupted_rollback_segments参数不能起作用.本实验dml修改seq$表,最终却引起了undo$和I_UNDO1异常(他们都在bootstrap$中)
模拟seq$表事务未提交

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select sysdate "www.xifenfei.com" from dual;
www.xifenfei
------------
31-AUG-13
SQL> CREATE SEQUENCE seq_xifenfei INCREMENT BY 1 START WITH 1 CACHE 3;
Sequence created.
SQL> select object_id from dba_objects where object_name='SEQ_XIFENFEI';
 OBJECT_ID
----------
     77282
SQL> SELECT MINVALUE,HIGHWATER,INCREMENT$, CYCLE#,ORDER$,CACHE from seq$ where obj#=77282;
  MINVALUE  HIGHWATER INCREMENT$     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ----------
         1          1          1          0          0          3
SQL> update seq$ set  CYCLE#=3 where obj#=77282;
1 row updated.
--另外一个会话abort数据库
SQL> shutdown abort;
ORACLE instance shut down.
--人工rm掉undotbs01.dbf文件
--启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
--离线数据文件
SQL> alter database datafile 3 offline;
Database altered.
--尝试open数据库
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17864
Session ID: 125 Serial number: 5

alert日志

SMON: enabling cache recovery
Mon Aug 26 17:08:49 2013
ARC3 started with pid=25, OS id=17902
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Archived Log entry 13 added for thread 1 sequence 13 ID 0xfb920b77 dest 1:
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17864): terminating the instance due to error 704
Instance terminated by USER, pid = 17864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17864) as a result of ORA-1092

做10046跟踪

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> alter database mount;
Database altered.
SQL> alter session set events  '10046 trace name context forever,level 12';
Session altered.
SQL> alter session set db_file_multiblocK_read_count=1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

trace文件内容

PARSING IN CURSOR #3064524800 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377508386082495 hv=361892850 ad='265c66d0' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3064524800:c=2999,e=2162,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377508386082492
BINDS #3064524800:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a8ebd0  bln=22  avl=02  flg=05
  value=27
EXEC #3064524800:c=1999,e=2030,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377508386084833
WAIT #3064524800: nam='db file sequential read' ela= 65 file#=1 block#=321 blocks=1 obj#=34 tim=1377508386085083
WAIT #3064524800: nam='db file sequential read' ela= 21 file#=1 block#=225 blocks=1 obj#=15 tim=1377508386085208
FETCH #3064524800:c=0,e=375,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377508386085319
STAT #3064524800 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=362 us)'
STAT #3064524800 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=223 us)'
CLOSE #3064524800:c=0,e=12,dep=3,type=0,tim=1377508386085657
WAIT #3064511164: nam='control file sequential read' ela= 33 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386085816
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386085889
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386085994
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=23 blocks=1 obj#=15 tim=1377508386086058
WAIT #3064511164: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386086135
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386086196
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386086255
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=32 blocks=1 obj#=15 tim=1377508386086317
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3064511164:c=8998,e=9239,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377508386088336
STAT #3064511164 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=82 us)'
STAT #3064511164 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1166 us)'
=====================
PARSE ERROR #3064512008:len=60 dep=1 uid=0 oct=3 lid=0 tim=1377508386088729 err=604
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'

_corrupted_rollback_segments无法打开数据库

SQL> show parameter _corrupted_rollback_segments;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_corrupted_rollback_segments         string      _SYSSMU22_1366820418$, _SYSSMU
                                                 23_2165404374$, _SYSSMU24_1842
                                                 376483$, _SYSSMU25_2373003110$
                                                 , _SYSSMU26_3635237253$, _SYSS
                                                 MU27_1730246918$, _SYSSMU28_31
                                                 59173804$, _SYSSMU29_141006113
                                                 4$, _SYSSMU30_2299069988$, _SY
                                                 SSMU31_290584066$
SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

dump block225

Block header dump:  0x004000e1
 Object id on Block? Y
 seg/obj: 0xf  csc: 0xb8c.3c0c297b  itc: 1  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba        Flag  Lck        Scn/Fsc
0x01   0x0000.001.00000034  0x00400221.0034.08  --U-    1  fsc 0x0000.3c0c297c
bdba: 0x004000e1
data_block_dump,data header at 0xb6b8a644

bbed 提交225事务

[oracle@xifenfei ~]$ bbed password=blockedit filename='/u01/oracle/oradata/ora11g/system01.dbf' blocksize=8192
BBED: Release 2.0.0.0.0 - Limited Production on Mon Aug 26 17:31:09 2013
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set block 225
        BLOCK#          225
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[32]                               @86
 ub1 freespace[59]                          @150
 ub1 rowdata[7979]                          @209
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x0000000f
      ub4 ktbbhod1                          @24       0x0000000f
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c297b
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2047
   ub1 ktbbhflg                             @38       0x03 (KTBFONFL)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0001
         ub4 kxidsqn                        @48       0x00000034
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00400221
         ub2 kubaseq                        @56       0x0034
         ub1 kubarec                        @58       0x08
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x3c0c297c
BBED> set mode edit
        MODE            Edit
BBED> m /x 0180 offset 60
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:   60 to   91           Dba:0x00000000
------------------------------------------------------------------------
 01800000 7c290c3c 00012000 ffff5200 8d003017 30170000 20008e1f 4106f805
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55db, required = 0x55db
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=1   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x2c
BBED> dump
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c011102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 002c
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x79f6, required = 0x79f6
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=44   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x00
BBED> d
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 2c00
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c001102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55da, required = 0x55da
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

dump 321 block

SQL> alter system dump datafile '/u01/oracle/oradata/ora11g/system01.dbf' block 321;
System altered.
SQL> select value from v$diag_info where name='Default Trace File';
VALUE
--------------------------------------------------------------------------------
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19704.trc
--trace文件
Block header dump:  0x00400141
 Object id on Block? Y
 seg/obj: 0x22  csc: 0xb8c.3c0c28ec  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.038.00000033  0x0040021e.0033.3d  --U-    1  fsc 0x0000.3c0c28ed

bbed提交321 block 事务

BBED> set block 321
        BLOCK#          321
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 72 bytes                     @20
 struct kdxle, 32 bytes                     @92
 b2 kd_off[32]                              @124
 ub1 freespace[7581]                        @188
 ub1 rowdata[351]                           @7769
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 72 bytes                      @20
   ub1 ktbbhtyp                             @20       0x02 (KDDBTINDEX)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000022
      ub4 ktbbhod1                          @24       0x00000022
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c28ec
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2046
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0000
         ub4 kxidsqn                        @48       0x00000000
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00000000
         ub2 kubaseq                        @56       0x0000
         ub1 kubarec                        @58       0x00
      ub2 ktbitflg                          @60       0x0000 (NONE)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x00000000
   struct ktbbhitl[1], 24 bytes             @68
      struct ktbitxid, 8 bytes              @68
         ub2 kxidusn                        @68       0x0000
         ub2 kxidslt                        @70       0x0038
         ub4 kxidsqn                        @72       0x00000033
      struct ktbituba, 8 bytes              @76
         ub4 kubadba                        @76       0x0040021e
         ub2 kubaseq                        @80       0x0033
         ub1 kubarec                        @82       0x3d
      ub2 ktbitflg                          @84       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @86
         b2 _ktbitfsc                       @86       0
         ub2 _ktbitwrp                      @86       0x0000
      ub4 ktbitbas                          @88       0x3c0c28ed
BBED> m /x 0180 offset 84
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   84 to  595           Dba:0x00000000
------------------------------------------------------------------------
 01800000 ed280c3c 00008001 00000000 20006400 011e9d1d 00000000 00000000
 00000000 06000000 601f0000 561f4b1f 401f351f 2a1f1f1f 141f091f fe1ef31e
 e81edd1e d21ec71e bc1eb11e a61e9b1e 901e851e 7a1e6f1e 641e591e 4e1e431e
 381e2d1e 221e171e 0c1e011e 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x990e, required = 0x990e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
Block Checking: DBA = 4194625, Block Type = KTB-managed data block
**** actual rows locked by itl 2  = 1 != # in trans. header = 0
---- end index block validation
Block 321 failed with check code 6401
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 1
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p kdxle
struct kdxle, 32 bytes                      @92
   struct kdxlexco, 16 bytes                @92
      ub1 kdxcolev                          @92       0x00
      ub1 kdxcolok                          @93       0x00
      ub1 kdxcoopc                          @94       0x80
      ub1 kdxconco                          @95       0x01
      ub4 kdxcosdc                          @96       0x00000000
      sb2 kdxconro                          @100      32
      b2 kdxcofbo                           @102      100
      b2 kdxcofeo                           @104      7681
      b2 kdxcoavs                           @106      7581
   b2 kdxlespl                              @108      0
   sb2 kdxlende                             @110      0
   ub4 kdxlenxt                             @112      0x00000000
   ub4 kdxleprv                             @116      0x00000000
   ub1 kdxledsz                             @120      0x06
   ub1 kdxleunuse                           @121      0x00
BBED> m /x 00 offset 95
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   95 to  606           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00200064 00011e9d 1d000000 00000000 00000000 00060000 00601f00
 00561f4b 1f401f35 1f2a1f1f 1f141f09 1ffe1ef3 1ee81edd 1ed21ec7 1ebc1eb1
 1ea61e9b 1e901e85 1e7a1e6f 1e641e59 1e4e1e43 1e381e2d 1e221e17 1e0c1e01
 1e000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x380e, required = 0x380e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

bbed提交事务open依旧报错

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 20229
Session ID: 125 Serial number: 5

做10046的trace文件错误依旧

=====================
PARSING IN CURSOR #3065238148 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377519352653391 hv=361892850 ad='269a8064' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3065238148:c=2999,e=3152,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377519352653388
BINDS #3065238148:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6b32268  bln=22  avl=02  flg=05
  value=27
EXEC #3065238148:c=2999,e=3182,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377519352657030
WAIT #3065238148: nam='db file sequential read' ela= 169 file#=1 block#=321 blocks=1 obj#=34 tim=1377519352657400
WAIT #3065238148: nam='db file sequential read' ela= 25 file#=1 block#=225 blocks=1 obj#=15 tim=1377519352657546
FETCH #3065238148:c=1000,e=578,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377519352657738
STAT #3065238148 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=538 us)'
STAT #3065238148 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=338 us)'
CLOSE #3065238148:c=0,e=13,dep=3,type=0,tim=1377519352657942
WAIT #3065201888: nam='control file sequential read' ela= 111 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658170
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658269
WAIT #3065201888: nam='control file sequential read' ela= 18 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658329
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=24 blocks=1 obj#=15 tim=1377519352658391
WAIT #3065201888: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658469
WAIT #3065201888: nam='control file sequential read' ela= 16 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658528
WAIT #3065201888: nam='control file sequential read' ela= 28 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658787
WAIT #3065201888: nam='control file sequential read' ela= 20 file#=0 block#=32 blocks=1 obj#=15 tim=1377519352658883
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3065201888:c=11998,e=12032,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377519352661044
STAT #3065201888 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=19 us)'
STAT #3065201888 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1001 us)'

通过观察每次的10046文件,发现总是在查询select /*+ rule */ name,file#, block#,status$,user#,undosqn,xactsqn, scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1并且总是us#=27号回滚段上过不去,通过使用bbed或者dul修改数据文件,让数据库启动时候不再扫描27号回滚段(主要是修改undo$.status$=1,block位置可以通过10046确定,也可以通过odu来确定,亦或者找个相同版本的库查看)

继续open数据库

SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
Database altered.

删除有问题undo,重建新undo

SQL> drop tablespace undotbs1;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> !oerr ora 01561
01561, 00000, "failed to remove all objects in the tablespace specified"
// *Cause: Failed to remove all objects when dropping a tablespace
// *Action: Retry the drop tablespace until all objects are dropped
SQL> drop tablespace undotbs1 including contents;
drop tablespace undotbs1 including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> drop tablespace undotbs1 including contents and datafiles;
drop tablespace undotbs1 including contents and datafiles
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
-------------------------------------------------
因为前面使用了bbed/dul之类的工具,修改了undo$表,
导致undo$中的标记27号回滚段与seg$中不一致
解决方法使得修改undo$.status$为3或者2
-------------------------------------------------
SQL> update undo$ set status$=3 where us#=27;
1 row updated.
SQL> commit;
Commit complete.
SQL> drop tablespace undotbs1 ;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
Database opened.
SQL> drop tablespace undotbs1 ;
Tablespace dropped.
SQL> create undo tablespace undotbs1 datafile '/u01/oracle/oradata/ora11g/undotbs1.dbf' size 10M;
Tablespace created.

直接修改基表,bbed/dul修改数据文件等操作都是危险级别非常高的操作,而且oracle not support,在没有十足把握之前,一定不要在生产环境中执行,如果数据库挂了需要类似方法恢复,也请一定保留现场,如果你搞不定能够还原现场
部分操作可以参考:通过bbed修改回滚段状态解决ORA-00704故障

重建控制文件丢失undo异常恢复—ORA-01173模拟与恢复

数据库异常关闭,使用resetlogs方式重建控制文件,不包含undo表空间相关数据库,然后尝试resetlogs打开数据库,非常容易出现ORA-01173: data dictionary indicates missing data file from system tablespace之类的错误,本文通过模拟该错误,并简单提供解决方法
模拟ORA-01173错误

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> shutdown abort;
ORACLE instance shut down.
--除掉undo表空间相关数据库然后使用resetlogs模式重建控制文件
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> @/tmp/ctl.sql
Control file created.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 12696930512547 generated at 08/26/2013 14:43:13 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/ora11g/archivelog/1_8_824475918.dbf
ORA-00280: change 12696930512547 for thread 1 is in sequence #8
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel;
ORA-00308: cannot open archived log 'cancel;'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 12696930512547 generated at 08/26/2013 14:43:13 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/ora11g/archivelog/1_8_824475918.dbf
ORA-00280: change 12696930512547 for thread 1 is in sequence #8
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01173: data dictionary indicates missing data file from system tablespace
Process ID: 15053
Session ID: 125 Serial number: 3

alert日志

Mon Aug 26 15:22:20 2013
alter database open resetlogs
RESETLOGS after complete recovery through change 12696930514503
Archived Log entry 1 added for thread 1 sequence 7 ID 0xfb91eb36 dest 1:
Archived Log entry 2 added for thread 1 sequence 8 ID 0xfb91eb36 dest 1:
Archived Log entry 3 added for thread 1 sequence 6 ID 0xfb91eb36 dest 1:
Clearing online redo logfile 1 /u01/oracle/oradata/ora11g/redo01.log
Clearing online log 1 of thread 1 sequence number 7
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /u01/oracle/oradata/ora11g/redo02.log
Clearing online log 2 of thread 1 sequence number 8
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /u01/oracle/oradata/ora11g/redo03.log
Clearing online log 3 of thread 1 sequence number 6
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 4220644150 (0xfb91eb36)
Online log /u01/oracle/oradata/ora11g/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo03.log: Thread 1 Group 3 was previously cleared
Mon Aug 26 15:22:26 2013
Setting recovery target incarnation to 2
Using SCN growth rate of 16384 per second
Mon Aug 26 15:22:26 2013
Assigning activation ID 4220652407 (0xfb920b77)
LGWR: STARTING ARCH PROCESSES
Mon Aug 26 15:22:26 2013
ARC0 started with pid=20, OS id=15131
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Aug 26 15:22:27 2013
ARC1 started with pid=21, OS id=15135
Mon Aug 26 15:22:27 2013
ARC2 started with pid=22, OS id=15139
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/oracle/oradata/ora11g/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 26 15:22:27 2013
SMON: enabling cache recovery
Mon Aug 26 15:22:27 2013
ARC3 started with pid=23, OS id=15143
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15053.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15053.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Error 1173 happened during db open, shutting down database
USER (ospid: 15053): terminating the instance due to error 1173
Instance terminated by USER, pid = 15053
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (15053) as a result of ORA-1092
Mon Aug 26 15:22:29 2013
ORA-1092 : opitsk aborting process

解决方案
解决方案设置相关异常回滚段,然后屏蔽掉,数据库即可正常起来

--参数文件修改(使用event或者工具,系统命令等)
undo_management='manual'
_corrupted_rollback_segments
--重启数据库使得_corrupted_rollback_segments生效
shutdown immediate;
startup
--尝试open数据库
recover database;
alter database open;
--新建undo
create undo tablespace undo_new datafile '' size 100m autoextend on next 10m maxsize 30G;
--删除老undo
drop tablespace old_undo
--修改参数
shutdonw immediate
undo_management='auto'
undo_tablespace='unod_new'
--启动数据库
startup
--逻辑导出数据,导入新库

姊妹篇:重建控制文件丢失undo异常恢复—ORA-600[25025]模拟与恢复

重建控制文件丢失undo异常恢复—ORA-600 25025模拟与恢复

在现实的情况中,有些人因为某种原因重建控制文件(丢失undo[有意或者无意]),然后又resetlogs库尝试恢复,这样的操作可能导致各种比较麻烦的恢复,这里模拟ORA-600[25025]异常恢复
模拟ORA-600[25025]错误

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> alter database backup controlfile to trace as '/tmp/ctl';
Database altered.
SQL> create table chf.t_xifenfei_www as select * from dba_objects where 1=0;
Table created.
SQL> insert into chf.t_xifenfei_www select * from dba_objects;
74749 rows created.
--另外一个会话abort
SQL> shutdown abort;
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> !vi /tmp/ctl.sql
CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/u01/oracle/oradata/ora11g/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/u01/oracle/oradata/ora11g/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/u01/oracle/oradata/ora11g/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
  '/u01/oracle/oradata/ora11g/system01.dbf',
  '/u01/oracle/oradata/ora11g/sysaux01.dbf',
 -- '/u01/oracle/oradata/ora11g/undo02.dbf',
  '/u01/oracle/oradata/ora11g/users01.dbf',
  '/u01/oracle/oradata/ora11g/dbfs01.dbf',
  '/u01/oracle/oradata/ora11g/tts_xifenfei02.dbf',
  '/u01/oracle/oradata/ora11g/tts_xifenfei01.dbf',
  '/u01/oracle/oradata/ora11g/system02.dbf',
  '/u01/oracle/oradata/ora11g/czum01.dbf',
  '/u01/oracle/oradata/ora11g/undotbs02.dbf',
  '/u01/oracle/oradata/sp2008',
  '/u01/oracle/oradata/sp_2009',
  '/u01/oracle/oradata/sp_2010',
  '/u01/oracle/oradata/sp_2011',
  '/u01/oracle/oradata/sp_2012',
  '/u01/oracle/oradata/sp_2013',
  '/u01/oracle/oradata/sp_2014',
  '/u01/oracle/oradata/sp_2015',
  '/u01/oracle/oradata/sp_2016',
  '/u01/oracle/oradata/sp_2017',
  '/u01/oracle/oradata/sp_2018',
  '/u01/oracle/oradata/sp_2019',
  '/u01/oracle/oradata/sp_2020',
  '/u01/oracle/oradata/sp_2021',
  '/u01/oracle/oradata/sp_2022',
  '/u01/oracle/oradata/sp_2023',
  '/u01/oracle/oradata/sp_2024',
  '/u01/oracle/oradata/sp_2025',
  '/u01/oracle/oradata/sp_20max'
CHARACTER SET ZHS16GBK
;
"/tmp/ctl.sql" [New] 43L, 1519C written
SQL> @/tmp/ctl.sql
Control file created.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo01.log
ORA-00310: archived log contains sequence 37; sequence 38 required
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo01.log'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL>  recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo03.log
ORA-00310: archived log contains sequence 39; sequence 38 required
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo03.log'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
ORA-00279: change 12696930370956 generated at 08/26/2013 13:00:25 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_39_805394597.dbf
ORA-00280: change 12696930370956 for thread 1 is in sequence #39
ORA-00278: log file '/u01/oracle/oradata/ora11g/redo02.log' no longer needed
for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo03.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [],
[], [], [], [], []
Process ID: 12603
Session ID: 125 Serial number: 3

alert日志信息

ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile until cancel  ...
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo02.log'
Media Recovery Log /u01/oracle/oradata/ora11g/redo02.log
Mon Aug 26 13:05:00 2013
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo02.log'  ...
Mon Aug 26 13:05:10 2013
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo03.log'
Media Recovery Log /u01/oracle/oradata/ora11g/redo03.log
Mon Aug 26 13:05:10 2013
Incomplete recovery applied all redo ever generated.
Recovery completed through change 12696930370973 time 08/26/2013 13:00:56
Media Recovery Complete (ora11g)
Completed: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo03.log'
alter database open resetlogs
RESETLOGS after complete recovery through change 12696930370973
Archived Log entry 1 added for thread 1 sequence 37 ID 0xfa6fa6cb dest 1:
Archived Log entry 2 added for thread 1 sequence 38 ID 0xfa6fa6cb dest 1:
Archived Log entry 3 added for thread 1 sequence 39 ID 0xfa6fa6cb dest 1:
Clearing online redo logfile 1 /u01/oracle/oradata/ora11g/redo01.log
Clearing online log 1 of thread 1 sequence number 37
Mon Aug 26 13:05:22 2013
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /u01/oracle/oradata/ora11g/redo02.log
Clearing online log 2 of thread 1 sequence number 38
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /u01/oracle/oradata/ora11g/redo03.log
Clearing online log 3 of thread 1 sequence number 39
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 4201621195 (0xfa6fa6cb)
Online log /u01/oracle/oradata/ora11g/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo03.log: Thread 1 Group 3 was previously cleared
Mon Aug 26 13:05:33 2013
Setting recovery target incarnation to 2
Mon Aug 26 13:05:33 2013
Using SCN growth rate of 16384 per second
Mon Aug 26 13:05:33 2013
Assigning activation ID 4220644150 (0xfb91eb36)
LGWR: STARTING ARCH PROCESSES
Mon Aug 26 13:05:33 2013
ARC0 started with pid=20, OS id=12679
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Aug 26 13:05:35 2013
ARC1 started with pid=21, OS id=12683
Mon Aug 26 13:05:35 2013
ARC2 started with pid=22, OS id=12687
Mon Aug 26 13:05:36 2013
ARC3 started with pid=24, OS id=12691
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/oracle/oradata/ora11g/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 26 13:05:36 2013
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc  (incident=146705):
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_146705/ora11g_ora_12603_i146705.trc
Mon Aug 26 13:05:45 2013
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Undo initialization errored: err:600 serial:0 start:57601994 end:57610584 diff:8590 (85 seconds)
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc:
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc:
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 12603): terminating the instance due to error 600
Instance terminated by USER, pid = 12603
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (12603) as a result of ORA-1092
Mon Aug 26 13:05:47 2013
ORA-1092 : opitsk aborting process

trace文件

*** 2013-08-26 13:05:38.945
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=7j16t46cacjt9) -----
alter database open resetlogs
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BF8A34E4 ? 2 ?
ksedst1()+77         call     skdstdst()           BF8A34E4 ? 0 ? 1 ? 8592C38 ?
                                                   8592AB6 ? 8592C38 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2704  call     ksedst()             0 ? BF8A40E7 ? 47EF1FF4 ?
                                                   BF8A3D66 ? 0 ? BF8A3BF4 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 2 ?
ksfdmp()+59          call     ksedmp()             3EB ? BF8A5458 ? F1DADED ?
                                                   116056E0 ? 3EB ? 116056E0 ?
dbgexPhaseII()+1725  call     00000000             116056E0 ? 3EB ?
dbgexProcessError()  call     dbgexPhaseII()       B6C515AC ? B6A0C890 ?
+2550                                              BF8A8D30 ?
dbgeExecuteForError  call     dbgexProcessError()  B6C515AC ? B6A0C890 ? 1 ? 0 ?
()+65                                              B6C515AC ? B6A0C890 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  B6C515AC ? B6A0C890 ? 0 ? 1 ?
1794                          ()                   0 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   116056E0 ? B6C330D4 ? 258 ?
50
kgeade()+324         call     00000000             116056E0 ? B6C330D4 ? 258 ?
kgeriv_int()+107     call     kgeade()             116056E0 ? 11605808 ?
                                                   B6C330D4 ? 258 ? 0 ? 61C1 ?
kgeriv()+25          call     kgeriv_int()         116056E0 ? B6C330D4 ? 61C1 ?
                                                   0 ? 1 ? BF8A96B8 ?
kgesiv()+98          call     kgeriv()             116056E0 ? B6C330D4 ? 61C1 ?
                                                   1 ? BF8A96B8 ?
ksesic1()+48         call     kgesiv()             116056E0 ? B6C330D4 ? 61C1 ?
                                                   1 ? BF8A96B8 ? 61C1 ? 1 ?krta2rfn()+78
                                                   call     ksesic1()            61C1 ? 0 ? 3 ? 0 ?
kttsinfo()+496       call     krta2rfn()           3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
ktusmout_online_ut(  call     kttsinfo()           9 ? 0 ? 0 ? BF8A9948 ?
)+810
ktusmiut_init_ut()+  call     ktusmout_online_ut(  B000 ? 89E57F8 ? 9 ?
                                                   BF8A96B8 ?
348                           )
ktuini()+518         call     ktusmiut_init_ut()   0 ? 0 ? 0 ? 0 ? 1 ? 0 ?
adbdrv()+46345       call     ktuini()             0 ? BF8A9DE0 ? 1004BF42 ?
                                                   BF8AA258 ? B6A0BFFC ?
                                                   10EA1F20 ?
opiexe()+16835       call     adbdrv()             25C8F1F8 ? 0 ? 0 ? 2A34F528 ?
                                                   2A0400C8 ? BF8AF07C ?
opiosq0()+3437       call     opiexe()             4 ? 0 ? BF8AFE8C ?
kpooprx()+239        call     opiosq0()            3 ? E ? BF8B0184 ? A4 ? 0 ?
kpoal8()+607         call     kpooprx()            BF8B2D6C ? BF8B10AC ? 1D ?
                                                   1 ? 0 ? A4 ?
opiodr()+962         call     00000000             5E ? 1C ? BF8B2D68 ?
ttcpip()+1930        call     00000000             5E ? 1C ? BF8B2D68 ? 0 ?
opitsk()+1355        call     ttcpip()             11616580 ? 5E ? BF8B2D68 ?
                                                   0 ? BF8B29F8 ? BF8B2E90 ?
                                                   FDEBA80 ? 0 ? BF8B2EBC ?
opiino()+827         call     opitsk()             0 ? 0 ?
opiodr()+962         call     00000000             3C ? 4 ? BF8B3E2C ?
opidrv()+479         call     opiodr()             3C ? 4 ? BF8B3E2C ? 0 ?
sou2o()+80           call     opidrv()             3C ? 4 ? BF8B3E2C ?
opimai_real()+109    call     sou2o()              BF8B3E10 ? 3C ? 4 ?
                                                   BF8B3E2C ?
ssthrdmain()+212     call     00000000             2 ? BF8B3F58 ? 0 ? 47DA6F14 ?
                                                   BF8B3F14 ? 47D9A670 ?
main()+147           call     ssthrdmain()         2 ? BF8B3F58 ?
__libc_start_main()  call     00000000             2 ? BF8B4054 ? BF8B4060 ?
+220                                               47D9A828 ? 0 ? 1 ?
_start()+33          call     __libc_start_main()  856F3B4 ? 2 ? BF8B4054 ?
                                                   BCC1EA0 ? BCC1E90 ?
                                                   47D8C790 ?
--------------------- Binary Stack Dump ---------------------

MOS中有类似描述ORA-600 [25025] [25] While Opening the Clone Database in Resetlog Mode (Doc ID 603100.1),该解决方案是重建控制文件增加所有数据文件,在本次测试中,我就是人为除掉了undo,模拟undo丢失[其实数据库已经resetlogs过了,就算加入undo重建控制文件也不会成功(人工修改undo文件头除外)],又做了不正确的重建控制文件操作的故障,我提供解决方案如下

解决办法

--参数文件修改
undo_management='manual'
--尝试open数据库
recover database;
alter database open;
--新建undo
create undo tablespace undo_new datafile '' size 100m autoextend on next 10m maxsize 30G;
--屏蔽需要恢复回滚段
select tablespace_name,segment_name,status from dba_rollback_segs;
_corrupted_rollback_segments
--重启数据库使得_corrupted_rollback_segments生效
shutdown immediate;
startup
--删除老undo
drop tablespace old_undo
--修改参数
shutdonw immediate
undo_management='auto'
undo_tablespace='unod_new'
--启动数据库
startup
--导出数据,导入新库

姊妹篇:重建控制文件丢失undo异常恢复—ORA-01173模拟与恢复

ORACLE db top 命令(oratop)

oracle 也推出来了数据库的top命令,叫做oratap,是使用c语言写的,目前只支持linux x86与x64的11gR2与12cR1
使用说明

[oracle@xifenfei tmp]$ ./oratap  -h
oratop: Release 13.2.4
Usage:
         oratop [ [Options] [Logon] ]
         Logon:
                {username[/password][@connect_identifier] | / }
                [AS {SYSDBA|SYSOPER}]
         Options:
             -i : Interval Delay (requires value in seconds, default: 3s)
             -f : Long format for header & section 4 (default: 80 column)
             -r : IORL mode for Section 2 (default is IOPS)
             -d : Real-Time Top 5 Wait Events (default: Cumulative)
             -m : MODULE/ACTION mode for Section 4 (default: USER/PROGRAM_NAME)
             -b : Batch mode
             -n : maximum number of iterations (requires number)
             -h : Help

使用说明
1. Log in to the system as the Oracle RDBMS software installation owner
2. Stage the oratop executable on the server on which the tool will be executed. On a RAC system it is only necessary to stage the executable on one node as it is RAC aware.
3. Change the name of the executable, eg.

$ mv oratop* oratop

4. Validate the permissions for oratop are 755 (-rwxr-xr-x). If the permissions are not currently set to 755, set the permissions on oratop as follows:

$ chmod 755 oratop

5. Configure the execution environment as follows (if not already set accordingly)

$ export TERM=xterm #or vt100
$ export ORACLE_HOME=<11.2 database home>
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH
$ export ORACLE_SID=<local 11.2 database SID to be monitored> #only needed if connecting to a local database

6. Invoke the tool to monitor a local database as follows (interval can be any interval in seconds):

$ ./oratop -i 10 / as sysdba

7. Non-dba privileged user requires specific grants to use the tool. See User Guide for grants required.
8. Databases can be monitored remotely using a tns alias as described in the oratop Users Guide .

$ ./oratop -i 10 username/password@tns_alias
$ ./oratop -i 10 system/manager@tns_alias

9. To exit the program simply press the keyboard key “q”. To abort, CTRL-C.
Troubleshooting:
If you receive an error similar to
./oratop: error while loading shared libraries: libclntsh.so.11.1: cannot open shared object file: No such file or directory that is an indication that libclntsh.so.11.1 was not found in the path defined in LD_LIBRARY_PATH similar to

$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib (ie., the location of libclntsh.so.11.1)

使用效果

[oracle@xifenfei tmp]$ ./oratap  -i 10 chf/xifenfei@ora11g_p
oratop: Release 13.2.4 Production on Mon Aug 26 03:10:17 2013
Copyright (c) 2011, Oracle.  All rights reserved.
Processing ...
oratop 1:  148 ora1 03:10:13 up  3.3m,  1 ins, 301M mt,   3 sn,  1 us,   6% db
ID %CU HLD MBPS IOPS %FR PGAU ASC ASI ASW ASP   AAS USN  TPS UCPS SSRT DBC DBW
 1   0   0    0    1   5 133M   1   0   0   0   0.1   3    0    2   0u   0   0
EVENT     (Cumulative)           TOT WAITS  TIME(s)  AVG_MS PCT  WAIT_CLASS
DB CPU                                          218          68
control file parallel write           5535       30     5.5   9  System I/O
db file sequential read              10080       27     2.7   8  User I/O
os thread startup                      198       25   124.9   8  Concurrency
db file async I/O submit              2575       21     8.0   7  System I/O
ID   SID  SPID USR PROG  PGA OPN SQLID/BLOCKER  E/T STATUS STE WAIT_EVENT  W/T
 1   148  8254 CHF DEDI   2M SEL 4agz3g5aajkdc    0 ACTIVE CPU wa for cpu   0u

参考文档:
oratop – utility for near real-time monitoring of databases, RAC and Single Instance (Doc ID 1500864.1)