通过bbed修改回滚段状态解决ORA-00704故障

undo文件丢失

SQL> startup
ORACLE instance started.
Total System Global Area  313860096 bytes
Fixed Size                  1344652 bytes
Variable Size             218106740 bytes
Database Buffers           88080384 bytes
Redo Buffers                6328320 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
[root@xifenfei ~]# ls -l /u01/oracle/oradata/ora11g/undotbs01.dbf
ls: /u01/oracle/oradata/ora11g/undotbs01.dbf: No such file or directory

尝试offline 数据文件

SQL> alter database datafile 3 offline;
Database altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 16365
Session ID: 125 Serial number: 5

通过这里的错误提示可以看到因为datafile 3丢失并且offline,而在数据库启动的时候需要去使用该数据文件中的回滚段去回滚事务,但是因为该数据文件被offline,使得回滚不能进行从而出现该错误.这里出现ORA-00704和ORA-00604的错误,根据感觉不能轻易的使用屏蔽回滚段的方法实现,但是还是做一尝试.使用其他方法找到回滚段.

屏蔽回滚段后重启库

SQL> startup mount pfile='/tmp/pfile'   force
ORACLE instance started.
Total System Global Area  313860096 bytes
Fixed Size                  1344652 bytes
Variable Size             218106740 bytes
Database Buffers           88080384 bytes
Redo Buffers                6328320 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 16696
Session ID: 125 Serial number: 5

增加了合适的回滚段屏蔽,但是数据库还不能正常启动,而且依然报ORA-00704,决定对数据库启动过程做一个10046跟踪,来判断在哪一步出了问题

对数据库open过程做10046

SQL> conn  / as sysdba
Connected to an idle instance.
SQL> startup pfile='/tmp/pfile' mount;
ORACLE instance started.
Total System Global Area  313860096 bytes
Fixed Size                  1344652 bytes
Variable Size             218106740 bytes
Database Buffers           88080384 bytes
Redo Buffers                6328320 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_16869.trc
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 16869
Session ID: 125 Serial number: 5
--trace中发现
PARSE ERROR #3063083528:len=60 dep=1 uid=0 oct=3 lid=0 tim=1342472283605146 err=604
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'

通过这里可以看出来很可能是在SELECT PROPS$表的时候,需要使用到回滚段,这里有两种可能需要使用回滚段(1.有数据未提交[提交事务],2.数据块scn过大[提交事务/推进scn]).对于这个问题,我尝试着修改回滚段状态来解决这个问题

修改回滚段状态

[oracle@xifenfei ~]$ bbed listfile=listfile password=blockedit  mode=edit
BBED: Release 2.0.0.0.0 - Limited Production on Tue Jul 17 09:10:01 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set file 1 block 225
        FILE#           1
        BLOCK#          225
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225                                   Dba:0x004000e1
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[21]                               @86
 ub1 freespace[5357]                        @128
 ub1 rowdata[2703]                          @5485
 ub4 tailchk                                @8188
BBED> p *kdbr[1]
rowdata[1393]
-------------
ub1 rowdata[1393]                           @6878     0x2c
BBED> x /rncnnnnn
rowdata[1393]                               @6878
-------------
flag@6878: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@6879: 0x00
cols@6880:   17
col    0[2] @6881: 1
col   1[20] @6884: _SYSSMU1_3138885392$
col    2[2] @6905: 1
col    3[2] @6908: 3
col    4[3] @6911: 128
col    5[4] @6915: 822624
col    6[1] @6920: 0
col    7[3] @6922: 498
col    8[2] @6926: 94
col    9[1] @6929: 0
col   10[2] @6931: 3
col   11[2] @6934: 2
col   12[0] @6937: *NULL*
col   13[0] @6938: *NULL*
col   14[0] @6939: *NULL*
col   15[0] @6940: *NULL*
col   16[2] @6941: 2
BBED> m /x 02 offset 6933
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225              Offsets: 6933 to 7444           Dba:0x004000e1
------------------------------------------------------------------------
 0202c103 ffffffff 02c1032c 001102c1 03145f53 5953534d 55325f34 32323832
 33383232 322402c1 0202c104 03c2022d 04c3531b 16018003 c2075303 c2031401
 8002c102 02c103ff ffffff02 c1032c00 1102c104 145f5359 53534d55 335f3232
 31303734 32363432 2402c102 02c10403 c2023d04 c3531b17 018003c2 074903c2
 02290180 02c10202 c103ffff ffff02c1 032c0011 02c10514 5f535953 534d5534
 5f313435 35333138 30303624 02c10202 c10403c2 024d04c3 4f4f5101 8003c205
 4502c164 018002c1 0402c103 ffffffff 02c1032c 001102c1 06145f53 5953534d
 55355f33 37383736 32323331 362402c1 0202c104 03c2025d 04c3531b 1c018003
 c2071603 c2021701 8002c102 02c103ff ffffff02 c1032c00 1102c107 145f5359
 53534d55 365f3234 36303234 38303639 2402c102 02c10403 c2030904 c3531b1d
 018003c2 075a03c2 02130180 02c10202 c103ffff ffff02c1 032c0011 02c10814
 5f535953 534d5537 5f313932 34383833 30333724 02c10202 c10403c2 031904c3
 531b2201 8003c207 5a03c202 1d018002 c10202c1 03ffffff ff02c103 2c001102
 c109135f 53595353 4d55385f 32373630 33383233 372402c1 0202c104 03c20329
 04c34f2e 34018003 c2071703 c2024a01 8002c102 02c103ff ffffff02 c1032c00
 1102c10a 145f5359 53534d55 395f3335 39333435 30363135 2402c102 02c10403
 <32 bytes per line>
…………类似方法修改其他值
BBED> sum apply
Check value for File 1, Block 225:
current = 0x4d51, required = 0x4d51

启动数据库并解决异常undo

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area  313860096 bytes
Fixed Size                  1344652 bytes
Variable Size             218106740 bytes
Database Buffers           88080384 bytes
Redo Buffers                6328320 bytes
Database mounted.
Database opened.
SQL> drop tablespace undotbs1;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> drop tablespace undotbs1 including contents;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> select type# from seg$ where file#=3;
     TYPE#
----------
        10
        10
        10
        10
        10
        10
        10
        10
        10
        10
10 rows selected.
SQL> update seg$ set type#=3 where file#=3;
10 rows updated.
SQL> commit;
Commit complete.
SQL>  drop tablespace undotbs1 including contents;
Tablespace dropped.
SQL> create undo tablespace undotbs1 datafile '/u01/oracle/oradata/ora11g/undotbs01.dbf' size 10m;
Tablespace created.

相关文档
bbed修改undo$(回滚段)状态
因非常规操作导致删除表空间提示ORA-01561解决办法

记录8.0.5数据库恢复过程

某银行需要恢复2004年磁带备份数据库 8.0.5,当时因为开发商使用begin backup命令进行热备,然后压缩到磁带上.在硬盘紧张,rman不成熟的时代,使用这样的方法备份本身没有大问题,可是因为没有备份归档日志,给现在的恢复带来了很多的问题.
尝试不完全恢复启动库

SVRMGR> startup pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561556 generated at 08/21/04 22:42:48 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_55329.dbf
ORA-00280: change 613561556 for thread 1 is in sequence #55329
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00600: internal error code, arguments: [4146], [31400], [31370], [], [], [], [], []

这个错误是因为回滚段corruption导致,客户已经没有了一致性要求,所以解决办法是通过一些手段找出回滚段并屏蔽

屏蔽回滚段重启库

SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> startup  pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open;
alter database open
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55992], [55992], [4], [], []

根据ORA-00600[3668]错误的提示,因为重建控制文件,有一个数据文件因为在磁带恢复丢失,所以控制文件在上次重建的时候没有包含进去,根据经验在8.0中可以通过resetlogs解决

resetlogs重新打开数据库

SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-01139: RESETLOGS option only valid after an incomplete database recovery
SVRMGR> recover datatabase using backup controlfile until cancel;
ORA-00274: illegal recovery option DATATABASE
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561558 generated at 07/30/12 09:14:49 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561558 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small

根据经验可能是屏蔽了undo或者undo损坏,然后有事务可能需要读undo,无法读到对应记录,可能出现此错误,因为无trace文件,尝试推进scn解决.当然这个问题可以通过open时做10046然后深入分析也许可以找到原因.

open数据库成功
根据上面的分析,重启库,推进scn,resetlogs库解决问题

SVRMGR> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup pfile='init.ora' mount;
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open ;
alter database open
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55994], [55994], [4], [], []
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561560 generated at 07/30/12 09:17:11 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561560 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
Statement processed.

这次恢复比较幸运,在备份到磁带过程中被覆盖的一个同名的数据文件,因为是index,通过查询dba_extents,然后删除并重建相关index,s数据库算恢复完全.有些时候,对数据库多一份备份,可能成为最后的救命稻草,哪怕是一份不是十分完整的备份.再次重复:对dba而言,数据库备份重于一切

bbed修改undo$(回滚段)状态

指定undo$对应数据块(11g 1_225,10g 1_106)[具体时候需要查询]

BBED> set file 1 block 225
        FILE#           1
        BLOCK#          225

查询数据内容,主要关注kdbr

BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225                                   Dba:0x004000e1
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[31]                               @86      <===表示31条记录,从0开始
 ub1 freespace[3644]                        @148
 ub1 rowdata[4396]                          @3792
 ub4 tailchk                                @8188

查看30号回滚段内容(列举其中一个)

BBED> p *kdbr[30]
rowdata[0]
----------
ub1 rowdata[0]                              @3792     0x2c
BBED> x /rncnnnnnnn
rowdata[0]                                  @3792
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17
col    0[2] @3795: 30
col   1[21] @3798: _SYSSMU30_3379578723$
col    2[2] @3820: 1
col    3[2] @3823: 3
col    4[3] @3826: 432
col    5[1] @3830: 0
col    6[1] @3832: 0
col    7[1] @3834: 0
col    8[1] @3836: 0
col    9[1] @3838: 0
col   10[2] @3840: 5     <===修改前对应值undo$.status$
col   11[2] @3843: 6
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850: 2
BBED> x /r
rowdata[0]                                  @3792
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17
col    0[2] @3795:  0xc1  0x1f
col   1[21] @3798:  0x5f  0x53  0x59  0x53  0x53  0x4d  0x55  0x33  0x30  0x5f
 0x33  0x33  0x37  0x39  0x35  0x37  0x38  0x37  0x32  0x33  0x24
col    2[2] @3820:  0xc1  0x02
col    3[2] @3823:  0xc1  0x04
col    4[3] @3826:  0xc2  0x05  0x21
col    5[1] @3830:  0x80
col    6[1] @3832:  0x80
col    7[1] @3834:  0x80
col    8[1] @3836:  0x80
col    9[1] @3838:  0x80
col   10[2] @3840:  0xc1  0x06   <===修改前16进制值
col   11[2] @3843:  0xc1  0x07
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850:  0xc1  0x03

修改回滚段状态

BBED> m /x 02 offset 3842    <===注意修改为02
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225              Offsets: 3842 to 4353           Dba:0x004000e1
------------------------------------------------------------------------
 0202c107 ffffffff 02c1032c 001102c1
 <32 bytes per line>
BBED> p *kdbr[30]
rowdata[0]
----------
ub1 rowdata[0]                              @3792     0x2c
BBED> x /r
rowdata[0]                                  @3792
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17
col    0[2] @3795:  0xc1  0x1f
col   1[21] @3798:  0x5f  0x53  0x59  0x53  0x53  0x4d  0x55  0x33  0x30  0x5f
 0x33  0x33  0x37  0x39  0x35  0x37  0x38  0x37  0x32  0x33  0x24
col    2[2] @3820:  0xc1  0x02
col    3[2] @3823:  0xc1  0x04
col    4[3] @3826:  0xc2  0x05  0x21
col    5[1] @3830:  0x80
col    6[1] @3832:  0x80
col    7[1] @3834:  0x80
col    8[1] @3836:  0x80
col    9[1] @3838:  0x80
col   10[2] @3840:  0xc1  0x02   <===修改后16进制值
col   11[2] @3843:  0xc1  0x07
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850:  0xc1  0x03
BBED>  x /rncnnnnnnn
rowdata[0]                                  @3792
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17
col    0[2] @3795: 30
col   1[21] @3798: _SYSSMU30_3379578723$
col    2[2] @3820: 1
col    3[2] @3823: 3
col    4[3] @3826: 432
col    5[1] @3830: 0
col    6[1] @3832: 0
col    7[1] @3834: 0
col    8[1] @3836: 0
col    9[1] @3838: 0
col   10[2] @3840: 1    <===实际展示值undo$.status$
col   11[2] @3843: 6
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850: 2

1.修改为02(表示的实际值比修改值大1[1:DELETE,2:OFFLINE, 3:ONLINE,4:UNDEFINED,5:NEEDS RECOVERY,6:PARTLY AVAILABLE,其他表示:UNDEFINED])
2.Offset需要在10列漂移量上+2(或者11列偏移量-1)

修改验证值

BBED> sum apply
Check value for File 1, Block 225:
current = 0x9708, required = 0x9708

补充说明
1)通过dbms_metadata.get_ddl分析DBA_ROLLBACK_SEGS可以知道seg$.status$表示含义

2, 'OFFLINE'
3, 'ONLINE',
4, 'UNDEFINED'
5, 'NEEDS RECOVERY',
6, 'PARTLY AVAILABLE'
other, 'UNDEFINED'

2)该方法一般出现在使用隐含参数屏蔽回滚段无效之后使用,一般不到最后,不建议使用该方法,可能引起不可预知的后果
3)使用该方法启动数据库后,需要手工删除seg$中被bbed处理过的回滚段

因非常规操作导致删除表空间提示ORA-01561解决办法

今天测试手工通过bbed修改undo$中回滚段状态(从status$=5修改为1)[NEEDS RECOVERY 修改为 DELETE],然后尝试删除表空间,发现不能删除
删除表空间提示ORA-01561

SQL> drop tablespace undotbs;
drop tablespace undotbs
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> drop tablespace undotbs including contents;
drop tablespace undotbs including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified

查询相关信息

SQL> select ts#,name from v$tablespace;
       TS# NAME
---------- ------------------------------
         0 SYSTEM
         1 SYSAUX
         4 USERS
         6 UNDOTBS
         3 TEMP
SQL> select name,ts#,status$ from undo$;
NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
SYSTEM                                  0          3
_SYSSMU1_3138885392$                    2          1
_SYSSMU2_4228238222$                    2          1
_SYSSMU3_2210742642$                    2          1
_SYSSMU4_1455318006$                    2          1
_SYSSMU5_3787622316$                    2          1
_SYSSMU6_2460248069$                    2          1
_SYSSMU7_1924883037$                    2          1
_SYSSMU8_1909280886$                    2          1
_SYSSMU9_3593450615$                    2          1
_SYSSMU10_2490256178$                   2          1
NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
_SYSSMU11_253524401$                    6          1
_SYSSMU12_842775869$                    6          1
_SYSSMU13_2794767139$                   6          1
_SYSSMU14_2067649841$                   6          1
_SYSSMU15_3270221471$                   6          1
_SYSSMU16_4094338609$                   6          1
_SYSSMU17_709661646$                    6          1
_SYSSMU18_699588262$                    6          1
_SYSSMU19_718640828$                    6          1
_SYSSMU20_3516920665$                   6          1
_SYSSMU21_793796797$                    6          1
NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
_SYSSMU22_3988785920$                   6          1
_SYSSMU23_1828333848$                   6          1
_SYSSMU24_1223218862$                   6          1
_SYSSMU25_2939844199$                   6          1
_SYSSMU26_1317300205$                   6          1
_SYSSMU27_1654033223$                   6          1
_SYSSMU28_3748619502$                   6          1
_SYSSMU29_1868765904$                   6          1
_SYSSMU30_3379578723$                   6          1
31 rows selected.
SQL> select segment_name,status from dba_rollback_segs;
SEGMENT_NAME                   STATUS
------------------------------ ----------------
SYSTEM                         ONLINE

通过这里可以看出,通过bbed的修改,除system回滚段外,其他均已经被标志为delete状态,对于这样的情况,很本能的怀疑是extent或者segment未被清理掉导致

查询EXTENT和SEGMENT

SQL> select SEGMENT_NAME from dba_extents where TABLESPACE_NAME='UNDOTBS';
no rows selected
SQL> select segment_name from dba_segments where  TABLESPACE_NAME='UNDOTBS';
no rows selected
SQL> select count(*) from seg$ where ts#=6;
  COUNT(*)
----------
        10
SQL> select count(*) from seg$ where ts#=2;
  COUNT(*)
----------
         0
SQL> select file#,type# from  seg$ where ts#=6;
     FILE#      TYPE#
---------- ----------
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
10 rows selected.

通过查询我们发现SEG$中含有10条记录,而通过dbms_metadata.get_ddl分析DBA_SEGMENTS是的,得出type为10恰好是TYPE2 UNDO信息.

解决办法
删除掉这些因为手工修改undo$信息导致遗留下来的后遗症对象

SQL> delete from seg$ where ts#=6;
10 rows deleted.
SQL> commit;
Commit complete.
SQL> drop tablespace undotbs ;
Tablespace dropped.

这样的直接修改基表的做法,在一般的情况下非常不建议使用,可能带来系统不稳定.但是在数据库异常处理过程中,可能将成为一个法宝

truncate table强制终止导致ORA-00600[ktspfundo-2]

朋友的金蝶erp系统异常abort,让我帮忙分析原因.
ORA-00600[ktspfundo-2]错误

Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:54:16 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:57:12 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 08:57:20 2012
ORACLE Instance finance (pid = 15)-Error 600 encountered while recovering transaction (8, 3) on object 34294107.
Fri Jul 27 08:57:20 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:14 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:15 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_pmon_4130.trc:
ORA-00474: SMON process terminated with error

从这里可以大概看出数据库在进行一个参数,然后用户终止该操作导致,导致ORA-00600[ktspfundo-2]错误,然后出现smon回滚,因为回滚失败从而使得数据块down掉

分析trace文件

*** 2012-07-27 08:53:33.293
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Current SQL statement for this session:
TRUNCATE TABLE VTC3B8DR2G7J926FWOBK839XOR
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000000 ? 000000000 ?
ktspfundo()+3902     call     kgeasnmierr()        0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000010 ? 00689C0C0 ?
kcoubk()+351         call     ktspfundo()          7FFF41B10810 ? 2AE80C800CFA ?
                                                   4D6EE6014 ? 000000002 ?
                                                   000000010 ? 7FFF41B11128 ?
ktundo()+1208        call     kcoubk()             7FFF41B111F8 ? 7FFF41B10810 ?
                                                   2AE87E384024 ? 000000002 ?
                                                   000000002 ? 000000000 ?
ktubko()+499         call     ktundo()             000000001 ? 010E5C341 ?
                                                   2AE87E384020 ? 000000058 ?
                                                   000008430 ? 7657E9990 ?
ktuabt()+810         call     ktubko()             7657E9990 ? 7FFF41B1188C ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktcrab()+292         call     ktuabt()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktccle()+516         call     ktcrab()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksepop()+384         call     ktccle()             000000006 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgepop()+123         call     ksepop()             0068966E0 ? 000000006 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgesev()+315         call     kgepop()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksesec0()+186        call     kgesev()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 000000000 ?
                                                   7FFF41B11B30 ? 7FFF41B11870 ?
ksqcmi()+2322        call     ksesec0()            000000000 ? 000000000 ?
                                                   000001000 ? 000000000 ?
                                                   000000013 ? 000000005 ?
ksqcnv()+496         call     ksqcmi()             77E586B88 ? 000000006 ?
                                                   00000FFFF ? 00147AE14 ?
                                                   7FFF41B126A0 ? 7FFF41B128A8 ?
ksqcov()+44          call     ksqcnv()             77E586B88 ? 000000006 ?
                                                   000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbo_reuse_obj()+14  call     ksqcov()             77E586B68 ? 000000006 ?
09                                                 000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbrbo()+1126        call     kcbo_reuse_obj()     7FFF41B12F04 ? 7FFF41B12F0C ?
                                                   000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
ktsstrn_segment()+3  call     kcbrbo()             7FFF41B12F04 ? 7FFF41B12F0C ?
941                                                000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kkbtts_trunc_tbl_se  call     ktsstrn_segment()    7FFF41B13180 ? 000000000 ?
g()+1018                                           0020C6444 ? 000000000 ?
                                                   7FFF41B14C00 ? 7FFF00000001 ?
kkbtrn()+8156        call     kkbtts_trunc_tbl_se  735ACA058 ? 77BC78D18 ?
                              g()                  000000000 ? 000000002 ?
                                                   000000000 ? 7FFF41B14C00 ?
opiexe()+15805       call     kkbtrn()             735ACA058 ? 000000000 ?
                                                   718831208 ? 000000000 ?
                                                   000000002 ? 7FFF00000000 ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FFF41B15F48 ? 00000000B ?
                                                   000000002 ? 7FFF00000000 ?
kpooprx()+315        call     opiosq0()            000000003 ? 00000000E ?
                                                   7FFF41B160B8 ? 0000000A4 ?
                                                   000000002 ? 7FFF00000000 ?
kpoal8()+799         call     kpooprx()            7FFF41B19264 ? 7FFF41B17280 ?
                                                   000000029 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000001 ? 7FFF00000000 ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   0059C09B0 ? 7FFF00000000 ?
opitsk()+1322        call     ttcpip()             00689E3B0 ? 7FFF41B17248 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   7FFF41B18D58 ? 7FFF41B193C8 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000000 ? 4E58D8C00000001 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   000000000 ? 4E58D8C00000001 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
opimai_real()+163    call     sou2o()              7FFF41B1A400 ? 00000003C ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
main()+116           call     opimai_real()        000000002 ? 7FFF41B1A490 ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
__libc_start_main()  call     main()               000000002 ? 7FFF41B1A490 ?
+244                                               000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
_start()+41          call     __libc_start_main()  000723088 ? 000000002 ?
                                                   7FFF41B1A5E8 ? 000000000 ?
                                                   0059C0460 ? 000000002 ?
--------------------- Binary Stack Dump ---------------------

这里可以得到更加准确的信息,数据库在truncate table的时候,有人异常终止程序,导致数据库出现ORA-00600[ktspfundo-2].查询mos未发现相关bug记录,从这些信息初步判断是因为oracle的bug导致在truncate表的时候异常终止,然后出现该对象上的回滚记录异常(当然truncate本身不需要回滚,但是可能记录一些附带的回滚信息),然后出现对象回滚异常是的数据库down.重启数据库,对象回滚段信息已经自动回滚完成,数据库正常.因为truncate表被异常终止的情况本身不多见,引发该bug更不常见,如果只是偶尔发生一次,建议忽略该错误.当然如果有时间和兴趣,可以提交sr

数据库报ORA-00607/ORA-00600[4194]错误

昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误

Thu Jul 26 13:21:11 2012
SMON: enabling cache recovery
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Thu Jul 26 13:21:11 2012
Doing block recovery for file 1 block 18
Block recovery from logseq 3994, block 3 to scn 89979535
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery stopped at EOT rba 3994.5.16
Block recovery completed at rba 3994.5.16, scn 0.89979533
Doing block recovery for file 1 block 9
Block recovery from logseq 3994, block 3 to scn 89979532
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].

trace文件分析

--dump redo
DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log'
 Opcodes *.*
 DBAs (file#, block#):
      (1, 18)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=658120234=0x273a1e2a, Db Name='MISPDATA'
	Activation ID=658142762=0x273a762a
	Control Seq=16668=0x411c, File size=102400=0x19000
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62"
 thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0
 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x0 scn: 0x0000.00000000
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42
 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19
 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10
 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42
 Disk cksum: 0x3088 Calc cksum: 0x3088
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
--ORA-00600错误提示
*** 2012-07-26 13:21:11.566
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
--ora-607
Error 607 in redo application callback
TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ:  1 OP:5.1
ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02
            xid:  0x0000.026.00000035
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type:  Regular undo        Begin trans    Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
             0x00000000  prev ctl uba: 0x00400012.003d.01
prev ctl max cmt scn:  0x0000.0550709b  prev tx cmt scn:  0x0000.0550709c
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194318  prev bcl: 0 KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
op: L  itl: xid:  0x0000.01e.00000035 uba: 0x00400012.003d.01
                      flg: C---    lkc:  0     scn: 0x0000.05511296
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 31 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 2]  c1 0a
col  5: [ 5]  c4 5a 12 5a 14
col  6: [ 1]  80
col  7: [ 4]  c3 08 5f 3d
col  8: [ 4]  c3 02 38 52
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
Block after image is corrupt:
buffer tsn: 0 rdba: 0x00400012 (1/18)
scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201
frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK

这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 106
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
UNDO$
TABLE              SYSTEM

4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 18
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.

DUL挖ORACLE 8.0数据库

老古董的东西,在你一不小心的时候就可能遇到,测试了dul成功挖ORACLE 8.0数据库
创建模拟环境

SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for 32-bit Windows: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR> create table t_xifenfei
     2> as
     3> select * from dba_tables;
Statement processed.
SVRMGR> select count(*) from t_xifenfei;
COUNT(*)
----------
       183
1 row selected.
SVRMGR> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

dul参数配置

    osd_big_endian_flag=false
    osd_dba_file_bits=10
    osd_c_struct_alignment=32
    osd_file_leader_size=1
    dc_columns=2000000
    dc_tables=10000
    dc_objects=1000000
    dc_users=400
    dc_segments=100000
    control_file = control.txt
    db_block_size=2048
    export_mode=true
    compatible=8
    file = dump

dul挖数据

C:\dul>dul8.exe dictv8.ddl
Data UnLoader 8.0.6.8 - Internal Use Only - on Thu Jul 26 20:08:33 2012
with 64-bit io functions
Copyright (c) 1994/2000 Bernard van Duijnen All rights reserved.
DUL: Warning: Recreating file "dul.log"
Parameter altered
Parameter altered
Parameter altered
Parameter altered
. unloading table                      OBJ$    2681 rows unloaded
. unloading table                      TAB$     187 rows unloaded
. unloading table                      COL$   11181 rows unloaded
. unloading table                     USER$      23 rows unloaded
. unloading table                  TABPART$       0 rows unloaded
. unloading table                      IND$     221 rows unloaded
. unloading table                     ICOL$     419 rows unloaded
. unloading table                      LOB$      13 rows unloaded
Life is DUL without it
C:\dul>dul8.exe
Data UnLoader 8.0.6.8 - Internal Use Only - on Thu Jul 26 20:13:44 2012
with 64-bit io functions
Copyright (c) 1994/2000 Bernard van Duijnen All rights reserved.
DUL: Warning: Recreating file "dul.log"
Loaded 23 entries from USER.dat
Loaded 2681 entries from OBJ.dat
Loaded 188 entries from TAB.dat
Loaded 11218 entries from COL.dat
Loaded 0 entries from TABPART.dat
Loaded 221 entries from IND.dat
Loaded 13 entries from LOB.dat
Loaded 419 entries from ICOL.dat
DUL> unload table chf.t_xifenfei;
. unloading table                T_XIFENFEI     183 rows unloaded

启动8.0数据库

C:\oracle\ora80\BIN>SVRMGR30.EXE
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect internal/xifenfei
Connected.
SVRMGR> startup
ORACLE instance started.
Total System Global Area                         15077376 bytes
Fixed Size                                          49152 bytes
Variable Size                                    12906496 bytes
Database Buffers                                  2048000 bytes
Redo Buffers                                        73728 bytes
Database mounted.
Database opened.

imp导入数据

C:\dul>imp chf/xifenfei file=dump001.dmp full=y
Import: Release 8.0.5.0.0 - Production on 星期二 11月 1 23:34:36 2011
(c) Copyright 2000 Oracle Corporation.  All rights reserved.
连接到: Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
经由常规路径导出由EXPORT:V07.00.07创建的文件
警告: 此对象由 Bernard's DUL 导出, 而不是当前用户
. 正在将Bernard's DUL的对象导入到 CHF
. . 正在导入表                    "T_XIFENFEI"        183行被导入
成功终止导入,但出现警告。

测试恢复数据

SVRMGR> connect chf/xifenfei
Connected.
SVRMGR> select count(*) from t_xifenfei;
COUNT(*)
----------
       183
1 row selected.

再次证明了dul确实异常的强大,第三方的工具在某些方面确实不是它的对手

inactive transaction branch等待事件

分析一份awr,发现不太熟悉的等待事件”inactive transaction branch”,awr相关信息如下



分析top 1 sql中的对象

SQL> select * from v$version;
BANNER
-----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> select owner,object_type from dba_objects where object_name =upper('wCommonShortMsg');
OWNER                          OBJECT_TYPE
------------------------------ -------------------
PUBLIC                         SYNONYM
OFFONQUERY                     SYNONYM
SQL> COL DB_LINK FOR A12
SQL> select TABLE_OWNER,TABLE_NAME,DB_LINK from dba_SYNONYMS WHERE
   2 SYNONYM_NAME=upper('wCommonShortMsg');
TABLE_OWNER                    TABLE_NAME                     DB_LINK
------------------------------ ------------------------------ ------------
                               WCOMMONSHORTMSG                CRMDB_LINK
                               WCOMMONSHORTMSG                DB_LINK_CRM

通过这里查询,可以确定引起dblink相关等待严重的是关于wCommonShortMsg同义词查询导致(使用dblink连接到其他库),结合数据库版本,大致可以确定inactive transaction branch等待和MOS中的bug 10413418相符

记录一次AIX 4.3.0+ORACLE 8.0.5恢复过程

最近接手一个AIX下面ORACLE 8.0.5恢复需求.这个需求要从2个月前说起.2个月前有客户提出需求,帮他们恢复8.0.5的库(该库被9.2.0.4的ORACLE强制拉过,但是未成功).当我准备去恢复的时候,客户和我说数据库的存储找不到了,暂时不用处理.本来这个事情到此也就算结束了,最多算一个笑话(找人数据库恢复,发现数据库丢了).前几天又接到关于这个恢复的请求,说他们还有磁带的备份(备份方式:begin tablespace backup+cp+archivelog),让我去协助恢复.按照客户的描述,他们购买了国外的读磁带机器可以读取磁带到硬盘,他们解压好数据文件,然后我去恢复,而且数据文件,控制文件,归档日志都存在.我当时认为这个是一个简单的恢复,顺利的话,一个小时就可以搞定.这里告诉我们:哪怕是N久的备份,也可能是最后的救命数据(2004年的磁带备份),一定要做好备份
今天到达现场,客户第一句话:文件没有解压,第二句话:归档日志没有恢复出来.听到这里,我有点感觉情况很不妙.
详细一问:文件没有解压是因为AIX 4.3.0默认的是文件系统格式是jfs,最大支持lv的大小为20g(估计和客户参数有关系),现在如果解压需要分到多个目录中(数据库原始运行就是按照该模式进行的,如:oradata001,oradata002等分别放几个数据文件),需要我去给他们做规划,如果存放这些文件.大脑一晕,dba原来真的什么活都要干啊.本身就对AIX不熟悉,还要去想办法处理这些问题,而且是从来没有见过的AIX 4.3.0.最后通过我和客户的一起努力终于解决了这个问题:使用其他技巧在AIX 4.3中建立jfs2文件系统(先划分小的LV,使用jfs系统,然后修改系统为jfs2,然后增加lv大小),解决以前jfs文件系统限制,解压的时候需要规划文件目录的体力活.这个问题告诉我们:有时候解决问题需要学会变通
归档日志没有恢复出来的原因:因为连续几天的恢复,加上客户本身工作繁忙,可能实在是太累,在最后一盘磁带的恢复的时候(一盘磁带15小时,一共4盘),客户敲错了命令tar -xvf输入成了-cvf,使得磁盘头被覆盖,磁带原则上报废,从而使得归档日志无法恢复出来.这一点点的事故告诉我们:越疲劳越容易出错,越到最后越容易出错,一定要小心谨慎
到这一步,没有解压(已经解决文件系统问题,接下来的解压问题不大),没有归档(修改scn原则上可以解决),这些东西总的来说问题都不大,当我安装好AIX FOR ORACLE 8.0.5,启动数据库到mount状态,核对恢复出来的数据文件和控制文件中的数据文件的时候,发现少了好几个,这下不能容忍了(能够open库,但是可能丢失需要数据,这个太不划算[因为用户还有该备份的前几天的备份]),寻找出现数据文件从磁带中丢失原因:1.因为jfs文件系统限制,不停的mv到其他目录导致丢失.2.uncompress解压丢失.3.最后一盘磁带损坏导致丢失.一切原因都是浮云,解决了jfs2文件系统,客户根据当前的情况,决定使用其他的备份再次从磁带中导出,然后进行恢复
AIX 4.3安装ORACLE 8.0.5

--检查内存
 lsattr -El sys0 -a realmem
--检查交换分区
 lsps -a
--检查临时目录
 df -k /tmp
--检查操作系统位数
getconf HARDWARE_BITMODE
bootinfo -y
--检查操作系统版本号
 oslevel -r
--检查软件包
 lslpp -l bos.adt.base, bos.adt.libm
--检查补丁包
 instfix -i | grep IX71948
--升级aix
smit install_latest
smit update_all
增加用户:
useradd   oracle
增加组:
mkgrp   dba
更改用户所属组:
usermod   -g  dba oracle
更改用户密码:
passwd   oracle
pwdadm   oracle
--关于用户
smit mkuser	建立用户
smit lsuser	列出所有用户的属性
lsuser ALL
smit chuser	改变用户属性
rmuser -p *	删除用户*
smit rmuser	只删除用户,但是不删除所属目录,等于rmuser
smit passwd	修改密码
smit lockuser	给用户加锁
--关于用户组
smit mkgroup	建立新组
smit lsgroup	显示所有组的属性
smit chgroup	修改组的属性
smit rmgroup	删除*组
rmgroup *
配置shell limits( smit chuser)
  soft FILE size -1
  soft CPU time -1
  soft DATA segment -1
  soft STACK size -1
新建目录:
mkdir   /u01
更改目录属主:
chown   oracle   /u01
更改目录所属组:
chgrp   dba   /u01
vi   /home/oracle/.profile
export   LINK_CNTRL=L_PTHREADS_D7
export   NLS_LANG=american_america.zhs16cgb231280
export	 ORACLE_OWNER=oracle
export   ORACLE_TERM=vt100
#export   ORACLE_TERM=xterm
export   ORACLE_BASE=/oracle
export   ORACLE_HOME=$ORACLE_BASE/product/8.0.5
export   ORACLE_SID=ora8
export   LD_LIBRARY_PATH=$/ORACLE_HOME/lib:$LD_LIBRARY_PATH
export   LIBPATH=$ORACLE_HOME/lib:$LIBPATH
export   ORA_NLS32=$ORACLE_HOME/ocommon/nls/admin/data
export   PATH=$ORACLE_HOME/bin:$PATH
export   TMPDIR=/tmp
export   DISPLAY=172.100.1.2:0.0
set -o vi
umask 022
--Mount产品光盘
$   su   root
-查看光驱
$ lsdev -Cc cdrom
#   mkdir   /cdrom
#   chmod   777   /cdrom
#   /etc/mount   -rv   cdrfs   /dev/cd0   /cdrom
#   exit
--运行rootpre.sh脚本。
$   su   root
#   cd   /cdrom/orainst
#   ./rootpre.sh
#   exit
--运行安装程序
$   cd   /cdrom/orainst
-图形
$   ./orainst   /m
-字符
$   ./orainst   /c
# cd $ORACLE_HOME/orainst
# ./root.sh

建立密码文件

cd $ORACLE_HOME/dbs
orapwd file=orapw$ORACLE_SID password=oracle

创建pfile文件

vi $ORACLE_HOME/dbs/init.ora
db_name=ORCL
db_files = 5000
control_files = /oradata/ctl1ORCL.ora
db_file_multiblock_read_count =  8
db_block_buffers =  100000
shared_pool_size =  115343360
log_checkpoint_interval = 10000
processes =  590
parallel_max_servers = 5
log_buffer =  163840
sequence_cache_entries =  100
sequence_cache_hash_buckets =  90
max_dump_file_size = 102400
global_names = TRUE
background_dump_dest=/oracle/trace
user_dump_dest=/oracle/trace
db_block_size = 4196
remote_login_passwordfile = shared
text_enable = TRUE
job_queue_processes = 2
job_queue_interval = 10
job_queue_keep_connections = false
distributed_lock_timeout = 300
distributed_transactions = 5
open_links = 4

操作8.0.5数据库

--结果测试与win,linux/unix使用svrmgrl命令
C:\oracle\ora80\BIN>SVRMGR30.EXE
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect system/manager
Connected.
SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for 32-bit Windows: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR> connect internal/oracle
Connected.
SVRMGR> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            %RDBMS80%\
Oldest online log sequence     3
Current log sequence           6
SVRMGR> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup mount;
ORACLE instance started.
Total System Global Area                         15077376 bytes
Fixed Size                                          49152 bytes
Variable Size                                    12906496 bytes
Database Buffers                                  2048000 bytes
Redo Buffers                                        73728 bytes
Database mounted.
SVRMGR> alter database archivelog;
Statement processed.
SVRMGR> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            C:\oracle\ora80\RDBMS80\
Oldest online log sequence     3
Next log sequence to archive   6
Current log sequence           6

通过awr指标评估会话建立是否频繁

有朋友问我,通过awr怎么来判断系统这个时间段的会话建立情况,也就是说如果中间件或者客户端程序发生异常,过多的连接数据库.
我这里有个例子,用户的数据库大概在每秒钟建立10个连接左右(相对而言比较频繁)

[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40:03" listener.log |wc -l
9
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40:04" listener.log |wc -l
7
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:41:04" listener.log |wc -l
12
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:40" listener.log |wc -l
554
[oracle@xifenfei tmp]$ grep "18-JUL-2012 18:50" listener.log |wc -l
598
[oracle@xifenfei tmp]$ grep "18-JUL-2012 19:00" listener.log |wc -l
597

我们分析这个时间段的awr,看那些地方是可以表明用户会话建立频繁
awr汇总信息
通过这个信息我们可以发现awr报告时间段(120分钟),和数据库在起点和终点的会话数
说明:该数据库问题很多,出现负载高,不全是会话建立频繁导致,这里只分析建立会话相关情况

Load Profile信息
通过这里的Logons为36对于这样的系统来说,明显异常

Top 5 Timed Events
这里的latch: session allocation等待就是比较明显的建立会话时候出现的等待

Time Model Statistics
Time Model Statistics中的connection management call elapsed time大家都明白的,建立会话花费时间

Dictionary Cache Stats
dc_usernames和dc_users请求值偏大

总结说明
在实际工作中:遇到过因为session建立太频繁导致监听繁忙,tnsping延迟比较严重案例,也遇到因为会话建立频繁导致系统内存被消耗完的案例.
在遇到会话建立过于频繁的案例,最有力的说明证据是监听日志,因为awr中的相关数据没有绝对标准(而且awr本身也是一个相对性的东西),而且一般客户对awr中我刚刚列举的数据概念性不强,所以一般只能作为分析的辅助工具,或者为进一步分析监听日志提供理由依据.