证明递归session存在并解释为什么不在v$session中显示

我们在数据库的使用过程中,有时候会遇到类似情况,我会话是登录的,但是我进行某种操作,缺报session不足.这种情况证明该sql后台还产生了其他会话,这里通过试验分析证明了递归session的存在
会话创建表报session超

CDB_PDB@CHF>  create table t_xifenfei(id number) ;
create table t_xifenfei(id number)
ERROR at line 1:
ORA-00018: maximum number of sessions exceeded

这里有个问题:当前会话已经登录成功了,证明当前session是足够的,但是为什么在执行创建表操作之时依然会报ORA-00018呢?通过10046继续分析

CDB_PDB@CHF> alter session set events '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
会话已更改。
CDB_PDB@CHF> create table t_xifenfei as select * from dual;
表已创建。
CDB_PDB@CHF> select value from v$diag_info where name='Default Trace File';
VALUE
--------------------------------------------------------------------------------
E:\APP\XIFENFEI\diag\rdbms\cdb\cdb\trace\cdb_ora_6596.trc

分析trace文件

CDB_PDB@CHF> host tkprof  E:\APP\XIFENFEI\diag\rdbms\cdb\cdb\trace\cdb_ora_6596.trc d:/1.txt
--查看trace文件,发现里面有很多基表操作,拿其中的一个tab$表分析,创建表过程有如下insert操作
insert into tab$(obj#,ts#,file#,block#,bobj#,tab#,intcols,kernelcols,clucols,
  audit$,flags,pctfree$,pctused$,initrans,maxtrans,rowcnt,blkcnt,empcnt,
  avgspc,chncnt,avgrln,analyzetime,samplesize,cols,property,degree,instances,
  dataobj#,avgspc_flb,flbcnt,trigflag,spare1,spare6)
values
(:1,:2,:3,:4,decode(:5,0,null,:5),decode(:6,0,null,:6),:7,:8,decode(:9,0,null,
  :9),:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,
  decode(:26,1,null,:26),decode(:27,1,null,:27),:28,:29,:30,:31,:32,:33)

尝试人工插入

CDB_PDB@CHF> insert into sys.tab$ select * from sys.tab$ where rownum=1;
insert into sys.tab$ select * from sys.tab$ where rownum=1
                *
第 1 行出现错误:
ORA-01031: 权限不足

证明当前执行创建表的session无权限直接操作tab$表,证明应该有其他表操作它

v$session视图基表
通过查询V$FIXED_VIEW_DEFINITION视图获得相关sql语句,不同版本可能有出入,但是大体一致

/* Formatted on 2013/11/8 23:09:30 (QP5 v5.227.12220.39754) */
SELECT inst_id,
       addr,
       indx,
       ksuseser,
       ksuudses,
       ksusepro,
       ksuudlui,
       ksuudlna,
       ksuudoct,
       ksusesow,
       DECODE (ksusetrn, HEXTORAW ('00'), NULL, ksusetrn),
       DECODE (ksqpswat, HEXTORAW ('00'), NULL, ksqpswat),
       DECODE (BITAND (ksuseidl, 11),
               1, 'ACTIVE',
               0, DECODE (BITAND (ksuseflg, 4096), 0, 'INACTIVE', 'CACHED'),
               2, 'SNIPED',
               3, 'SNIPED',
               'KILLED'),
       DECODE (ksspatyp,
               1, 'DEDICATED',
               2, 'SHARED',
               3, 'PSEUDO',
               'NONE'),
       ksuudsid,
       ksuudsna,
       ksuseunm,
       ksusepid,
       ksusemnm,
       ksusetid,
       ksusepnm,
       DECODE (BITAND (ksuseflg, 19),
               17, 'BACKGROUND',
               1, 'USER',
               2, 'RECURSIVE',
               '?'),
       ksusesql,
       ksusesqh,
       ksusepsq,
       ksusepha,
       ksuseapp,
       ksuseaph,
       ksuseact,
       ksuseach,
       ksusecli,
       ksusefix,
       ksuseobj,
       ksusefil,
       ksuseblk,
       ksuseslt,
       ksuseltm,
       ksusectm,
       DECODE (BITAND (ksusepfl, 16), 0, 'NO', 'YES'),
       DECODE (ksuseft,
               2, 'SESSION',
               4, 'SELECT',
               8, 'TRANSACTIONAL',
               'NONE'),
       DECODE (ksusefm,
               1, 'BASIC',
               2, 'PRECONNECT',
               4, 'PREPARSE',
               'NONE'),
       DECODE (ksusefs, 1, 'YES', 'NO'),
       ksusegrp,
       DECODE (BITAND (ksusepfl, 16),
               16, 'ENABLED',
               DECODE (BITAND (ksusepfl, 32), 32, 'FORCED', 'DISABLED')),
       DECODE (BITAND (ksusepfl, 64),
               64, 'FORCED',
               DECODE (BITAND (ksusepfl, 128), 128, 'DISABLED', 'ENABLED')),
       DECODE (BITAND (ksusepfl, 512),
               512, 'FORCED',
               DECODE (BITAND (ksusepfl, 256), 256, 'DISABLED', 'ENABLED')),
       ksusecqd,
       ksuseclid
  FROM x$ksuse
 WHERE BITAND (ksspaflg, 1) != 0 AND BITAND (ksuseflg, 1) != 0

注意:v$session查询的肯定是BITAND (ksuseflg, 1)!=0的记录

通过锁住表测试
CDB_PDB@SYS表示sys用户,CDB_PDB@CHF表示chf用户,使用两个session,不同用户测试

CDB_PDB@SYS> show user;
USER 为 "SYS"
--SYS用户锁住表
CDB_PDB@SYS> lock table tab$ IN exclusive MODE;
表已锁定。
CDB_PDB@CHF> show user;
USER 为 "CHF"
CDB_PDB@CHF>  select sid from v$mystat where rownum=1;
       SID
----------
        57
CDB_PDB@CHF> select paddr from v$session where sid=57;
PADDR
----------------
000007FF1E10F228
--CHF用户创建表
CDB_PDB@CHF>  create table t_xifenfei_new as select * from dual;
--SYS用户查询
CDB_PDB@SYS> SELECT s.addr,
  2         s.indx sid,
  3         s.ksuseser SERIAL#,
  4         ksuudsna username,
  5         DECODE (BITAND (ksuseflg, 19),
  6                 17, 'BACKGROUND',
  7                 1, 'USER',
  8                 2, 'RECURSIVE',
  9                 '?')
 10            TYPE
 11    FROM x$ksuse s
 12   WHERE ksusepro = '000007FF1E10F228';
ADDR                    SID    SERIAL# USERNAME                       TYPE
---------------- ---------- ---------- ------------------------------ ----------
000007FF1E1EBEA0         57         23 CHF                            USER
000007FF1E1D7F90         67        183 SYS                            RECURSIVE
CDB_PDB@SYS> SELECT ksuudsna username,
  2         ksuseflg
  3    FROM x$ksuse s
  4   WHERE ksusepro = '000007FF1E10F228';
USERNAME                         KSUSEFLG
------------------------------ ----------
CHF                             135266369
SYS                                     2
--这里我们发现递归sys调用的sql,在v$session视图中被排除了,因此递归sql的session不能在v$session显示
CDB_PDB@SYS> select bitand(2,1) from dual;
BITAND(2,1)
-----------
          0

至此,我们可以验证,我们当前的会话,在创建表的过程中有一个sys的递归session执行了关于基表的操作,但是由于v$session视图对于x$ksuse表中的部分记录进行了过滤因此我们不能在v$session查看到这些递归session

继续分析bitand函数
通过观察v$session的创建语句,我们可以发现如下规律,如果某个session是递归session,那么BITAND (ksuseflg, 19)=2,那当这个值为2的时候,是不是BITAND (ksuseflg, 1)一定为0呢?bitand函数实际上就是把里面的两个参数转换为二进制然后进行and运算,也就是两个对应位都为1的情况才会结果得带1(bitand(3,1)=1,bitand(2,1)=0),这里可以发现19转换为二进制为10011,要使得BITAND (ksuseflg, 19)=2成立,那就是说ksuseflg转换为二进制后,最后一位必须是0;而BITAND (ksuseflg, 1)在这样的情况下,一定为0,因此递归session的一定不会在v$session视图显示.

因RAC的undo_management参数不一致导致数据库mount报ORA-01105 ORA-01606

环境Linux 5.8 10.2.0.5 RAC,两个节点只能一个节点mount,如果尝试mount另外节点就报ORA-01105和ORA-01606错误
数据库版本

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

crs资源情况

[oracle@node1 dbs]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....D1.inst application    OFFLINE   OFFLINE
ora....D2.inst application    ONLINE    ONLINE    node2
ora.PROD.db    application    ONLINE    ONLINE    node2
ora....SM1.asm application    ONLINE    ONLINE    node1
ora....E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora....E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2

节点1 mount报错

SQL> startup
ORACLE instance started.
Total System Global Area  171966464 bytes
Fixed Size                  2094832 bytes
Variable Size             113248528 bytes
Database Buffers           50331648 bytes
Redo Buffers                6291456 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01606: gc_files_to_locks not identical to that of another mounted instance
Error:    ORA 1105
Text:     mount is incompatible with mounts by other instances
-------------------------------------------------------------------------------
Cause:  An attempt was made to mount the database, but another instance has already mounted
        a database by the same name, and the mounts are not compatible.
        dditional messages will accompany this message to report why the mounts are incompatible.
Action:  See the accompanying messages for the appropriate action to take.
Error:  ORA 1606
Text:   GC_FILES_TO_LOCKS not identical to that of another mounted instance
-------------------------------------------------------------------------------
Cause:  The initialization parameter GC_FILES_TO_LOCKS is not the same as
        another instance mounted in parallel mode.
        This parameter must be the same as that for all shared instances.
Action: Modify the parameter to be compatible with the other instances, then
        shut down and restart the instance.

根据这个错误提示,查询两个节点的gc_files_to_locks参数,均为空值(默认值),也就是值相同

SQL> show parameter gc_files_to_locks;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
gc_files_to_locks                    string

检查两个节点的gc相关隐含参数,发现所有值也均一致

NAME                           DESCRIPTION                                                           VALUE
------------------------------ --------------------------------------------------------------------- -------
_gc_affinity_limit             dynamic affinity limit                                                50
_gc_affinity_minimum           dynamic affinity minimum activity per minute                          6000
_gc_affinity_time              if non zero, enable dynamic object affinity                           10
_gc_async_memcpy               if TRUE, use async memcpy                                             FALSE
_gc_check_bscn                 if TRUE, check for stale blocks                                       TRUE
_gc_coalesce_recovery_reads    if TRUE, coalesce recovery reads                                      TRUE
_gc_defer_time                 how long to defer down converts for hot buffers                       3
_gc_dissolve_undo_affinity     if TRUE, dissolve undo affinity after an offline                      FALSE
_gc_dynamic_affinity_locks     if TRUE, get dynamic affinity locks                                   TRUE
_gc_element_percent            global cache element percent                                          103
_gc_global_lru                 turn global lru off, make it automatic, or turn it on                 AUTO
_gc_initiate_undo_affinity     if TRUE, initiate undo affinity after an online                       TRUE
_gc_integrity_checks           set the integrity check level                                         1
_gc_keep_recovery_buffers      if TRUE, make recovery buffers current                                TRUE
_gc_latches                    number of latches per LMS process                                     8
_gc_maximum_bids               maximum number of bids which can be prepared                          0
_gcs_fast_reconfig             if TRUE, enable fast reconfiguration for gcs locks                    TRUE
_gcs_latches                   number of gcs resource hash latches to be allocated per LMS process   64
_gcs_pkey_history              number of pkey remastering history                                    4000
_gcs_process_in_recovery       if TRUE, process gcs requests during instance recovery                TRUE
_gcs_resources                 number of gcs resources to be allocated
_gcs_shadow_locks              number of pcm shadow locks to be allocated
_gc_statistics                 if TRUE, kcl statistics are maintained                                TRUE
_gcs_testing                   GCS testing parameter                                                 0
_gc_tsn_undo_affinity          if TRUE, use TSN undo affinity                                        TRUE
_gc_undo_affinity              if TRUE, enable dynamic undo affinity                                 TRUE
_gc_undo_affinity_locks        if TRUE, get affinity locks for undo                                  TRUE
_gc_use_cr                     if TRUE, allow CR pins on PI and WRITING buffers                      TRUE
_gc_vector_read                if TRUE, vector read current buffers                                  TRUE

仔细对比数据库参数,发现undo异常

--节点1
SQL>  show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string      SYSTEM
--节点2
SQL>  show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1

这里已经明确,因为两个节点的undo_*相关参数配置不正确,导致数据库只能一个节点mount。进一步定位问题发现,原来是因为dba粗心在编辑节点1的参数文件的时候把undo_*相关的参数给弄丢了,从而数据库使用了默认值undo_management=manual,undo_tablespace=system

win平台rman备份和删除dg备库归档日志脚本

总觉得使用windows跑oracle是不靠谱的事情,可以这个世界上总有很多人喜欢做类似这样的事情,对于数据库比较常见的两件事情:rman和删除dg备库归档日志,在linux/unix平台上使用shell实现很简单,可是跑到win里面,就变的烦了,不是因为其麻烦,而是因为用的人少,不知道怎么下手处理该事情,我编写了简单的实现初级功能的win下面rman备份和删除备库归档日志脚本,供大家参考,也更加欢迎朋友提出来更加好的处理方法(win是真心的不懂)
rman备份脚本

--backup_oracle.bat文件
rman target / cmdfile=D:\backup\rman\backup_db.rman
log=d:/backup/rman/logfile/rmanlog%date:~0,4%%date:~5,2%%date:~8,2%.log
--backup_db.rman文件
CONFIGURE RETENTION POLICY TO REDUNDANCY = 2;
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
backup filesperset = 5 as compressed backupset database format 'd:/backup/rman/full_%U.rman';
sql 'alter system archive log current';
backup  filesperset = 50 as compressed backupset archivelog  all format 'd:/backup/rman/arch_%U.rman'  delete input;
DELETE noprompt OBSOLETE;
crosscheck backup;
delete noprompt expired backup;
backup  format 'd:/backup/rman/ctl_%U.rman' current controlfile;
backup spfile format 'd:/backup/rman/spfile_%U.rman' ;
exit;

backup_oracle.bat文件加入到计划任务即可

删除dg备库归档日志(已经应用)

--delete_dg_archivelog.bat
rem 注意修改 部署目录
cd D:\win_xifenfei
d:
rem 注意delete_archive.sql 查询是否有记录
echo delete archivelog staring > delete_archivelog.bak
sqlplus / as sysdba @delete_archive.sql
echo rman target / cmdfile=rman_checkcross.rman>>delete_archivelog.bat
delete_archivelog.bat >>delete_dg_archivelog_%DATE:~0,4%%DATE:~5,2%%DATE:~8,2%".log
exit
--delete_archive.sql
set lines 150
col name for a150
set pagesize 0 feedback off verify off heading off echo off
spool delete_archivelog.bat
select 'del '||name from v$archived_log where APPLIED='YES' AND NAME IS NOT NULL and DEST_ID=1;
spool off
exit;
--rman_checkcross.rman
crosscheck archivelog all;
delete noprompt expired archivelog all;
exit

delete_dg_archivelog.bat加入到计划任务即可

db_block_checksum实质是通过flg_kcbh来控制block checksum

db_block_checksum 用于DBWn和direct loader数据块写入到磁盘时,基于块内的所有字节计算得出一个校验值并将其写入块头。在该参数设置为typical和full时,当读入时候重新计算校验和写出时候的校验对比,如果不同则认为是块损坏。如果设置为FULL模式,则基于update/delete应用程序语句级别的改变发生后,校验值会被重新计算并写入。同时对于日志块,在写入之前,同样会生产校验值并写入到块头。该参数主要是防止IO硬件和IO子系统的错误。
这里提示我们是在系统参数级别使用db_block_checksum来控制block是否进行验证,那在block本身级别,是否有类似的值来控制,实现对block值的checksum?通过dump结合bbed给出相关答案
db_block_checksum为TYPICAL测试

CDB_CDB$ROOT@SYS> show parameter db_block_checksum;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_checksum                    string      TYPICAL
CDB_CDB$ROOT@SYS> conn / as sysdba
已连接。
CDB_CDB$ROOT@SYS> alter session set container=pdb;
会话已更改。
CDB_CDB$ROOT@SYS> alter database open;
数据库已更改。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS> create table t_xifenfei(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS> insert into t_xifenfei values(1,'www.xifenfei.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19229
CDB_CDB$ROOT@SYS> SELECT NAME FROM V$DATAFILE WHERE FILE#=9;
NAME
---------------------------------------------------------------------
E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF

dump 该block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xe08b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

这里可以看到因为db_block_checksum=TYPICAL,因此插入记录的时候,dump block发现flg: 0x06 和 chkval: 0xe08b

使用bbed查看相关记录
补充:在win系统中,bbed查看block和block num相差1

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b1d
   ub4 bas_kcbh                             @8        0x3c232935
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe08b
   ub2 spare3_kcbh                          @18       0x0000

通过对比bbed和dump出来数据,可以得出flg: 0x06(flg_kcbh),chkval: 0xe08b(chkval_kcbh)

使用bbed修改block验证chkval_kcbh
提问:在使用bbed修改block的时候,一般都需要使用sum apply 处理下,为什么呢?

BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 77772e78 6966656e l ,...?.www.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--没有修改任何值,sum的current和required值相同,而且和chkval_kcbh也相同
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xe08b
--尝试修改值
BBED> m /x 78 offset 8173
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230            Offsets: 8173 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 78772e78 6966656e 6665692e 636f6d01 063529
 <32 bytes per line>
BBED> d /v offset 8165
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 78772e78 6966656e l ,...?.wxw.xifen
 6665692e 636f6d01 063529            l fei.com..5)
 <16 bytes per line>
--发现current<>required
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xef8b
--apply把current修改为required值
BBED> sum apply
Check value for File 0, Block 19230:
current = 0xef8b, required = 0xef8b
--发现chkval_kcbh也修改为了required值
BBED> p kcbh.chkval_kcbh
ub2 chkval_kcbh                             @16       0xef8b

通过这里可以发现,在flg_kcbh含(KCBHFCKV)的情况下,如果block发生改变,则运行sum apply之后chkval_kcbh也发生改变

继续dump block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xef8b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 78 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

证明上述结论正确:dump.flg=bbed.flg_kcbh,dump.chkval=bbed.chkval_kcbh

db_block_checksum为FALSE测试

CDB_CDB$ROOT@SYS> alter system set db_block_checksum=false;
系统已更改。
CDB_CDB$ROOT@SYS> drop table t_xifenfei_new purge;
表已删除。
CDB_CDB$ROOT@SYS> alter session set current_schema=chf;
会话已更改。
CDB_CDB$ROOT@SYS>  create table t_xifenfei_new(id number,name varchar2(100));
表已创建。
CDB_CDB$ROOT@SYS>  insert into t_xifenfei_new values(1,'www.orasos.com');
已创建 1 行。
CDB_CDB$ROOT@SYS> commit;
提交完成。
CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
 2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei_new;
   FILE_NO   BLOCK_NO
---------- ----------
         9      19237
CDB_CDB$ROOT@SYS> alter system checkpoint;
系统已更改。
CDB_CDB$ROOT@SYS> alter system dump datafile 9 block 19237;
系统已更改。

dump block分析

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  77 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

这里可以发现当设置db_block_checksum=false之时,插入数据,显示flg: 0x02,chkval: 0x0000,由此猜测无对block写入进行部分验证(tailchk依然验证)

使用bbed修改block

BBED> set filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF'
        FILENAME        E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set block 8192
        BLOCK#          8192
BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified
BBED> set block 19238
        BLOCK#          19238
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x0000
   ub2 spare3_kcbh                          @18       0x0000
--这里看到flg_kcbh与chkval_kcbh和dump结果一致
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
--required为0,表示不验证
BBED> d /v offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7777772e 6f726173 6f732e63 6f6d0106 l www.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
BBED> set mode edit
        MODE            Edit
BBED> m /x 78
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 b2c8
 <32 bytes per line>
BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 l xww.orasos.com..
 b2c8                                l 踩
 <16 bytes per line>
--修改了block,但是sum依然提示required为0
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000

dump block核对

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

修改flg_kcbh测试

BBED> m /x 06 offset 15
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06000000 00010000 00d86b01 00aec823
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x04b3
--修改flg_kcbh=x06后,sum中的required出现了非0值,表示已经启动了block完整性检测
BBED> m /x 79 offset 1876
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 1876 to 1891           Dba:0x00000000
------------------------------------------------------------------------
 79391b01 0000c24e 07000205 c406573f
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x049d
--修改了chkval_kcbh值
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x049d, required = 0x049d

再次检查dump block

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x06 tail: 0xc8b20601
frmt: 0x02 chkval: 0x049d type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

证明修改flg_kcbh后,block的sum验证起效

完整测试

--flg_kcbh=2,chkval_kcbh为04b3,修改block观察变化
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x04b3
   ub2 spare3_kcbh                          @18       0x0000
BBED> m /x 11 offset 7184
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 110000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> undo
BBED> modify /x 00 filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF' block 19238. offset 7184.
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 000000c2 39b50002 05c40721 07280000
 <32 bytes per line>
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
--证明当flg_kcbh=2修改block之后chkval_kcbh依然为04b3,证明flg_kcbh不含(KCBHFCKV),
--修改block不会导致chkval_kcbh改变,也就是说,该block为启用db_block_checksum
--flg_kcbh=6的时候验证修改block导致的chkval_kcbh变化
BBED> m /x 06 offset 15
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06b30400 00010000 00d86b01 00aec823
 <32 bytes per line>
--刚刚修改flg_kcbh=6,马上看到required非0
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--尝试修改block
BBED> m /x 22 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 2277772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
--required发生改变
BBED> sum
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04e9
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04e9
--修改为以前值,验证required
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106
 <32 bytes per line>
BBED> sum
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04da
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--发现当block修改回来后,required值和以前一致(也就是说chkval_kcbh值还原)
--进步一说明chkval_kcbh取决于block内部值

通过相关测试db_block_checksum是在实例级别启动block checksum,但是具体到每个block是通过flg_kcbh来控制,而具体体现是在chkval_kcbh值上

使用UltraEdit修改oracle二进制文件

对数据库OPEN的过程做10046,因为数据库没有open,那写着语句是写在什么地方的呢?通过使用UltraEdit打开$ORACLE_HOME/bin/oracle文件,可以发现很多sql语句,而且与open过程的10046很多一致.

044acdc0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044acdd0h: 32 2E 6E 61 6D 65 2C 20 6F 32 2E 6F 62 6A 23 20 ; 2.name, o2.obj#
044acde0h: 66 72 6F 6D 20 69 6E 64 24 20 69 2C 20 6F 62 6A ; from ind$ i, obj
044acdf0h: 24 20 6F 31 2C 20 6F 62 6A 24 20 6F 32 2C 20 75 ; $ o1, obj$ o2, u
044ace00h: 73 65 72 24 20 75 20 77 68 65 72 65 20 6F 31 2E ; ser$ u where o1.
044ace10h: 6F 77 6E 65 72 23 20 3D 20 3A 31 20 20 61 6E 64 ; owner# = :1  and
044ace20h: 20 6F 31 2E 74 79 70 65 23 20 3D 20 32 20 61 6E ;  o1.type# = 2 an
044ace30h: 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 61 6E ; d i.type# = 9 an
044ace40h: 64 20 69 2E 62 6F 23 20 3D 20 6F 31 2E 6F 62 6A ; d i.bo# = o1.obj
044ace50h: 23 20 20 61 6E 64 20 69 2E 6F 62 6A 23 20 3D 20 ; #  and i.obj# =
044ace60h: 6F 32 2E 6F 62 6A 23 20 61 6E 64 20 6F 32 2E 6F ; o2.obj# and o2.o
044ace70h: 77 6E 65 72 23 20 3D 20 75 2E 75 73 65 72 23 20 ; wner# = u.user#
044ace80h: 61 6E 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 ; and i.type# = 9
044ace90h: 61 6E 64 20 6F 31 2E 6F 77 6E 65 72 23 20 21 3D ; and o1.owner# !=
044acea0h: 20 6F 32 2E 6F 77 6E 65 72 23 00 00 00 00 00 00 ;  o2.owner#......
044aceb0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acec0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044aced0h: 2E 6E 61 6D 65 2C 20 6F 2E 6F 62 6A 23 20 66 72 ; .name, o.obj# fr
044acee0h: 6F 6D 20 6F 62 6A 24 20 6F 2C 20 75 73 65 72 24 ; om obj$ o, user$
044acef0h: 20 75 2C 20 69 6E 64 24 20 69 20 77 68 65 72 65 ;  u, ind$ i where
044acf00h: 20 6F 2E 6F 77 6E 65 72 23 3D 3A 31 20 61 6E 64 ;  o.owner#=:1 and
044acf10h: 20 6F 2E 6F 77 6E 65 72 23 3D 75 2E 75 73 65 72 ;  o.owner#=u.user
044acf20h: 23 20 61 6E 64 20 6F 2E 6F 62 6A 23 3D 69 2E 6F ; # and o.obj#=i.o
044acf30h: 62 6A 23 20 61 6E 64 20 69 2E 74 79 70 65 23 3D ; bj# and i.type#=
044acf40h: 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; 9...............
044acf50h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acf60h: 73 65 6C 65 63 74 20 76 6E 61 6D 65 20 66 72 6F ; select vname fro
044acf70h: 6D 20 73 79 73 2E 73 6E 61 70 24 20 77 68 65 72 ; m sys.snap$ wher
044acf80h: 65 20 73 6F 77 6E 65 72 20 3D 20 3A 31 20 61 6E ; e sowner = :1 an
044acf90h: 64 20 69 6E 73 74 73 69 74 65 20 3D 20 30 20 20 ; d instsite = 0
044acfa0h: 61 6E 64 20 28 62 69 74 61 6E 64 28 66 6C 61 67 ; and (bitand(flag
044acfb0h: 2C 20 32 36 38 34 33 35 34 35 36 29 20 3D 20 30 ; , 268435456) = 0
044acfc0h: 20 6F 72 20 62 69 74 61 6E 64 28 6F 62 6A 66 6C ;  or bitand(objfl
044acfd0h: 61 67 2C 20 33 32 29 20 3D 20 30 29 00 00 00 00 ; ag, 32) = 0)....

既然数据库OPEN的过程很多sql是在oracle二进制文件中写的,那是否可以考虑通过修改二进制文件中的sql语句来改变执行计划,甚至可以尝试修改这些语句使得异常的数据库能够正常open.这里演示通过修改sql语句来展示改变执行计划(不使用index,而改用全表扫描)
10046 捕获语句和执行计划

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475346332403 hv=854877822 ad='2f2be060'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
END OF STMT
PARSE #2:c=1000,e=1258,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346332394
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b69ef000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2000,e=1382,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346333978
WAIT #2: nam='db file sequential read' ela= 17820 file#=1 block#=220 blocks=1 obj#=-1 tim=1345475346351927
FETCH #2:c=1000,e=18054,p=1,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475346352100
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=1 pw=0 time=18047 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=1 pw=0 time=18014 us)'

使用UE查看该sql在二进制文件内容

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 2E 6F 62 6A ; j$ o where o.obj
0459c6d0h: 23 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; #=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

修改二进制文件中sql语句

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 62 6A 23 2B ; j$ o where obj#+
0459c6d0h: 30 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; 0=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

这里通过对obj#加上一个常量0,使得该sql在执行之时不会使用obj$.obj#上的index,从而使用全表扫描,来实现我们修改执行计划,屏蔽该index的目的
修改后的10046观察

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475781593851 hv=493726595 ad='2f2ba76c'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where obj#+0=:1
END OF STMT
PARSE #2:c=1000,e=1095,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781593840
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a86000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2999,e=1603,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781595754
WAIT #2: nam='db file sequential read' ela= 18417 file#=1 block#=121 blocks=1 obj#=-1 tim=1345475781614344
WAIT #2: nam='db file sequential read' ela= 1000 file#=1 block#=123 blocks=1 obj#=-1 tim=1345475781615726
WAIT #2: nam='db file sequential read' ela= 1462 file#=1 block#=124 blocks=1 obj#=-1 tim=1345475781617720
WAIT #2: nam='db file sequential read' ela= 492 file#=1 block#=125 blocks=1 obj#=-1 tim=1345475781618452
WAIT #2: nam='db file sequential read' ela= 1358 file#=1 block#=126 blocks=1 obj#=-1 tim=1345475781620361
WAIT #2: nam='db file sequential read' ela= 500 file#=1 block#=127 blocks=1 obj#=-1 tim=1345475781621039
WAIT #2: nam='db file sequential read' ela= 63132 file#=1 block#=128 blocks=1 obj#=-1 tim=1345475781684316
WAIT #2: nam='db file sequential read' ela= 989 file#=1 block#=5233 blocks=1 obj#=-1 tim=1345475781685559
WAIT #2: nam='db file sequential read' ela= 792 file#=1 block#=5234 blocks=1 obj#=-1 tim=1345475781686583
FETCH #2:c=7999,e=90994,p=9,cr=13,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475781686865
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS FULL OBJ$ (cr=13 pr=9 pw=0 time=90977 us)'

很明显,到这里我们通过UE修改oracle二进制文件,实现了sql语句执行计划的更改.
免责申明:本方法仅供在非常特殊或者测试环境下使用,一般生产环境请勿模仿,否则后果自负

误杀进程导致rac hang住

有客户反馈系统hang住,不能归档,需要我们紧急介入分析
节点1日志
出现redo不能归档,redo日志都已经被写满,人工执行了ALTER SYSTEM ARCHIVE LOG CURRENT,数据库就开始把redo全部归档,但是后面产生的redo又不能归档,当redo全部写满之后,数据库有出现大量log file switch (archiving needed)等待

Tue Sep 24 22:05:37 2013
Thread 1 advanced to log sequence 47282 (LGWR switch)
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:07:31 2013
ORACLE Instance q9db1 - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 47283
All online logs needed archiving
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:28:17 2013
ALTER SYSTEM ARCHIVE LOG
Archived Log entry 259646 added for thread 1 sequence 47266 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:18 2013
Thread 1 advanced to log sequence 47283 (LGWR switch)
  Current log# 7 seq# 47283 mem# 0: +DATA/q9db/onlinelog/group_7.1243.818697415
Archived Log entry 259647 added for thread 1 sequence 47267 ID 0x354620c2 dest 1:
Archived Log entry 259648 added for thread 1 sequence 47268 ID 0x354620c2 dest 1:
Archived Log entry 259649 added for thread 1 sequence 47269 ID 0x354620c2 dest 1:
Archived Log entry 259650 added for thread 1 sequence 47270 ID 0x354620c2 dest 1:
Archived Log entry 259651 added for thread 1 sequence 47271 ID 0x354620c2 dest 1:
Archived Log entry 259652 added for thread 1 sequence 47272 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:28 2013
Archived Log entry 259653 added for thread 1 sequence 47273 ID 0x354620c2 dest 1:
Archived Log entry 259654 added for thread 1 sequence 47274 ID 0x354620c2 dest 1:
Archived Log entry 259655 added for thread 1 sequence 47275 ID 0x354620c2 dest 1:
Archived Log entry 259656 added for thread 1 sequence 47276 ID 0x354620c2 dest 1:
Archived Log entry 259657 added for thread 1 sequence 47277 ID 0x354620c2 dest 1:
Archived Log entry 259658 added for thread 1 sequence 47278 ID 0x354620c2 dest 1:
Archived Log entry 259659 added for thread 1 sequence 47279 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:39 2013
Archived Log entry 259660 added for thread 1 sequence 47280 ID 0x354620c2 dest 1:
Archived Log entry 259661 added for thread 1 sequence 47281 ID 0x354620c2 dest 1:
Archived Log entry 259662 added for thread 1 sequence 47282 ID 0x354620c2 dest 1:
Tue Sep 24 22:29:39 2013
Thread 1 advanced to log sequence 47284 (LGWR switch)
  Current log# 8 seq# 47284 mem# 0: +DATA/q9db/onlinelog/group_8.1242.818697417
Tue Sep 24 22:31:18 2013
Thread 1 advanced to log sequence 47285 (LGWR switch)
  Current log# 16 seq# 47285 mem# 0: +DATA/q9db/onlinelog/group_16.1884.827003545
Thread 1 advanced to log sequence 47286 (LGWR switch)
  Current log# 17 seq# 47286 mem# 0: +DATA/q9db/onlinelog/group_17.1885.827003587

节点2日志
节点2中出现大量的IPC Send timeout

Tue Sep 24 15:22:19 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 18:51:55 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Tue Sep 24 18:57:54 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:03:57 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:09:53 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 20:22:00 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]

节点1因为不能归档hang住,节点2紧接着也就hang住。对节点1hang住之时对两个节点分别做systemstate dump,使用ass进行分析得到节点1和节点2的记录大体如下:
节点1

393:waiting for 'log file switch (archiving needed)'
394:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
395:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
397:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
398:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
451:waiting for 'SQL*Net message from client'
469:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
470:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
471:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
618:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
626:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
NO BLOCKING PROCESSES FOUND

节点2

515:waiting for 'gc buffer busy acquire'
     Cmd: Insert
516:waiting for 'gc buffer busy acquire'
     Cmd: Insert
517:waiting for 'gc buffer busy acquire'
     Cmd: Insert
518:waiting for 'gc buffer busy acquire'
     Cmd: Insert
519:waiting for 'gc buffer busy acquire'
     Cmd: Insert
520:waiting for 'gc buffer busy acquire'
     Cmd: Select
521:waiting for 'gc current request'
     Cmd: Insert
522:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
523:waiting for 'gc buffer busy acquire'
     Cmd: Insert
524:waiting for 'SQL*Net message from client'
525:waiting for 'gc buffer busy acquire'
     Cmd: Insert
526:waiting for 'gc buffer busy acquire'
     Cmd: Insert
527:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
528:waiting for 'SQL*Net message from client'
529:waiting for 'gc buffer busy acquire'
     Cmd: Select
                    Resource Holder State
    Enq TX-0005001E-0022374F   223: waiting for 'gc current request'
    Enq TX-0047001B-002BCEB2   247: waiting for 'gc current request'
    Enq TX-015B001E-000041FF   330: waiting for 'gc current request'
    Enq TX-00010010-002EA7CD   179: waiting for 'gc current request'
    Enq TX-00BA0020-001E3E3C    ??? Blocker
Object Names
~~~~~~~~~~~~
Enq TX-0005001E-0022374F
Enq TX-0047001B-002BCEB2
Enq TX-015B001E-000041FF
Enq TX-00010010-002EA7CD
Enq TX-00BA0020-001E3E3C

通过这里,我们可以明白,节点2的很多事务hang住是因为请求gc current request,而该等待是因为节点1无法归档,有些block无法正常传输到节点2,导致节点2一直hang在这里,然后就出现IPC Send timeout;节点1上的事务阻塞甚至hang住是因为无法归档导致.到此需要定位的问题是为什么节点1不能归档
继续分析节点1 alert日志

Tue Sep 24 15:18:20 2013
opidrv aborting process O000 ospid (7332) as a result of ORA-28
Immediate Kill Session#: 1904, Serial#: 1065
Immediate Kill Session: sess: 0x24a2522a38  OS pid: 7338
Immediate Kill Session#: 3597, Serial#: 11107
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 7320
Tue Sep 24 15:18:23 2013
opidrv aborting process W000 ospid (7980) as a result of ORA-28
Tue Sep 24 15:18:23 2013
opidrv aborting process W001 ospid (8560) as a result of ORA-28
Tue Sep 24 15:18:35 2013
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES
Tue Sep 24 15:18:35 2013
ARC0 started with pid=66, OS id=10793
Tue Sep 24 15:18:35 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db1/trace/q9db1_nsa2_12635.trc:
ORA-00028: your session has been killed
LNS: Failed to archive log 8 thread 1 sequence 47156 (28)
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 47157 (LGWR switch)
ARC0: STARTING ARCH PROCESSES
  Current log# 9 seq# 47157 mem# 0: +DATA/q9db/onlinelog/group_9.1241.818697421
Tue Sep 24 15:18:36 2013
ARC1 started with pid=81, OS id=10805
Tue Sep 24 15:18:36 2013
ARC2 started with pid=84, OS id=10807
Tue Sep 24 15:18:36 2013
ARC3 started with pid=87, OS id=10809
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Error 1031 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'q9adgdg'. Error is 1031.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 259135 added for thread 1 sequence 47156 ID 0x354620c2 dest 1:
Error 1031 received logging on to the standby
FAL[server, ARC3]: Error 1031 creating remote archivelog file 'q9adgdg'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance q9db1 - Archival Error. Archiver continuing.
Tue Sep 24 15:18:46 2013
opidrv aborting process O001 ospid (9605) as a result of ORA-28
Tue Sep 24 15:18:46 2013
opidrv aborting process O000 ospid (10813) as a result of ORA-28
Tue Sep 24 15:18:46 2013
Immediate Kill Session#: 2909, Serial#: 369
Immediate Kill Session: sess: 0x24226c7200  OS pid: 9091
Immediate Kill Session#: 3380, Serial#: 30271
Immediate Kill Session: sess: 0x2422782c58  OS pid: 10265
Immediate Kill Session#: 3597, Serial#: 11109
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 10267
Tue Sep 24 15:20:14 2013
Restarting dead background process DIAG
Tue Sep 24 15:20:14 2013
DIAG started with pid=64, OS id=20568
Restarting dead background process PING
Tue Sep 24 15:20:14 2013
PING started with pid=68, OS id=20570
Restarting dead background process LMHB
Tue Sep 24 15:20:14 2013
LMHB started with pid=70, OS id=20572
Restarting dead background process SMCO
…………
Tue Sep 24 15:23:13 2013
ARC0: Detected ARCH process failure
Tue Sep 24 15:23:13 2013
Thread 1 advanced to log sequence 47158 (LGWR switch)
  Current log# 10 seq# 47158 mem# 0: +DATA/q9db/onlinelog/group_10.1240.818697423
ARC0: STARTING ARCH PROCESSES
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance q9db1 - Archival Error
ORA-00028: your session has been killed

查看ARCn进程

[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   20718 12870  0 22:07 pts/14   00:00:00 grep ora_ar
[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   25998 12870  0 22:07 pts/14   00:00:00 grep ora_ar

这里基本上明白了,因为客户的系统从15:15开始由于中间件程序异常,导致大量会话连接数据库,然后dba为了防止其他业务不受影响,然后开始大量通过alter system kill session,误杀了不少系统进程,包括ARCn(0,1,2,3)进程,在后面ARCn进程因为某种原因无法正常启动,导致redo无法归档,所有的redo组写满系统即hang住,该系统由于大量kill session已经导致了实例本身异常(正常情况ARCn进程kill之后会自动重启),处理方案:先增加redo组配合定时人工归档,等待业务低峰重启节点1,解决问题。温馨提醒:kill进程请小心

ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels

随着去ioe的潮流越来越汹涌,随着内存越来越便宜,使用Liunx大内存跑核心业务的系统越来越多。因为大内存,多会话,使得配置HugePages也成了必然的选择,在以前的文章中介绍过怎么去配置HugePages(Liunx系统中Oracle使用HugePages配置)以及因为没有配置Hugepage导致的故障(因未配置Hugepage会话数添增悲剧案例).但是随着Linux 6(RedHat 6, OEL 6, SLES 11 and UEK2 kernels等)的广泛应用,他们所在HugePages方面所具有的新特性(Transparent HugePages)很多人可能还没有注意到,Oracle强烈建议无论是rac还是单实例都关闭该特性.
检查是否启用Transparent HugePages方法

To check if the Transparent HugePages are enabled in your server execute the following:
Default/Enabled setting is  [always]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never
 Disabled setting is [never]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]
If "enabled" is NOT set to "[never]", the Transparent HugePages are being used.
You can also issue:
# grep AnonHugePages /proc/meminfo
If the output contains a line like "AnonHugepages: xxxx kB", with a value > 0kB the kernel is using Transparent HugePages.

禁用Transparent HugePages方法

Add the following to the kernel boot line in /etc/grub.conf (this is the preferred method) and reboot the server:
transparent_hugepage=never
Once modified the line will read similar to the following example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
OR
Add the following lines in /etc/rc.local and reboot the server:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

具体见文档:ALERT Disable Transparent HugePages on SLES11 RHEL6 OEL6 and UEK2 Kernels(Doc ID 1557478.1)

oerr ora code本质探讨

oerr ora code 命令

[oracle@q9-gg02 ~]$ oerr ora 10047
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:

unix/linux系统下$ORACLE_HOME/rdbms/mesg/oraus.msg包含ora-记录

[oracle@q9-gg02 ~]$ vi $ORACLE_HOME/rdbms/mesg/oraus.msg
…………
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
10048, 00000, "Undo segment shrink"
// *Cause:
// *Action:
10049, 00000, "protect library cache memory heaps"
// *Cause:
// *Action: Use the OS memory protection (if available) to protect library
//          cache memory heaps that are pinned.

oerr ora code本质

[oracle@q9-gg02 ~]$ strace oerr ora 10047
execve("/u01/oracle/app/product/10.2/db_1/bin/oerr", ["oerr", "ora", "10047"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1c239000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3fe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3ff000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib", {st_mode=S_IFDIR|0750, st_size=12288, ...}) = 0
open("tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libtermcap.so.2", O_RDONLY)   = -1 ENOENT (No such file or directory)
open("x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("libtermcap.so.2", O_RDONLY)       = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=154281, ...}) = 0
mmap(NULL, 154281, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db400000
close(3)                                = 0
open("/lib64/libtermcap.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\20 \214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15840, ...}) = 0
mmap(0x3e8c200000, 2108944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c200000
mprotect(0x3e8c203000, 2093056, PROT_NONE) = 0
mmap(0x3e8c402000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c402000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libdl.so.2", O_RDONLY)        = -1 ENOENT (No such file or directory)
open("x86_64/libdl.so.2", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("libdl.so.2", O_RDONLY)            = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0
mmap(0x3e8c600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c600000
mprotect(0x3e8c602000, 2097152, PROT_NONE) = 0
mmap(0x3e8c802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c802000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libc.so.6", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("tls/libc.so.6", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("x86_64/libc.so.6", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("libc.so.6", O_RDONLY)             = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\213>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722328, ...}) = 0
mmap(0x3e8be00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8be00000
mprotect(0x3e8bf4f000, 2093056, PROT_NONE) = 0
mmap(0x3e8c14e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x3e8c14e000
mmap(0x3e8c153000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e8c153000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db426000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db427000
arch_prctl(ARCH_SET_FS, 0x2b60db426fe0) = 0
mprotect(0x3e8c802000, 4096, PROT_READ) = 0
mprotect(0x3e8c14e000, 16384, PROT_READ) = 0
mprotect(0x3e8bc1c000, 4096, PROT_READ) = 0
munmap(0x2b60db400000, 154281)          = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK)     = 3
close(3)                                = 0
brk(0)                                  = 0x1c239000
brk(0x1c25a000)                         = 0x1c25a000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=56458208, ...}) = 0
mmap(NULL, 56458208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db428000
close(3)                                = 0
getuid()                                = 1000
getgid()                                = 1000
geteuid()                               = 1000
getegid()                               = 1000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/proc/meminfo", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60dea00000
read(3, "MemTotal:     132051308 kB\nMemFr"..., 1024) = 781
close(3)                                = 0
munmap(0x2b60dea00000, 4096)            = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
uname({sys="Linux", node="q9-gg02.800best.com", ...}) = 0
stat("/home/oracle", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
getpid()                                = 25928
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25464, ...}) = 0
mmap(NULL, 25464, PROT_READ, MAP_SHARED, 3, 0) = 0x2b60dea00000
close(3)                                = 0
getppid()                               = 25927
getpgrp()                               = 25927
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/u01/oracle/app/product/10.2/db_1/bin/oerr", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7a3cc060) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 80) = 80
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(255, {st_mode=S_IFREG|0755, st_size=2365, ...}) = 0
lseek(255, 0, SEEK_CUR)                 = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 2365) = 2365
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -973, SEEK_CUR)              = 1392
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25929
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "ora:rdbms:*:\n", 128)          = 13
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25929
wait4(-1, 0x7fff7a3cb804, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ $? -ne 0 ]\nthen\n\techo \"oerr"..., 2365) = 973
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -645, SEEK_CUR)              = 1720
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25931
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "Facility=ora\nComponent=rdbms\n", 128) = 29
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25931
wait4(-1, 0x7fff7a3cb904, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ -z \"$Facility\" -o -z \"$Comp"..., 2365) = 645
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
--检查oraus.msg文件是否存在并且有权限访问
stat("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", {st_mode=S_IFREG|0644, st_size=3789794, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
--访问oraus.msg文件
access("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", R_OK) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
--查找awk命令位置
stat("/u01/oracle/app/product/10.2/db_1/jre/1.4.2/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/qt-3.3/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/kerberos/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
--找到awk命令
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
lseek(255, -8, SEEK_CUR)                = 2357
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
--通过awk命令获取oraus.msg中相关命令
wait4(-1, 10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff7a3cbb54, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "\nexit 0\n", 2365)           = 8
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0)                           = ?

通过这里可以知道oerr其实就是通过awk对$ORACLE_HOME/rdbms/mesg/oraus.msg文件进行筛选,显示出来对应ORA-错误号的提示

记录一次数据库异常导致ipc未释放案例

很多时候,linux/unix中的ipc未正常释放,表现的现象是再次启动实例的时候表现出来内存不足等类似现象,但是今天在数据库启动过程中因为数据库后台报ora-600[6002]错误,导致数据库启动失败,但是后续的一些列奇怪现象让我吃惊:
1.启动数据库异常的session一直处于卡主状态,任何办法无法终止
2.开打新会话,登录数据库提示进行操作提示not logging,但是abort会出现和1中一样的hang住
数据库open过程卡主 会话无法退出,crtl+c依然无效

SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
^C
^C^C^C^C^C

alert报错

Tue Aug 27 20:33:57 2013
SMON: enabling cache recovery
Archived Log entry 29 added for thread 1 sequence 29 ID 0xfb920b77 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc  (incident=201905):
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_201905/ora11g_ora_4992_i201905.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Undo initialization errored: err:600 serial:0 start:1524754 end:1528104 diff:3350 (33 seconds)
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 4992): terminating the instance due to error 600
Instance terminated by USER, pid = 4992
ORA-1092 signalled during: alter database Open...

新开启会话登录

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:27:38 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected.
SQL> startup mount
ORA-01012: not logged on
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount
ORA-01012: not logged on

alert日志提示

Tue Aug 27 20:53:18 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:53:44 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:54:21 2013

shutdown abort 依然卡主

SQL> shutdown abort
^C
^C^C

查看ipc
根据alert日志提示,数据库已经异常关闭了,但是sqlplus中显示异常,怀疑是ipc未被正常释放

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ps -ef|grep pmon
oracle    4843  4785  0 20:29 pts/3    00:00:00 grep pmon

释放ipc

[oracle@xifenfei ~]$ ipcrm -m 2260993
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2228224
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2293762
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest

重新启动数据库

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:33:35 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.

再次查看ipc

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest
0x00000000 2490374    oracle    640        8388608    29
0x00000000 2523143    oracle    640        167772160  29
0xee5c29c4 2555912    oracle    640        2097152    29

模拟基表事务未提交数据库crash,undo丢失恢复异常恢复

对于基表特别是bootstrap$中的表,如果进行了dml操作,但是没有提交,这个时候数据库crash,而且undo异常.对于类似的场景,都是很多数据库恢复的人都有畏惧,因为_corrupted_rollback_segments参数不能起作用.本实验dml修改seq$表,最终却引起了undo$和I_UNDO1异常(他们都在bootstrap$中)
模拟seq$表事务未提交

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select sysdate "www.xifenfei.com" from dual;
www.xifenfei
------------
31-AUG-13
SQL> CREATE SEQUENCE seq_xifenfei INCREMENT BY 1 START WITH 1 CACHE 3;
Sequence created.
SQL> select object_id from dba_objects where object_name='SEQ_XIFENFEI';
 OBJECT_ID
----------
     77282
SQL> SELECT MINVALUE,HIGHWATER,INCREMENT$, CYCLE#,ORDER$,CACHE from seq$ where obj#=77282;
  MINVALUE  HIGHWATER INCREMENT$     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ----------
         1          1          1          0          0          3
SQL> update seq$ set  CYCLE#=3 where obj#=77282;
1 row updated.
--另外一个会话abort数据库
SQL> shutdown abort;
ORACLE instance shut down.
--人工rm掉undotbs01.dbf文件
--启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
--离线数据文件
SQL> alter database datafile 3 offline;
Database altered.
--尝试open数据库
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17864
Session ID: 125 Serial number: 5

alert日志

SMON: enabling cache recovery
Mon Aug 26 17:08:49 2013
ARC3 started with pid=25, OS id=17902
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Archived Log entry 13 added for thread 1 sequence 13 ID 0xfb920b77 dest 1:
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17864): terminating the instance due to error 704
Instance terminated by USER, pid = 17864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17864) as a result of ORA-1092

做10046跟踪

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> alter database mount;
Database altered.
SQL> alter session set events  '10046 trace name context forever,level 12';
Session altered.
SQL> alter session set db_file_multiblocK_read_count=1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

trace文件内容

PARSING IN CURSOR #3064524800 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377508386082495 hv=361892850 ad='265c66d0' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3064524800:c=2999,e=2162,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377508386082492
BINDS #3064524800:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a8ebd0  bln=22  avl=02  flg=05
  value=27
EXEC #3064524800:c=1999,e=2030,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377508386084833
WAIT #3064524800: nam='db file sequential read' ela= 65 file#=1 block#=321 blocks=1 obj#=34 tim=1377508386085083
WAIT #3064524800: nam='db file sequential read' ela= 21 file#=1 block#=225 blocks=1 obj#=15 tim=1377508386085208
FETCH #3064524800:c=0,e=375,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377508386085319
STAT #3064524800 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=362 us)'
STAT #3064524800 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=223 us)'
CLOSE #3064524800:c=0,e=12,dep=3,type=0,tim=1377508386085657
WAIT #3064511164: nam='control file sequential read' ela= 33 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386085816
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386085889
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386085994
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=23 blocks=1 obj#=15 tim=1377508386086058
WAIT #3064511164: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386086135
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386086196
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386086255
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=32 blocks=1 obj#=15 tim=1377508386086317
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3064511164:c=8998,e=9239,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377508386088336
STAT #3064511164 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=82 us)'
STAT #3064511164 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1166 us)'
=====================
PARSE ERROR #3064512008:len=60 dep=1 uid=0 oct=3 lid=0 tim=1377508386088729 err=604
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'

_corrupted_rollback_segments无法打开数据库

SQL> show parameter _corrupted_rollback_segments;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_corrupted_rollback_segments         string      _SYSSMU22_1366820418$, _SYSSMU
                                                 23_2165404374$, _SYSSMU24_1842
                                                 376483$, _SYSSMU25_2373003110$
                                                 , _SYSSMU26_3635237253$, _SYSS
                                                 MU27_1730246918$, _SYSSMU28_31
                                                 59173804$, _SYSSMU29_141006113
                                                 4$, _SYSSMU30_2299069988$, _SY
                                                 SSMU31_290584066$
SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

dump block225

Block header dump:  0x004000e1
 Object id on Block? Y
 seg/obj: 0xf  csc: 0xb8c.3c0c297b  itc: 1  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba        Flag  Lck        Scn/Fsc
0x01   0x0000.001.00000034  0x00400221.0034.08  --U-    1  fsc 0x0000.3c0c297c
bdba: 0x004000e1
data_block_dump,data header at 0xb6b8a644

bbed 提交225事务

[oracle@xifenfei ~]$ bbed password=blockedit filename='/u01/oracle/oradata/ora11g/system01.dbf' blocksize=8192
BBED: Release 2.0.0.0.0 - Limited Production on Mon Aug 26 17:31:09 2013
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set block 225
        BLOCK#          225
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[32]                               @86
 ub1 freespace[59]                          @150
 ub1 rowdata[7979]                          @209
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x0000000f
      ub4 ktbbhod1                          @24       0x0000000f
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c297b
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2047
   ub1 ktbbhflg                             @38       0x03 (KTBFONFL)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0001
         ub4 kxidsqn                        @48       0x00000034
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00400221
         ub2 kubaseq                        @56       0x0034
         ub1 kubarec                        @58       0x08
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x3c0c297c
BBED> set mode edit
        MODE            Edit
BBED> m /x 0180 offset 60
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:   60 to   91           Dba:0x00000000
------------------------------------------------------------------------
 01800000 7c290c3c 00012000 ffff5200 8d003017 30170000 20008e1f 4106f805
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55db, required = 0x55db
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=1   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x2c
BBED> dump
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c011102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 002c
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x79f6, required = 0x79f6
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=44   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x00
BBED> d
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 2c00
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c001102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55da, required = 0x55da
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

dump 321 block

SQL> alter system dump datafile '/u01/oracle/oradata/ora11g/system01.dbf' block 321;
System altered.
SQL> select value from v$diag_info where name='Default Trace File';
VALUE
--------------------------------------------------------------------------------
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19704.trc
--trace文件
Block header dump:  0x00400141
 Object id on Block? Y
 seg/obj: 0x22  csc: 0xb8c.3c0c28ec  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.038.00000033  0x0040021e.0033.3d  --U-    1  fsc 0x0000.3c0c28ed

bbed提交321 block 事务

BBED> set block 321
        BLOCK#          321
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 72 bytes                     @20
 struct kdxle, 32 bytes                     @92
 b2 kd_off[32]                              @124
 ub1 freespace[7581]                        @188
 ub1 rowdata[351]                           @7769
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 72 bytes                      @20
   ub1 ktbbhtyp                             @20       0x02 (KDDBTINDEX)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000022
      ub4 ktbbhod1                          @24       0x00000022
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c28ec
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2046
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0000
         ub4 kxidsqn                        @48       0x00000000
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00000000
         ub2 kubaseq                        @56       0x0000
         ub1 kubarec                        @58       0x00
      ub2 ktbitflg                          @60       0x0000 (NONE)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x00000000
   struct ktbbhitl[1], 24 bytes             @68
      struct ktbitxid, 8 bytes              @68
         ub2 kxidusn                        @68       0x0000
         ub2 kxidslt                        @70       0x0038
         ub4 kxidsqn                        @72       0x00000033
      struct ktbituba, 8 bytes              @76
         ub4 kubadba                        @76       0x0040021e
         ub2 kubaseq                        @80       0x0033
         ub1 kubarec                        @82       0x3d
      ub2 ktbitflg                          @84       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @86
         b2 _ktbitfsc                       @86       0
         ub2 _ktbitwrp                      @86       0x0000
      ub4 ktbitbas                          @88       0x3c0c28ed
BBED> m /x 0180 offset 84
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   84 to  595           Dba:0x00000000
------------------------------------------------------------------------
 01800000 ed280c3c 00008001 00000000 20006400 011e9d1d 00000000 00000000
 00000000 06000000 601f0000 561f4b1f 401f351f 2a1f1f1f 141f091f fe1ef31e
 e81edd1e d21ec71e bc1eb11e a61e9b1e 901e851e 7a1e6f1e 641e591e 4e1e431e
 381e2d1e 221e171e 0c1e011e 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x990e, required = 0x990e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
Block Checking: DBA = 4194625, Block Type = KTB-managed data block
**** actual rows locked by itl 2  = 1 != # in trans. header = 0
---- end index block validation
Block 321 failed with check code 6401
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 1
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p kdxle
struct kdxle, 32 bytes                      @92
   struct kdxlexco, 16 bytes                @92
      ub1 kdxcolev                          @92       0x00
      ub1 kdxcolok                          @93       0x00
      ub1 kdxcoopc                          @94       0x80
      ub1 kdxconco                          @95       0x01
      ub4 kdxcosdc                          @96       0x00000000
      sb2 kdxconro                          @100      32
      b2 kdxcofbo                           @102      100
      b2 kdxcofeo                           @104      7681
      b2 kdxcoavs                           @106      7581
   b2 kdxlespl                              @108      0
   sb2 kdxlende                             @110      0
   ub4 kdxlenxt                             @112      0x00000000
   ub4 kdxleprv                             @116      0x00000000
   ub1 kdxledsz                             @120      0x06
   ub1 kdxleunuse                           @121      0x00
BBED> m /x 00 offset 95
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   95 to  606           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00200064 00011e9d 1d000000 00000000 00000000 00060000 00601f00
 00561f4b 1f401f35 1f2a1f1f 1f141f09 1ffe1ef3 1ee81edd 1ed21ec7 1ebc1eb1
 1ea61e9b 1e901e85 1e7a1e6f 1e641e59 1e4e1e43 1e381e2d 1e221e17 1e0c1e01
 1e000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x380e, required = 0x380e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

bbed提交事务open依旧报错

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 20229
Session ID: 125 Serial number: 5

做10046的trace文件错误依旧

=====================
PARSING IN CURSOR #3065238148 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377519352653391 hv=361892850 ad='269a8064' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3065238148:c=2999,e=3152,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377519352653388
BINDS #3065238148:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6b32268  bln=22  avl=02  flg=05
  value=27
EXEC #3065238148:c=2999,e=3182,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377519352657030
WAIT #3065238148: nam='db file sequential read' ela= 169 file#=1 block#=321 blocks=1 obj#=34 tim=1377519352657400
WAIT #3065238148: nam='db file sequential read' ela= 25 file#=1 block#=225 blocks=1 obj#=15 tim=1377519352657546
FETCH #3065238148:c=1000,e=578,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377519352657738
STAT #3065238148 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=538 us)'
STAT #3065238148 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=338 us)'
CLOSE #3065238148:c=0,e=13,dep=3,type=0,tim=1377519352657942
WAIT #3065201888: nam='control file sequential read' ela= 111 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658170
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658269
WAIT #3065201888: nam='control file sequential read' ela= 18 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658329
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=24 blocks=1 obj#=15 tim=1377519352658391
WAIT #3065201888: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658469
WAIT #3065201888: nam='control file sequential read' ela= 16 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658528
WAIT #3065201888: nam='control file sequential read' ela= 28 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658787
WAIT #3065201888: nam='control file sequential read' ela= 20 file#=0 block#=32 blocks=1 obj#=15 tim=1377519352658883
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3065201888:c=11998,e=12032,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377519352661044
STAT #3065201888 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=19 us)'
STAT #3065201888 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1001 us)'

通过观察每次的10046文件,发现总是在查询select /*+ rule */ name,file#, block#,status$,user#,undosqn,xactsqn, scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1并且总是us#=27号回滚段上过不去,通过使用bbed或者dul修改数据文件,让数据库启动时候不再扫描27号回滚段(主要是修改undo$.status$=1,block位置可以通过10046确定,也可以通过odu来确定,亦或者找个相同版本的库查看)

继续open数据库

SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
Database altered.

删除有问题undo,重建新undo

SQL> drop tablespace undotbs1;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> !oerr ora 01561
01561, 00000, "failed to remove all objects in the tablespace specified"
// *Cause: Failed to remove all objects when dropping a tablespace
// *Action: Retry the drop tablespace until all objects are dropped
SQL> drop tablespace undotbs1 including contents;
drop tablespace undotbs1 including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> drop tablespace undotbs1 including contents and datafiles;
drop tablespace undotbs1 including contents and datafiles
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
-------------------------------------------------
因为前面使用了bbed/dul之类的工具,修改了undo$表,
导致undo$中的标记27号回滚段与seg$中不一致
解决方法使得修改undo$.status$为3或者2
-------------------------------------------------
SQL> update undo$ set status$=3 where us#=27;
1 row updated.
SQL> commit;
Commit complete.
SQL> drop tablespace undotbs1 ;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
Database opened.
SQL> drop tablespace undotbs1 ;
Tablespace dropped.
SQL> create undo tablespace undotbs1 datafile '/u01/oracle/oradata/ora11g/undotbs1.dbf' size 10M;
Tablespace created.

直接修改基表,bbed/dul修改数据文件等操作都是危险级别非常高的操作,而且oracle not support,在没有十足把握之前,一定不要在生产环境中执行,如果数据库挂了需要类似方法恢复,也请一定保留现场,如果你搞不定能够还原现场
部分操作可以参考:通过bbed修改回滚段状态解决ORA-00704故障