using backup controlfile 两种使用情况区别

使用备份控制文件和重建控制文件恢复,都需要使用到using backup controlfile命令,但是两种情况下却有着本质的区别
试验准备条件

SQL> select * from v$version;
BANNER
-------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd')"www.xifenfei.com" from dual;
www.xifenfei.com
--------------------
2012-07-13
SQL> alter database open resetlogs;
Database altered.
SQL> select name from v$controlfile;
NAME
----------------------------------------------------
/u01/oracle/oradata/ora11g/control01.ctl
SQL> !cp /u01/oracle/oradata/ora11g/control01.ctl /tmp/xff.ctl
SQL> alter system checkpoint;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> alter system checkpoint;
System altered.
SQL> shutdown abort;
ORACLE instance shut down.
SQL> !rm /u01/oracle/oradata/ora11g/control01.ctl
SQL> ! cp /tmp/xff.ctl /u01/oracle/oradata/ora11g/control01.ctl
SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
ORA-01207: file is more recent than control file - old control file

相关说明:
1.通过resetlogs使得试验更加清晰
2.通过多次的checkpoint实现增加scn,switch logfile实现日志组切换
3.通过模拟备份控制文件恢复

查询相关SCN

SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;
     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118580
         2          2118580
         3          2118580
         4          2118580
         6          2118580
SQL> select CONTROLFILE_CHANGE#   from v$database;
CONTROLFILE_CHANGE#
-------------------
            2118713

做关于控制文件和数据文件dump

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump file_hdrs 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_593.trc
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 03:05:48 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> oradebug setmypid;
Statement processed.
SQL>  oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_775.trc

分析file_hdrs 3 dump文件

--datafile 1的datafile header信息
Tablespace #0 - SYSTEM  rel_fn:1
Creation   at   scn: 0x0000.00000007 09/18/2011 17:33:47
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x2eff82e3 scn: 0x0000.002053b1
 prev reset logs count:0x2e9e8451 scn: 0x0000.0016eaab
 recovered at 07/13/2012 02:56:18
 status:0x2004 root dba:0x00400208 chkpt cnt: 760 ctl cnt:759
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00205545 07/13/2012 03:01:42
--datafile 1的控制文件中信息
DATA FILE #1:
  name #7: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:752 scn: 0x0000.002053b4 07/13/2012 02:59:18
 Stop scn: 0xffff.ffffffff 07/13/2012 02:58:43
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

通过这里可以看出来:
datafile header的Checkpointed scn(00205545)>controfile datafile的Checkpoint scn(002053b4)
datafile header的checkpiont count(760)>controfile datafile的checkpiont count(752)
所以在数据库open的时候会报ORA-01207错误

尝试恢复数据库

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL>
SQL> select to_char(2118713,'xxxxxxx') from dual;
TO_CHAR(2118713,
----------------
  205439

分析controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 03/12/2012 22:17:06
 DB Name "ORA11G"
 Database flags = 0x00404000 0x00001000
 Controlfile Creation Timestamp  03/12/2012 22:17:07
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.002053b4
 Threads: #Enabled=1, #Open=1, Head=1, Tail=1
 Max log members = 3, Max data members = 1
 Arch list: Head=0, Tail=0, Force scn: 0x0000.00000000scn: 0x0000.00000000
 Activation ID: 4184707968
 Controlfile Checkpointed at scn:  0x0000.00205439 07/13/2012 02:59:25 <==控制文件checkpiont,控制文件每3秒的一次checkpiont
 thread:0 rba:(0x0.0.0)

通过recover的提示和对于controlf 3 dump文件的分析,可以确定数据库使用备份控制文件恢复,需要改控制文件备份之时开始的所有归档日志

查询数据库当前redo情况

SQL> select member from v$logfile;
MEMBER
-------------------------------------------------------------
/u01/oracle/oradata/ora11g/redo03.log
/u01/oracle/oradata/ora11g/redo02.log
/u01/oracle/oradata/ora11g/redo01.log
SQL> select SEQUENCE#,STATUS,FIRST_CHANGE# ,NEXT_CHANGE# from v$log;
 SEQUENCE# STATUS                           FIRST_CHANGE# NEXT_CHANGE#
---------- -------------------------------- ------------- ------------
         1 CURRENT                                2118577   2.8147E+14
         0 UNUSED                                       0            0
         0 UNUSED                                       0            0
SQL> archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /u01/oracle/oradata/archivelog/ora11g
Oldest online log sequence     1
Current log sequence           1
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo03.log';
System altered.
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo02.log';
System altered.
SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo01.log';
System altered.

因为数据库处于非归档模式,而得到的redo信息主要都是来自控制文件,所以只能通过dump redo来分析当前redo的情况

分析redo log dump

DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo03.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7660=0x1dec, File size=30720=0x7800
	File Number=3, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000003, SCN 0x00000020553d-0x000000205540"
 thread: 1 nab: 0x2 seq: 0x00000003 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Next scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Disk cksum: 0xa716 Calc cksum: 0xa716
DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo02.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000005, SCN 0x000000205543-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000005 hws: 0x1 eot: 1 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Disk cksum: 0xc3f9 Calc cksum: 0xc3f9
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000004, SCN 0x000000205540-0x000000205543"
 thread: 1 nab: 0x2 seq: 0x00000004 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Next scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Disk cksum: 0xaa26 Calc cksum: 0xaa26
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000

通过对redo dump的分析可以得到:
1.最小的sequence#=3是redo03.log
2.current redo为redo02.log

继续尝试恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
ORA-00326: log begins at change 2118979, need earlier change 2118713
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo02.log'

关于备份控制文件使用using backup controlfile总结:恢复的启动控制文件备份的scn,需要该控制文件备份后的所有归档日志.对于当前这个非归档,而且redo被覆盖的库,该方法无法正常恢复

重建控制文件并做controlf 3 dump

SQL> alter database backup controlfile to trace as '/tmp/ctl.trace';
Database altered.
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS  NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/u01/oracle/oradata/ora11g/redo01.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 2 '/u01/oracle/oradata/ora11g/redo02.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 3 '/u01/oracle/oradata/ora11g/redo03.log'  SIZE 15M BLOCKSIZE 512
DATAFILE
  '/u01/oracle/oradata/ora11g/system01.dbf',
  '/u01/oracle/oradata/ora11g/sysaux01.dbf',
  '/u01/oracle/oradata/ora11g/undotbs01.dbf',
  '/u01/oracle/oradata/ora11g/users01.dbf',
  '/u01/oracle/oradata/ora11g/xifenfei02.dbf'
CHARACTER SET AL32UTF8
;
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
Control file created.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_1867.trc

尝试数据库恢复

SQL>  recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL>
SQL> select to_char(2118981,'xxxxxxx') from dual;
TO_CHAR(2118981,
----------------
  205545
SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;
     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118981
         2          2118981
         3          2118981
         4          2118981
         6          2118981

分析 controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 07/13/2012 03:24:51
 DB Name "ORA11G"
 Database flags = 0x00400102 0x00001000
 Controlfile Creation Timestamp  07/13/2012 03:24:51
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.00205543
***************************************************************************
DATA FILE RECORDS
***************************************************************************
 (size = 520, compat size = 520, section max = 100, section in-use = 6,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
  name #8: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0x12 head=8 tail=8 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:760 scn: 0x0000.00205545 07/13/2012 03:01:42
 Stop scn: 0xffff.ffffffff 07/13/2012 03:24:51
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

完成恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
Log applied.
Media recovery complete.

关于重建控制文件后使用using backup总结:重建控制文件后,恢复的起点是datafile header scn 最小值,需要改scn之后的所有日志

未收集统计信息对象—执行sql动态采样

在一次ORA-7445导致oracle数据库down掉故障分析中,发现一条类似的sql非常大(通过复制到文档确定该sql大小是5M左右)

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE
*/ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("DCREDITMSG_00")
 FULL("DCREDITMSG_00") NO_PARALLEL_INDEX("DCREDITMSG_00") */ 1 AS C1, CASE WHEN
 "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR
……………………N多OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码'
"DCREDITMSG_00"."PHONE_NO"='具体电话号码' THEN 1 ELSE 0 END AS C2 FROM
"BSSADMIN"."DCREDITMSG_00" SAMPLE BLOCK (0.032410 , 1) SEED (1) "DCREDITMSG_00") SAMPLESUB

当时该sql因某种原因导致大量的sql area中很多内存泄露,最终导致数据库down掉.通过实验找出类此奇怪SQL.

创建模拟表

SQL> create table t_xifenfei
  2  as
  3  select  * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     74605
SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables
  2  where  table_name='T_XIFENFEI' and owner='CHF';
  NUM_ROWS LAST_ANALYZE
---------- ------------

得出信息:
1.该表一共有记录数74605条
2.该表未收集统计信息

查看执行计划

SQL>  set autotrace trace exp
SQL> select /*+ dynamic_sampling(t 0) */ * from t_xifenfei t;
Execution Plan
----------------------------------------------------------
Plan hash value: 548923532
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 88868 |    17M|   299   (2)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 88868 |    17M|   299   (2)| 00:00:04 |
--------------------------------------------------------------------------------
--通过hint指定动态采样sql相关对象统计信息,
可以看到我们实际的表记录是74605而数据库采样出来的记录为88868,原则上还是可以接受
SQL> select * from t_xifenfei;
Execution Plan
----------------------------------------------------------
Plan hash value: 548923532
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 73449 |    14M|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 73449 |    14M|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement (level=2)
--因为没有统计信息,数据库动态采样sql相关对象统计信息
可以看到我们实际的表记录是74605而数据库采样出来的记录为73449,比手工指定采样准确

对自动采样进行10046跟踪

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;
  COUNT(*)
----------
     74605
SQL>  oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_27967.trc

分析trace文件

*** 2012-07-12 15:42:34.991
WAIT #0:nam='SQL*Net message from client'ela= 56716427 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078954991525
=====================
PARSING IN CURSOR #3063864268 len=404 dep=1 uid=0 oct=3 lid=0 tim=1342078955037387
hv=4184780033 ad='385d3708' sqlid='3gjvvxzwqxb81'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false')
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM
(SELECT /*+ NO_PARALLEL("T_XIFENFEI") FULL("T_XIFENFEI") NO_PARALLEL_INDEX("T_XIFENFEI") */ 1 AS C1,
1 AS C2 FROM "CHF"."T_XIFENFEI" SAMPLE BLOCK (5.790441 , 1) SEED (1) "T_XIFENFEI") SAMPLESUB
END OF STMT
PARSE #3063864268:c=6000,e=5404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=205916192,tim=1342078955037303
EXEC #3063864268:c=0,e=206,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=205916192,tim=1342078955037901
FETCH #3063864268:c=4998,e=4759,p=0,cr=65,cu=0,mis=0,r=1,dep=1,og=1,plh=205916192,tim=1342078955042730
STAT #3063864268 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=65 pr=0 pw=0 time=4795 us)'
STAT #3063864268 id=2 cnt=4253 pid=1 pos=1 obj=76370 op='TABLE ACCESS SAMPLE T_XIFENFEI
(cr=65 pr=0 pw=0 time=8247 us cost=19 size=61752 card=5146)'
CLOSE #3063864268:c=0,e=7,dep=1,type=0,tim=1342078955043024
=====================
PARSING IN CURSOR #3063864784 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342078955043465 hv=2174183953 ad='3ed2d700'
sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT
PARSE #3063864784:c=51991,e=51648,p=0,cr=66,cu=0,mis=1,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043464
EXEC #3063864784:c=0,e=75,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043645
WAIT #3063864784: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078955043705

通过这个trace的分析,果然发现在执行我们需要的sql前,执行了SELECT /* OPT_DYN_SAMP */组成的一个复杂的采样sql语句.

收集统计信息查看执行计划

SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI');
PL/SQL procedure successfully completed.
SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables
  2  where  table_name='T_XIFENFEI' and owner='CHF';
  NUM_ROWS LAST_ANALYZE
---------- ------------
     74605 12-JUL-12
SQL>  set autotrace trace exp
SQL> select * from t_xifenfei;
Execution Plan
----------------------------------------------------------
Plan hash value: 548923532
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 74605 |  7139K|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 74605 |  7139K|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------
--执行计划未提示dynamic sampling

继续做10046

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;
  COUNT(*)
----------
     74605
SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_29780.trc

分析trace文件

*** 2012-07-12 16:14:53.914
Oradebug command 'EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12' console output: <none>
WAIT #0: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080893914307
*** 2012-07-12 16:14:59.376
WAIT #0: nam='SQL*Net message from client' ela= 5461608 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080899376008
=====================
PARSING IN CURSOR #3063709248 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342080899379562 hv=2174183953
ad='3ed2d700' sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT

通过这里可以发现,当有了统计信息后,数据库不能再使用那条N多hint的sql去动态采样统计信息.

总结建议
动态采样(Dynamic Sampling)技术的最初提出是在Oracle 9i R2,在段(表,索引,分区)没有分析的情况下,为了使CBO 优化器得到足够的信息以保证做出正确的执行计划而发明的一种技术,可以把它看做分析手段的一种补充。当段对象没有统计信息时(即没有做分析),动态采样技术可以通过直接从需要分析的对象上收集数据块(采样)来获得CBO需要的统计信息。为了cbo,oracle引进了该功能,原则上说是一个很不错的东西,但是偶尔也是会出现一些意外,所以如果发现数据库中有表未做统计分析,建议手工处理下,ORACLE的自动收集统计信息程序也有不靠谱的时候(发现多次10g的库中有部分表未收集统计信息)

Oracle 11g增加列,并带默认值的新特性

在11g以前,如果要在一个大表中增加一列,并设置默认值,那将是一个非常悲剧的事情.有些时候不得不选择在线重定义功能来实现该需求.而在11g中增加新列并设置默认值,只是简单的修改数据字典来实现该功能,大大提供效率
10g加列(默认值)

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> create table t_xifenfei
  2  as select object_id,object_name from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     49827
SQL> desc t_xifenfei
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJECT_ID                                          NUMBER
 OBJECT_NAME                                        VARCHAR2(128)
SQL> set timing on
SQL> alter table t_xifenfei add c_xff varchar2(100) default 'www.xifenfei.com' not null;
Table altered.
Elapsed: 00:00:06.13
--使用了6秒钟
SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';
ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAAMwJAAEAAAAB8AAr          4        124         43
SQL> alter system dump datafile 4 block 124;
System altered.
Elapsed: 00:00:00.08

11g增加列(默认值)

SQL> select * from v$version;
BANNER
-----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> create table t_xifenfei
  2  as select object_id,object_name from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     74605
SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';
ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46
SQL> alter system dump datafile 4 block 171;
System altered.
SQL> set timing on
SQL> alter table t_xifenfei add c_xff varchar2(100) default 'www.xifenfei.com' not null;
Table altered.
Elapsed: 00:00:00.19
--只是使用了0.19秒
SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';
ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46
Elapsed: 00:00:00.04
SQL> alter system dump datafile 4 block 171;
System altered.

通过10g和11g的增加相同列和默认值的对比可以发现,11g的速度要比10g快很多很多,下面我们通过上面dump出来相关的数据块来分析原因
dump分析
11g增加列之前dump

tab 0, row 0, @0x1f74
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
tab 0, row 1, @0x1f66
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
tab 0, row 2, @0x1f5b
tl: 11 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24

11g增加列之后dump

tab 0, row 0, @0x1f74
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
tab 0, row 1, @0x1f66
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
tab 0, row 2, @0x1f5b
tl: 11 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
tab 0, row 3, @0x1f4f
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24

10g增加列之后dump
因为10g在没有增加列之前的dump和11g未增加列之前类似,所以未dump出来

tab 0, row 0, @0x1f63
tl: 29 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 1, @0x1f44
tl: 31 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 2d
col  1: [ 7]  49 5f 55 53 45 52 31
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 2, @0x1f28
tl: 28 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 3, @0x1f0b
tl: 29 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d

对比发现11g在增加列之后,以前的数据dump出来的内容未有任何改变.也就是说:在10g中,我们增加一个列和默认值,会自动的增加到真实的数据中,而在11g中增加列和默认值并未真的加到11g的表中已经存在的数据中.
11g中插入新数据dump测试

SQL>  insert into chf.t_xifenfei(object_id,object_name)
  2  select object_id,object_name FROM DBA_OBJECTS;
74605 rows created.
SQL> commit;
Commit complete.
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;
System altered.
SQL> /
System altered.
--建议刷新
SQL>
SQL>
SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from chf.t_xifenfei where object_name='OBJ$';
ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46
AAASpRAAEAAAB5TAAu          4       7763         46
SQL> ALTER SYSTEM DUMP DATAFILE 4 BLOCK 7763;
System altered.
--dump内容
tab 0, row 0, @0x4e3
tl: 29 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 1, @0x500
tl: 31 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 2, @0x51f
tl: 28 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 3, @0x53b
tl: 29 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d

通过这里看看出:在11g中后续插入的数据,默认值也插入到数据文件中

Oracle 11G的DDL_LOCK_TIMEOUT参数

今天kaums给客户做培训11g新特性,发现还真的有不少挺好的新东西,但是以前没有怎么去关注的他们,在后续的几篇中,陆续整理处理.
DDL_LOCK_TIMEOUT specifies a time limit for how long DDL statements will wait in a DML lock queue. The default value of zero indicates a status of NOWAIT. The maximum value of 1,000,000 seconds will result in the DDL statement waiting forever to acquire a DML lock.
If a lock is not acquired before the timeout period expires, then an error is returned.
数据库ddl_lock_timeout参数

SQL> select * from v$version;
BANNER
---------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> show parameter ddl_lock_timeout;
NAME                                 TYPE       VALUE
------------------------------------ ---------- --------------
ddl_lock_timeout                     integer    0

ddl_lock_timeout默认值测试(同以前版本)

--会话1
SQL> set time on
10:27:57 SQL> create table t_xifenfei as
10:28:05   2  select * from dba_users;
Table created.
Elapsed: 00:00:00.08
10:28:17 SQL> delete from t_xifenfei where username='CHF';
1 row deleted.
--会话2
SQL> set timing on
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;
ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Elapsed: 00:00:00.01

设置ddl_lock_timeout超时测试

--会话1
10:28:17 SQL> delete from t_xifenfei where username='CHF';
1 row deleted.
--会话2
SQL>  ALTER SESSION SET ddl_lock_timeout=10;
Session altered.
Elapsed: 00:00:00.00
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;
ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Elapsed: 00:00:10.01

设置ddl_lock_timeout未超时测试

--会话2
SQL> ALTER SESSION SET ddl_lock_timeout=30;
Session altered.
Elapsed: 00:00:00.00
--会话1
10:30:00 SQL> COMMIT;
Commit complete.
Elapsed: 00:00:00.00
--会话2
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;
Table altered.
Elapsed: 00:00:07.91

在以前的版本中ddl操作是nowait等待,通过实验可以发现ddl_lock_timeout可以在一定程度上解决因为我们不清楚这个表是否有dml操作而导致ddl操作不能进行的情况,从一定程度上减少了自己去尝试ddl操作,或者查询相关视图然后找出相关会话,然后kill掉对应数据的情况,可以说是在修改表结构的时候一个很不错的新特性.

使用flashback query恢复被删除plsql

今天一个朋友在11g中误删除生产中的一个过程,让他通过对dba_source视图的flashback query找回该过程.
从10g及其以后的版本中,如果被删除的plsql被及时发现(undo 未被覆盖掉)可以使用flashback query功能实现恢复.
创建plsql并删除

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> create or replace PROCEDURE  p_test_del(in_put varchar2) as
  2  begin
  3  dbms_output.put_line(in_put);
  4  end;
  5  /
Procedure created.
SQL> set serveroutput on
SQL> exec p_test_del('www.xifenfei.com');
www.xifenfei.com
PL/SQL procedure successfully completed.
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;
TO_CHAR(SYSDATE,'YY
-------------------
2012-07-02 08:34:45
SQL> drop  PROCEDURE  p_test_del;
Procedure dropped.

恢复plsql

SQL> set pagesize 0
SQL> column text format a4000
SQL> spool /tmp/get_delete_proc.xff
SQL> SELECT text
  2         FROM DBA_source AS OF TIMESTAMP TO_TIMESTAMP('2012-07-02 08:34:45', 'YYYY-MM-DD HH24:MI:SS')
  3        WHERE OWNER = 'CHF' AND NAME = 'P_TEST_DEL' ORDER BY LINE;
PROCEDURE  p_test_del(in_put varchar2) as
begin
dbms_output.put_line(in_put);
end;
SQL> spool off;
SQL> !more /tmp/get_delete_proc.xff
SQL> SELECT text
  2         FROM DBA_source AS OF TIMESTAMP TO_TIMESTAMP('2012-07-02 08:34:45', 'YYYY-MM-DD HH24:MI:SS')
  3        WHERE OWNER = 'CHF' AND NAME = 'P_TEST_DEL' ORDER BY LINE;
create PROCEDURE  p_test_del(in_put varchar2) as
begin
dbms_output.put_line(in_put);
end;
SQL> spool off;

重建plsql

SQL> create PROCEDURE  p_test_del(in_put varchar2) as
  2  begin
  3  dbms_output.put_line(in_put);
  4  end;
  5  /
Procedure created.
SQL> set serveroutput on
SQL> exec p_test_del('惜分飞');
惜分飞
PL/SQL procedure successfully completed.

ORA-00600[kcbshlc_1]导致数据库 down 案例

一台服务器因为ORA-00600[kcbshlc_1]错误引起PMON异常导致数据库down掉

Sun Jul  8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
PMON: terminating instance due to error 472

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.9-89.ELsmp
Version:	#1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine:	x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)
*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch  1fc1fcc9b0 Child cache buffers chains level=1 child#=4753
        Location from where latch is held: kcbgtcr: kslbegin excl:
        Context saved from call: 113266196
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     139 (2, 1341687611, 2)
     192 (2, 1341687611, 2)
     191 (2, 1341687611, 2)
     173 (2, 1341687611, 2)
     185 (2, 1341687611, 2)
     176 (2, 1341687611, 2)
     174 (2, 1341687611, 2)
     118 (2, 1341687611, 2)
     190 (2, 1341687611, 2)
     179 (2, 1341687611, 2)
     184 (1, 1341687611, 1)
     189 (1, 1341687611, 1)
     177 (1, 1341687611, 1)
     195 (1, 1341687611, 1)
     187 (1, 1341687611, 1)
     194 (1, 1341687611, 1)
     147 (1, 1341687611, 1)
     183 (1, 1341687611, 1)
     143 (1, 1341687611, 1)
     144 (1, 1341687611, 1)
     186 (1, 1341687611, 1)
     188 (1, 1341687611, 1)
     196 (1, 1341687611, 1)
     145 (1, 1341687611, 1)
     193 (1, 1341687611, 1)
     waiter count=25
*** 2012-07-08 03:50:06.228
 wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch  1f81265f28 Child cache buffers chains level=1 child#=14180
        Location from where latch is held: kcbgtcr: kslbegin excl:
        Context saved from call: 71341989
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     121 (2, 1341736205, 2)
     116 (2, 1341736205, 2)
     125 (2, 1341736205, 2)
     140 (2, 1341736205, 2)
     145 (2, 1341736205, 2)
     waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch  1f813aec18 Child cache buffers chains level=1 child#=20914
        Location from where latch is held: kcbrls: kslbegin:
        Context saved from call: 96505705
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000000 ?
kcbshlc()+239        call     kgeasnmierr()        0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
kslilcr()+770        call     kcbshlc()            0066876E0 ? 1F801DDB28 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
ksl_cleanup()+1567   call     kslilcr()            7FBFFFCE50 ? 000000000 ?
                                                   7FBFFFDCE0 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxfl()+492         call     ksl_cleanup()        000000000 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxda()+55          call     ksuxfl()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksucln()+1390        call     ksuxda()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksbrdp()+794         call     ksucln()             060008100 ? 000000000 ?
                                                   FFFFFFFF9720ED9F ?
                                                   1F801DDB28 ? 0066876E0 ?
                                                   000000021 ?
opirip()+616         call     ksbrdp()             060008100 ? 000000000 ?
                                                   000000001 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opimai_real()+317    call     sou2o()              7FBFFFF670 ? 000000032 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF700 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
__libc_start_main()  call     main()               000000003 ? 7FBFFFF700 ?
+219                                               000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
_start()+42          call     __libc_start_main()  000713984 ? 000000001 ?
                                                   7FBFFFF848 ? 005288D00 ?
                                                   000000000 ? 000000003 ?

通过这个trace可以看出数据库运行在LINUX 64操作系统,版本是10.2.0.4。
出现错误的原因:
PMON在清理1fc165d248的时候,因为被orapid = 102持有,导致清理失败.
PMON在清理20406507f0的时候,因为被orapid = 102持有,导致清理失败.
PMON在清理1fc165f9d0的时候,因为被orapid = 123持有,导致清理失败.

查询MOS[443909.1]
发现是unpublished Bug 4723109.处理方法打上Patch 4723109.

dul 10 export_mode=true功能增强

在有次8i的库恢复中,因为硬盘损坏导致几个表出现很多诡异性坏块,尝试使用dul对其进行挖掘数据,当时使用dul 9 遇到一个难题:当一张表中有lob类型,同时又有varchar2类型,而且varchar2类型数据中包含回车键,使得解决起来很麻烦(因为export_mode=false支持lob,但是不支持字符串含回车;export_mode=true支持字符串含回车,但是不支持lob),最后放弃了对部分数据的挖掘.这个问题让我一直不甘心,今天测试dul 10 发现是用export_mode=true可以完美解决该问题
创建模拟表和插入数据

SQL> desc t_xff
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 C_BLOB                                             BLOB
 C_VARCHAR                                          VARCHAR2(4000)
SQL> declare
  2  a_blob BLOB;
  3  bfile_name BFILE := BFILENAME('ULTLOBDIR','awr_ora11g_2012-06-01_174_175.html');
  4  begin
  5  insert into t_xff(C_BLOB,C_VARCHAR) values (
  6  empty_blob(),
  7  'www.xifenfei.com
  8  WWW.XIFENFEI.COM
  9  惜分飞
 10  欢迎访问惜分飞博客
 11  提供数据库异常恢复技术支持')
 12  returning C_BLOB into a_blob;
 13  dbms_lob.fileopen(bfile_name);
 14  dbms_lob.loadfromfile(a_blob, bfile_name, dbms_lob.getlength(bfile_name));
 15  dbms_lob.fileclose(bfile_name);
 16  commit;
 17  end;
 18  /
PL/SQL procedure successfully completed.
SQL> select length(c_varchar),dbms_lob.getlength(c_blob) from t_xff;
LENGTH(C_VARCHAR) DBMS_LOB.GETLENGTH(C_BLOB)
----------------- --------------------------
               61                    4282573
SQL>  select c_varchar from t_xff;
C_VARCHAR
---------------------------------------------------------------
www.xifenfei.com
WWW.XIFENFEI.COM
惜分飞
欢迎访问惜分飞博客
提供数据库异常恢复技术支持

dul 挖数据

[oracle@xifenfei dul]$ ./dul
Data UnLoader: 10.2.0.5.13 - Internal Only - on Mon Jul  2 04:29:10 2012
with 64-bit io functions
Copyright (c) 1994 2012 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
DUL> bootstrap;
DUL> desc chf.t_xff;
Table CHF.T_XFF
obj#= 51353, dataobj#= 51353, ts#= 4, file#= 4, block#=67
      tab#= 0, segcols= 2, clucols= 0
Column information:
icol# 01 segcol# 01       C_BLOB len 4000 type 113 BLOB
  LOB Segment: dataobj#= 51354, ts#= 4, file#= 4, block#=75 chunk=1
  LOB Index: dataobj#= 51355, ts#= 4, file#= 4, block#=83
icol# 02 segcol# 02    C_VARCHAR len 4000 type  1 VARCHAR2 cs 852(ZHS16GBK)
--export_mode=false
DUL> unload table chf.t_xff;
. unloading (index organized) table     LOB01000053      65 rows unloaded
Preparing lob metadata from lob index
Reading LOB01000053.dat 65 entries loaded and sorted 65 entries
. unloading table                     T_XFF       1 row  unloaded
--导出数据文件
-rw-r--r-- 1 oracle oinstall 6.1K Jul  2 04:15 LOB01000053.dat
-rw-r--r-- 1 oracle oinstall  335 Jul  2 04:15 LOB01000053.ctl
-rw-r--r-- 1 oracle oinstall 8.2M Jul  2 04:15 CHF_T_XFF.dat
-rw-r--r-- 1 oracle oinstall  263 Jul  2 04:15 CHF_T_XFF.ctl
----export_mode=true
DUL> unload table chf.t_xff;
. unloading (index organized) table     LOB01000053
DUL: Warning: Recreating file "LOB01000053.ctl"
      65 rows unloaded
Preparing lob metadata from lob index
Reading LOB01000053.dat 65 entries loaded and sorted 65 entries
. unloading table                     T_XFF       1 row  unloaded
--导出数据文件
-rw-r--r-- 1 oracle oinstall    6229 Jul  2 04:29 LOB01000053.dat
-rw-r--r-- 1 oracle oinstall     335 Jul  2 04:29 LOB01000053.ctl
-rw-r--r-- 1 oracle oinstall 4285027 Jul  2 04:29 CHF_T_XFF.dmp

导入数据测试
sqlldr导入

SQL> truncate table chf.t_xff;
Table truncated.
[oracle@xifenfei dul]$ sqlldr chf/xifenfei control=CHF_T_XFF.ctl
SQL*Loader: Release 10.2.0.1.0 - Production on Mon Jul 2 04:23:18 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
SQL*Loader-510: Physical record in data file (CHF_T_XFF.dat) is longer than the maximum(1048576)
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
[oracle@xifenfei dul]$ sqlldr chf/xifenfei control=CHF_T_XFF.ctl readsize=20971520
SQL*Loader: Release 10.2.0.1.0 - Production on Mon Jul 2 04:26:50 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
SQL> select length(c_varchar),dbms_lob.getlength(c_blob) from chf.t_xff;
no rows selected
--试验结果证明在出现表中同时有lob和varchar2列(含回车)时,export_mode=false不能正常工作

imp导入

SQL> drop table chf.t_xff;
Table dropped.
[oracle@xifenfei dul]$ imp chf/xifenfei file=CHF_T_XFF.dmp full=y
Import: Release 10.2.0.1.0 - Production on Mon Jul 2 04:30:30 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V07.00.07 via conventional path
Warning: the objects were exported by Bernard's DUL, not by you
. importing Bernard's DUL's objects into CHF
. importing Bernard's DUL's objects into CHF
. . importing table                        "T_XFF"          1 rows imported
SQL> select length(c_varchar),dbms_lob.getlength(c_blob) from t_xff;
LENGTH(C_VARCHAR) DBMS_LOB.GETLENGTH(C_BLOB)
----------------- --------------------------
               61                    4282573
SQL>  select c_varchar from t_xff;
C_VARCHAR
---------------------------------------------------------------
www.xifenfei.com
WWW.XIFENFEI.COM
惜分飞
欢迎访问惜分飞博客
提供数据库异常恢复技术支持
--试验结果证明在出现表中同时有lob和varchar2列(含回车)时,export_mode=true正常工作

DBCA Fails With ORA-15243

今天接到朋友的电话说他们装ORACLE 11G R1 RAC的时候遇到ORA-12801/ORA-15243错误,请求我帮忙解决
具体情况
AIX系统以前装过11G R2 RAC,现因为项目要求11G R1,已经重装了系统,然后安装R1,在安装到DBCA配置ASM的时候,出现ORA-12801/ORA-15243错误

ORA-12801: error signaled in parallel query server PZ99, instance wmsdb1:+ASM1(1)
ORA-15243: 11.2.0.0.0 is not a valid version number


通过SQLPLUS登录ASM1实例查询发现该有一个ORADATA磁盘组,包含了一个/dev/rhdisk1.通过询问,得出结论是这个磁盘组以前是安装R2的时候作为存储OCR和VOTINGDISK使用,重装系统的时候未对该磁盘进行处理.

处理思路[想办法清除磁盘中asm信息]
1.尝试通过sqlplus 删除该磁盘组,报该磁盘组处于dismount状态
2.尝试mount该磁盘组,提示版本无效(ORA-15243)[当前的asm程序是11.1而磁盘组信息是11.2 程序当然不一致了]
3.直接使用dd清理该asm disk header信息(dd if=/dev/zero of=/dev/rhdisk1 bs=4096 count=1)
4.重新运行dbca一切工作正常

MOS中相关文章[1460997.1]只适合linux asmlib情况

Applies to:
Oracle Server - Enterprise Edition - Version 11.1.0.7 and later
Information in this document applies to any platform.
Symptoms
On : 11.1.0.7 version, STORAGE
When attempting to create database or query gv$asm_diskgroup,
the following error occurs.
ERROR
-----------------------
ORA-12801: error signaled in parallel query server PZ99, instance dchilcmsdb2.hq.navteq.com:+ASM2 (2)
ORA-15243: 11.2.0.0.0 is not a valid version number
STEPS
-----------------------
The issue can be reproduced at will with the following steps:
1. Previously had 11GR2 installed and configured. Removed this installation then installed
   11.1.0.7  and created diskgroups using some of the same disks previously used.
2. Attempt to create database and receive the errors. Drop the newly
   created diskgroups and query the view still get same errors.
BUSINESS IMPACT
-----------------------
The issue has the following business impact:
Due to this issue, users cannot create new database.
Changes
 Removed 11.2.0.1 installation and installed 11.1.0.7 software without cleaning up all of
 the diskgroup information from previous installation.
Cause
All the current information shows that we are using correct binaries and
that the diskgroups that are being used have correct comparability settings.
HTML shows that the disks for the old diskgroup are still being discovered.
This in conjunction with the text of the error as follows shows that
we are picking up 11.2.0.0.0 as version from somewhere.
ORA-15243: 11.2.0.0.0 is not a valid version number
Problem was caused by the disks that had been used for the
OCR/Voting disk diskgroup in 11GR2 installation still being present and accessible.
Solution
As the root user execute /etc/init.d/oracleasm/deletedisk command against all the disks
that were previously used for the OCR/Voting disk diskgroup then try the operation again.

11g DirectPath Reads 噩梦案例

DirectPath Reads 说明
在oracle 11g以前的版本中,如果对大表进行全表扫描,wait event是:db file scattered read;在11g中,如果对大表进行全表扫描,wait event是:direct path read。在11g中,大表全表扫描时数据块不经过sga而直接进pga,这样会造成每次进行大表全表扫描,物理读都是很大,而在10g中,由于全表扫描的数据块在sga中已经存在,所以执行全表扫描时,它的物理读为0。但是这里主要是oracle在优化策略上的进步,即假定大表频繁全表扫描这种现象,在生产库上不会太多,通过把数据直接读入pga,进而减少了cache buffer的繁忙交换程度,提高了cache buffer的使用效率.

DirectPath Reads 优势
1. 减少了对栓的使用,避免可能的栓争用
2. 物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。

DirectPath Reads 噩梦
这一切听起来都很美好,但是在大并发的OLTP系统中,这东西简直是一个噩梦.通过一个awr来说明该问题:这个是一个系统的awr报告,朋友反馈说系统有段运行缓慢,请求帮忙找出原因
分析总体信息

系统这段时间会话临时大幅度增加(从102增加到223),系统出现异常繁忙(60.62*16=969.92<2,454.52)
分析Load Profile信息

通过这个截图发现系统的业务不是很大,但是Physical reads参数异常
1.物理读大小:25071.1*8192/1024/1024=195.86796875M/S
2.物理读将近逻辑读一半,这个在一般系统中很难得到这个比例,进一步说明物理读过高

分析Top 5信息

这里可以发现direct path read等待很多

分析Host CPU

可以发现iowait很大占40.5%,io等待异常高(195M/S能不高吗?)
补充说明:在这里我们看到的%Idle=1-%System-%User不包括%WIO

处理建议
通过上面的评估,可以确定大部分是由于 导致了数据库的物理读过高,从而使得系统反应变慢,处理方法就是关闭掉11g该新特性
alter system set event= ‘10949 trace name context forever, level 1’ scope=spfile;
重启数据库

undo segment header坏块异常恢复

alert日志报ORA-00600[4137]与ORA-00600 [4198]错误
数据库报如下错误,运行一段时间数据库自动down掉

Fri Jul  6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17).
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri Jul  6 18:05:53 2012
SMON: Restarting fast_start parallel rollback
Fri Jul  6 18:05:54 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
…………
Wed Jul  6 18:50:38 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc:
ORA-00474: SMON process terminated with error
Wed Jul  6 18:50:38 2012
PMON: terminating instance due to error 474

从三个地方得出3号回滚段异常
1.trace文件

SMON: about to recover undo segment 3
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317
*** 2012-07-06 17:55:19.042
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 607
*** 2012-07-06 17:55:19.761
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 3
XID passed in =xid: 0x0003.011.00003c2b
XID from Undo block =xid: 0x0004.020.00002b35

2.alert中提示while recovering transaction (3, 17)
3.查询dba_rollback_segs发现_SYSSMU3$是NEED RECOVERY状态

尝试删除_SYSSMU3$
使用隐含参数_offline_rollback_segments= _SYSSMU3$

Fri Jul  6 18:16:19 2012
Completed: ALTER DATABASE OPEN
Fri Jul  6 18:16:56 2012
drop rollback segment "_SYSSMU3$"
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Fri Jul  6 18:16:57 2012
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:16:57 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0
  Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log
Block recovery completed at rba 209591.225.16, scn 1.3493910790
ORA-607 signalled during: drop rollback segment "_SYSSMU3$"...
Fri Jul  6 18:16:57 2012
Corrupt Block Found
         TSN = 1, TSNAME = UNDOTBS1
         RFN = 2, BLK = 41, RDBA = 8388649
         OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT =
         SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
PMON: terminating instance due to error 474
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc:
ORA-00474: SMON process terminated with error
Instance terminated by PMON, pid = 17355

这里可以看出在使用隐含参数删除异常回滚段的时候,因为该回滚段有坏块出现ORA-00600[kddummy_blkchk]使得数据库donw掉,重启过几次该库都因为这个错误直接down.
查看trace文件发现

SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
*** 2012-07-06 18:16:57.734
Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted.  Error Code = 38508
ktu4smck: starting extent(0x77) of txn slot #0x11 is  invalid.
  valid value (0 - 0x76)
  TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000
            mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86
            Version: 0x01

因为该库是因为undo的3号回滚段的header出现坏块,即使使用了隐含参数屏蔽该回滚段恢复,smon进程依然会去读回滚段header,从而出现该错误导致直接down掉.

处理方案
1.使用隐含参数屏蔽异常回滚段_offline_rollback_segments= _SYSSMU3$
2.修改undo_tablespace=SYSTEM/undo_management=MANUAL
3.启动数据库,快速删除包含_SYSSMU3$ undo表空间
4.新建undo表空间
5.修改undo_tablespace=new_undo/undo_management=AUTO,除掉隐含参数
6.使用新参数文件重启数据库
7.建议:使用逻辑导出导入重建数据库