root 用户操作 ORACLE 数据库导致悲剧

接到同事请求,说客户的linux redhat 5.8平台部署的11.2.0.3 RAC 节点2挂掉了,报磁盘IO异常,数据库hang住

Fri Jun 14 12:01:22 2013
Thread 2 advanced to log sequence 369 (LGWR switch)
  Current log# 49 seq# 369 mem# 0: +DATA/q9db/onlinelog/group_49.861.817830099
Fri Jun 14 12:01:22 2013
Archived Log entry 89300 added for thread 2 sequence 368 ID 0x35324053 dest 1:
Fri Jun 14 14:26:18 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side
Fri Jun 14 14:31:17 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side

在12点钟数据库运行正常,无任何错误,突然到了14多出现ORA-15025/ORA-27041,并且重启ORACLE 数据库恢复正常。该错误很明显是数据库无权限访问ASM DISK,检查ASM实例日志

Thu Jun 13 19:01:21 2013
ASMB started with pid=25, OS id=25066
Thu Jun 13 19:01:22 2013
NOTE: client +ASM2:+ASM registered, osid 25068, mbr 0x0
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:24 2013
WARNING: failed to online diskgroup resource ora.OCR_VOTE.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:57 2013
NOTE: client q9db2:q9db registered, osid 25732, mbr 0x1
Thu Jun 13 19:02:31 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.8.8.33)
 (PORT=1521))))' SCOPE=MEMORY SID='+ASM2';
Fri Jun 14 14:53:09 2013
SQL> ALTER DISKGROUP OCR_VOTE DISMOUNT  /* asm agent *//* {2:61929:97} */
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP ARCH DISMOUNT  /* asm agent *//* {2:61929:97} */
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP DATA DISMOUNT  /* asm agent *//* {2:61929:97} */

这里可以明显的看到,ASM实例在该时间点无任何错误,证明一切运行正常,查看系统日志,在该故障点,message中无任何记录,查看asm disk权限

[oracle@q9db02 trace]$ ll /dev/mapper/
total 0
crw------- 1 root root    10, 60 Jun  9 11:08 control
brw-rw---- 1 grid asmdba 253, 15 Jun 14 16:20 q9datalun1
brw-rw---- 1 grid asmdba 253, 16 Jun 14 16:20 q9datalun2
brw-rw---- 1 grid asmdba 253, 17 Jun 14 16:20 q9datalun3
brw-rw---- 1 grid asmdba 253, 18 Jun 14 16:19 q9datalun4
brw-rw---- 1 grid asmdba 253, 19 Jun 14 16:20 q9datalun5
brw-rw---- 1 grid asmdba 253, 20 Jun 14 16:20 q9datalun6
brw-rw---- 1 grid asmdba 253, 21 Jun 14 16:19 q9datalun7
brw-rw---- 1 grid asmdba 253,  4 Jun 14 16:20 q9datalun8
brw-rw---- 1 grid asmdba 253,  5 Jun 14 16:20 q9votelun1

所有文件权限没有任何问题,和当初部署之时完全相同而且运行了一段时间都正常,部署之时权限

[oracle@q9db02 trace]$ more /etc/rc.local
chown grid:asmdba /dev/mapper/q9votelun1
chmod 660 /dev/mapper/q9votelun1
chown grid:asmdba /dev/mapper/q9datalun1
chmod 660 /dev/mapper/q9datalun1
chown grid:asmdba /dev/mapper/q9datalun2
chmod 660 /dev/mapper/q9datalun2
chown grid:asmdba /dev/mapper/q9datalun3
chmod 660 /dev/mapper/q9datalun3
chown grid:asmdba /dev/mapper/q9datalun4
chmod 660 /dev/mapper/q9datalun4
chown grid:asmdba /dev/mapper/q9datalun5
chmod 660 /dev/mapper/q9datalun5
chown grid:asmdba /dev/mapper/q9datalun6
chmod 660 /dev/mapper/q9datalun6
chown grid:asmdba /dev/mapper/q9datalun7
chmod 660 /dev/mapper/q9datalun7
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8

因为这里权限没有任何改变,而且asm disk权限正确,系统日志无任何日志,证明该问题不是因为ASM DISK权限改变导致,那我怀疑是人做了不该做的操作,比喻临时性修改了ASM DISK权限,然后有修改回来了,或者是不正常的用户操作了数据库,而这些操作更加可能是root用户操作,分析root用户操作记录

--history部分记录
  803  su  oracle
  804  exit
  805  cd /tmp
  806  ls
  807  cd sysbench/
  808  cd bin/
  809  ls
  810  ORACLE_SID=q9db2
  811  export ORACLE_BASE
  812  export ORACLE_HOME
  813  ./sysbench --test=oltp --oltp-table-name=sysbench --oltp-table-size=1 --oracle-db=Q9DB
       --oracle-user=sysbench --oracle-password=sysbench --db-driver=oracle  prepare
  814  syssql
  815  sqlplus system/sysbench@q9db02
  816  sqlplus system/q9db@q9db02
  817  echo $ORACLE_HOME
  818  cd $ORACLE_HOME/network/
  819  vi admin/tnsnames.ora
  820  sqlplus system/NEWQ9DB
  821   echo $ORACLE_HOME
  822  vi ~/.bash_profile
  823   echo $ORACLE_SID
  824  ps -ef | grep smon
  825  sqlplus system/NEWQ9DB
  826  exit

这里很明显的看到,由于SA想使用sysbench做系统基线测试,使用了root用户登录数据库并进行了相关操作,从而出现了该问题,因为ASM DISK 所有者是grid:asmdba,权限是660,root用户无法对ASM DISK进行读写操作,从而出现了上述错误。让同事协助SA重现上述操作,果然出现完全相同的错误,而且退出root session,数据库恢复正常

Fri Jun 14 15:44:24 2013
Archived Log entry 89330 added for thread 2 sequence 389 ID 0x35324053 dest 1:
Fri Jun 14 15:50:42 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 473 logical extent 0 of file 625
  in group [2.3857045540] from disk DATA_0001
  allocation unit 377894 reason error; if possible, will try another mirror side
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
WARNING: failed to read mirror side 1 of virtual extent 652 logical extent 0 of file 625
  in group [2.3857045540] from disk DATA_0003
  allocation unit 377939 reason error; if possible, will try another mirror side
Fri Jun 14 15:55:58 2013
Thread 2 advanced to log sequence 391 (LGWR switch)
  Current log# 41 seq# 391 mem# 0: +DATA/q9db/onlinelog/group_41.853.817830085
Fri Jun 14 15:55:58 2013
Archived Log entry 89331 added for thread 2 sequence 390 ID 0x35324053 dest 1:
Thread 2 advanced to log sequence 392 (LGWR switch)
  Current log# 42 seq# 392 mem# 0: +DATA/q9db/onlinelog/group_42.854.817830087

在ASM ORACLE RAC环境中,使用root操作oracle 数据库导致该错误,强烈建议:操作oracle数据库,请使用oracle数据库安装用户(最少也是同一个所属组用户)运行,超级用户root对于oracle来说也不是万能的

ORACLE 12C CDB中PDB参数管理机制

在ORACLE 12C中参数文件只是记录了cdb的参数信息,没有记录任何的pdb的信息,那ORACLE是如何管理使得各个pdb有自己的参数,这里通过试验的出来ORACLE 12C CDB环境中是通过参数文件结合PDB_SPFILE$来实现参数管理
数据库版本

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

pdb信息

SQL>  select PDB_NAME,CON_UID,pdb_id,status from dba_pdbs;
PDB_NAME      CON_UID     PDB_ID STATUS
---------- ---------- ---------- -------------
PDB1       3313918585          3 NORMAL
PDB$SEED   4048821679          2 NORMAL
PDB2       3872456618          4 NORMAL
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           READ WRITE
         4 3872456618 PDB2                           MOUNTED

CDB$ROOT中修改参数

--指定container=all
SQL> show con_name
CON_NAME
------------------------------
CDB$ROOT
SQL> alter system set open_cursors=500 container=all;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500
SQL> alter session set container=pdb1;
Session altered.
SQL> show con_name
CON_NAME
------------------------------
PDB1
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500
--在CDB$ROOT中修改不指定container参数表示全部pdb生效
SQL> alter session set container=CDB$ROOT;
Session altered.
SQL> alter system set open_cursors=100;
System altered.
SQL>  show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter session set container=pdb1;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
--指定container=current
SQL> alter system set open_cursors=120 container=current;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     120
SQL> alter session set container=pdb2 ;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     120

这里可以看出来,在ROOT中修改参数,默认情况和指定container=all/current均是所有open的pdb都生效.
这里有个疑问ORACLE的参数文件只是记录的cdb的sid的参数,并未记录各个pdb的参数,那是如何实现cdb中各个pdb参数不一致的呢?继续分析

修改pdb参数做10046

SQL> show con_name;
CON_NAME
------------------------------
PDB1
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_18377.trc
SQL> alter system set sessions=100;
System altered.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
--继续修改pdb参数
SQL> alter session set container=pdb1;
Session altered.
SQL>  oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL>  oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_20275.trc
SQL> alter system set sessions=101;
System altered.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.

分析trace文件

--第一次修改pdb参数值
insert into pdb_spfile$(db_uniq_name, pdb_uid, sid, name, value$, comment$)  values(:1,:2,:3,:4,:5,:6)
END OF STMT
PARSE #140085118752824:c=3999,e=3397,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=99767937623
BINDS #140085118752824:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa5842  bln=32  avl=03  flg=09
  value="cdb"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f681bbb2170  bln=22  avl=06  flg=05
  value=3313918585
 Bind#2
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa46f8  bln=32  avl=01  flg=09
  value="*"
 Bind#3
  oacdty=01 mxl=32(08) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0bc220d8  bln=32  avl=08  flg=09
  value="sessions"
 Bind#4
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa474c  bln=32  avl=03  flg=09
  value="100"
 Bind#5
  oacdty=01 mxl=32(00) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=00000000  bln=32  avl=00  flg=09
--第二次修改pdb参数值(相同参数)
update pdb_spfile$ set value$=:5, comment$=:6  where name=:1 and pdb_uid=:2 and db_uniq_name=:3 and sid=:4
BINDS #140603847818408:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6477dcc  bln=32  avl=03  flg=09
  value="101"
 Bind#1
  oacdty=01 mxl=32(00) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=00000000  bln=32  avl=00  flg=09
 Bind#2
  oacdty=01 mxl=32(08) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0bc220d8  bln=32  avl=08  flg=09
  value="sessions"
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fe0e2638320  bln=22  avl=06  flg=05
  value=3313918585
 Bind#4
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6478ec2  bln=32  avl=03  flg=09
  value="cdb"
 Bind#5
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6477d78  bln=32  avl=01  flg=09
  value="*"

通过这里我们发现在独立修改pdb参数之时,其本质是在pdb_spfile$基表中插入或者修改相关记录(第一次修改插入,后续修改是更新)

关于pdb_spfile$基表分析

SQL> SHOW CON_NAME;
CON_NAME
------------------------------
CDB$ROOT
SQL> COL OWNER FOR A10
SQL> select con_id,owner,object_type from cdb_objects where object_name='PDB_SPFILE$';
    CON_ID OWNER      OBJECT_TYPE
---------- ---------- -----------------------
         2 SYS        TABLE
         1 SYS        TABLE
         3 SYS        TABLE
SQL> COL DB_UNIQ_NAME FOR A10
SQL> COL NAME FOR A15
SQL> COL VALUE$ FOR A10
SQL> SELECT DB_UNIQ_NAME,PDB_UID,NAME,VALUE$ FROM PDB_SPFILE$;
DB_UNIQ_NA    PDB_UID NAME            VALUE$
---------- ---------- --------------- ----------
cdb        3313918585 sessions        101
SQL> ALTER SESSION SET CONTAINER=pdb1;
Session altered.
SQL>  SELECT DB_UNIQ_NAME,PDB_UID,NAME,VALUE$ FROM PDB_SPFILE$;
no rows selected

证明pdb中不同于root的参数是记录在root的PDB_SPFILE$基表中.
整个CDB的工作原理是如果在PDB_SPFILE$中无相关参数记录,则继承cdb的参数文件中值,如果PDB_SPFILE$中有记录则使用该值覆盖cdb参数文件值.

删除PDB_SPFILE$验证

SQL> SHOW CON_NAME;
CON_NAME
------------------------------
CDB$ROOT
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL>  select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           MOUNTED
         4 3872456618 PDB2                           READ WRITE
SQL> alter session set container=pdb2;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter system set open_cursors=110;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     110
SQL> conn / as sysdba
Connected.
SQL> select value$ from pdb_spfile$ where name='open_cursors';
VALUE$
--------------------------------------------------------------------------------
110
SQL> delete from  pdb_spfile$ where name='open_cursors';
1 row deleted.
SQL> commit;
Commit complete.
SQL> startup
ORACLE instance started.
Total System Global Area  597098496 bytes
Fixed Size                  2291072 bytes
Variable Size             272632448 bytes
Database Buffers          314572800 bytes
Redo Buffers                7602176 bytes
Database mounted.
Database opened.
SQL> select value$ from pdb_spfile$ where name='open_cursors';
no rows selected
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter session set container=pdb2 ;
Session altered.
SQL> alter database open;
Database altered.
SQL>  show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100

删除PDB_SPFILE$中相关记录,pdb的参数值会自动继续继承cdb中参数值
总结说明:通过上述的一些列试验证明cdb中参数关系,在cdb中修改,会默认所有pdb均自动继承;如果在pdb中修改值会覆盖cdb参数,而且只对当前pdb生效,并记录在PDB_SPFILE$

ORACLE 12C ASM 新特性:共享密码文件

在ORACLE 12C之前大家都知道密码文件是存放在?/dbs或者?/database中,如果要修改修改sysdba权限的用户密码时候,会去修改密码文件,而在rac数据库的sys密码文件是存在各个节点中,这个时候修改sysdba权限的密码就需要在两个节点都要做同样的操作,而对于数据库来说本身是只要在一个节点上修改即可,因为密码是记录在user$中,就是因为密码文件非共享且在各个节点中都有,因此需要在各个节点均要执行修改密码命令,确保密码文件被正常修改。因为rac 密码文件非共享的机制存在,导致修改sysdba权限密码繁琐,有些时候甚至有节点忘记修改,导致需要使用密码文件操作数据库的时候不能正常进行,DG传输日志异常等故障。在ORACLE 12C中为了解决这个问题,引入了密码文件可以存入ASM新特性,从而使得密码文件存储在ASM中实现所有节点共享,从而解决该问题.
ASM存储密码文件前提条件 COMPATIBLE.ASM>= 12.1
查询ASM信息

SQL>  select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0
SQL> select NAME,COMPATIBILITY from v$asm_diskgroup;
NAME                           COMPATIBILITY
------------------------------ ------------------------------------------------------------
DATA                           12.1.0.0.0

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

这里db的password file为空,即表示使用默认值,也就是为$ORACLE_HOME/dbs/orapwxifenfei

创建密码文件存储在ASM中

--创建db新密码文件
[oracle@xifenfei ~]$ orapwd file='+data/CDB/orapwdxifenfei' dbuniquename='cdb'
Enter password for SYS:
----输入sys用户密码
--创建asm新密码文件
orapwd file='+data/ASM/orapwasm' asm=y
----asm=y 表示创建的密码文件为asm的
--使用老密码文件创建db/asm新密码文件
orapwd input_file='/oraclegrid/dbs/orapwasm' file='+data/ASM/orapwasm' [asm=y]
----input_file 表示使用老的密码文件创建新的存储在ASM中的密码文件

查看ASM中密码文件

ASMCMD> showversion
ASM version         : 12.1.0.1.0
ASMCMD> pwd
+data/cdb
ASMCMD>  ls -l orapwdxifenfei
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   MAY 31 19:00:00  N    orapwdxifenfei => +DATA/CDB/PASSWORD/pwdcdb.290.816897265

配置crs中password file项

[grid@xifenfei ~]$ srvctl modify database -db cdb -pwfile  +data/CDB/orapwdxifenfei

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file: +data/CDB/orapwdxifenfei
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

至此数据库启动使用密码ASM中的密码文件完成,补充说明,该方式配置在ASM中的密码文件,只能是通过crs方式启动db才会生效,如果手工使用sqlplus启动数据库不会使用该密码文件,还是使用默认密码文件。这里也就提醒大家操作规范:在RAC环境(包含单节点的GI环境)中,对数据库的启动关闭操作强烈建议使用crs相关命令来完成,而不推荐使用sqlplus命令

跳过rman坏块恢复

在有些情况下,我们仅有一份rman备份,而这个时候rman 备份有出现坏块,使得我们的还原/恢复工作无法继续下去,导致数据大量丢失.我们可以通过设置event 19548/19549来跳过坏块,最大程度抢救数据
rman备份数据文件

C:\Users\XIFENFEI>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 20:31:19 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: XIFENFEI (DBID=1422012639)
RMAN> backup tablespace users format 'f:/users_bak.rman';
Starting backup at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=197 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
channel ORA_DISK_1: starting piece 1 at 06-JUN-13
channel ORA_DISK_1: finished piece 1 at 06-JUN-13
piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 06-JUN-13

切换归档日志

SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            E:\oracle\product\11.2.0\dbhome_1\RDBMS
Oldest online log sequence     95
Next log sequence to archive   97
Current log sequence           97

重命名数据文件

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
--------------------------------------
e:\oracle\oradata\XIFENFEI>move USERS01.DBF USERS01_bak.DBF
移动了         1 个文件。
--------------------------------------
SQL> startup
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'

破坏备份集
破坏前


破坏后


这里很明显,我通过ue把rman备份集中的T修改为了A,肯定破坏了文件,使之出现坏块

rman还原数据文件

C:\Users\XIFENFEI>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 21:02:41 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: XIFENFEI (DBID=1422012639, not open)
RMAN> restore datafile 4;
Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: ORA-19870: error while restoring backup piece F:\USERS_BAK.R
MAN
ORA-19612: datafile 4 not restored due to missing or corrupt data
failover to previous backup
creating datafile file number=4 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
Finished restore at 06-JUN-13

这里可以清晰的看到rman报ORA-19612错误,restore 失败,alert日志为:

Thu Jun 06 21:02:31 2013
ALTER DATABASE OPEN
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_dbw0_7400.trc:
ORA-01157: ????/?????? 4 - ??? DBWR ????
ORA-01110: ???? 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-27041: ??????
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_ora_4272.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Thu Jun 06 21:02:33 2013
Checker run found 1 new persistent data failures
Thu Jun 06 21:03:23 2013
Corrupt block 101 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=3
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.

rman备份集有坏块,导致rman还原无法正常进行下去,还原后的数据文件大小


观察已经正常还原出来数据文件情况

SQL> select CHECKPOINT_CHANGE#,file# from v$datafile_header;
CHECKPOINT_CHANGE#      FILE#
------------------ ----------
           1571582          1
           1571582          2
           1571582          3
             18379          4
           1571582          5
           1571582          6
           1571582          7
SQL> recover database datafile 4 ;
ORA-00274: illegal recovery option DATAFILE
SQL> recover datafile 4;
ORA-00279: change 18379 generated at 01/20/2013 17:13:56 needed for thread 1
ORA-00289: suggestion :
E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000001_0805223583.0001
ORA-00280: change 18379 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

rman只是还原了很小的一部分文件,做恢复提示需要从归档日志seq 1开始(某些情况可能需要其他归档,总之不是正常情况),证明rman还原异常

设置event事件还原

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='e:/pfile.txt' mount;
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
SQL> show parameter event;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      19548 trace name context forev
                                                 er, 19549 trace name context f
                                                 orever
Event 19548:This will attempt to restore content of the corrupted block if it is possible.
Event 19549:This will suppress erroring out during restore

rman还原数据文件

RMAN> restore datafile 4;
Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 06-JUN-13

这里证明数据库rman有坏块通过rman还原成功,alert日志提示如下

Thu Jun 06 21:29:53 2013
WARNING: The block that appears to be block number 100
         in file 4 is corrupt in backup piece F:\USERS_BAK.RMAN.
         Such blocks would usually be formatted as empty
         in the restored file, but event 19548 has been
         set to include the block as-is in the restored
         file.
Corrupt block 102 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
…………
Corrupt block 258 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
WARNING: some data in the backup of file 4 was missing
         or corrupt.  Event 19549 has been set to allow
         the file to be restored anyway.
           backup header block count: 5369
           backup actual block count: 5212
              backup header checksum: -218250743
              backup actual checksum: 1442665538
Full restore complete of datafile 4 E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF.  Elapsed time: 0:00:25
  checkpoint is 1570136
  last deallocation scn is 1508457

这里rman还原依然遇到很多坏块,但是均跳过坏块,还是完整的恢复出来的数据文件(大小)


rman还原数据文件

RMAN> recover datafile 4;
Starting recover at 06-JUN-13
using channel ORA_DISK_1
starting media recovery
archived log for thread 1 with sequence 94 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_0805223583.0001
archived log for thread 1 with sequence 95 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000095_0805223583.0001
archived log for thread 1 with sequence 96 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000096_0805223583.0001
archived log file name=E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_080
5223583.0001 thread=1 sequence=94
media recovery complete, elapsed time: 00:00:00
Finished recover at 06-JUN-13

这里可以明显的看到在recover过程中数据库应用的是备份后的所有归档,数据文件是正常被还原出来(坏块除外)

查询对象

SQL> alter database open;
Database altered.
SQL> conn test/test
Connected.
SQL> select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
STB101                         TABLE
SQL> select count(*) from stb101;
select count(*) from stb101
                     *
ERROR at line 1:
ORA-08103: object no longer exists

dbv检查坏块

e:\oracle\oradata\XIFENFEI>dbv file=USERS01.DBF
DBVERIFY: Release 11.2.0.3.0 - Production on Thu Jun 6 23:59:49 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
Page 100 is marked corrupt
Corrupt block relative dba: 0x01000064 (file 4, block 100)
Bad check value found during dbv:
Data in bad block:
 type: 30 format: 2 rdba: 0x01000064
 last change scn: 0x0000.00004890 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x48901e01
 check value in block header: 0x8311
 computed block checksum: 0x20
DBVERIFY - Verification complete
Total Pages Examined         : 12320
Total Pages Processed (Data) : 4952
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 7069
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 298

证明设置了event之后,rman确实跳过了备份集中的坏块,而且是直接还原了坏块内容,证明了event 19548和19549作用

补充说明
在非特殊情况下强烈不建议设置相关event跳过rman中的坏块来还原/恢复数据库,这样将对数据的丢失,甚至数据库是否可以正常open不好评估,rman备份重要,确保rman备份可用也很重要.

记录因磁盘头被重写,抢救redo恢复经历

客户使用赛门铁克做同城异地容灾部署extent rac,因某种情况导致主备容灾不同步,然后在主库中进行了若干操作,导致主库所有裸设备丢失,然后进行了一些列的操作,主库识别了裸设备,但是oracle出现异常
数据库裸设备异常

Fri May 31 22:07:39 2013
ORA-00202: control file: '/dev/rcontrol2'
ORA-27047: unable to read the header block of file
Additional information: 2
ORA-205 signalled during: ALTER DATABASE   MOUNT...

使用备份还原控制文件后,查询数据文件头v$datafile_header.error全部为”WRONG FILE TYPE”,使用bbed去查看,10个block以内全部是0,证明数据库文件头也损坏。因为客户的数据库虽然有rman备份,但涉及到memory,对redo的信息也很敏感,所以希望能够在他们当前的情况下评估redo是否可以应用,确保他们的数据不丢失.验证文件头

DATA FILE #1:
  (name #4) /dev/rsystem
creation size=128000 block size=8192 status=0xe head=4 tail=4 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:22469 scn: 0x0000.7b4f9d86 05/29/2013 22:09:50
 Stop scn: 0xffff.ffffffff 05/15/2013 00:08:31
 Creation Checkpointed at scn:  0x0000.00000009 05/20/2007 21:52:41
 thread:1 rba:(0x1.3.10)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Offline scn: 0x0000.00000000 prev_range: 0
 Online Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED
File header version cannot be determined due to corruption
Dump may be suspect
 V10 STYLE FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
Tablespace #0 -   rel_fn:0
Creation   at   scn: 0x0000.00000000 01/01/1988 00:00:00
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x0 scn: 0x0000.00000000 reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 prev reset logs count:0x0 scn: 0x0000.00000000 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 recovered at 01/01/1988 00:00:00
 status:0x0 root dba:0x00000000 chkpt cnt: 0 ctl cnt:0
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
Backup Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
External cache id: 0x0 0x0 0x0 0x0
Absolute fuzzy scn: 0x0000.00000000
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp  01/01/1988 00:00:00

这里可以看出来文件头完全损坏,dump redo header

LOG FILE #1:
  (name #1) /dev/rredo11
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x3c000 seq: 0x00003f9c hws: 0x2 bsz: 512 nab: 0x1763c flg: 0x1 dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.7b458379
 Low scn: 0x0000.7b4583ad 05/29/2013 01:32:39
 Next scn: 0x0000.7b4fa0e5 05/29/2013 22:17:59
 FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
 descrip:""
 thread: 0 nab: 0x0 seq: 0x00000000 hws: 0x0 eot: 0 dis: 0
 reset logs count: 0x0 scn: 0x0000.00000000
 Low scn: 0x0000.00000000 01/01/1988 00:00:00
 Next scn: 0x0000.00000000 01/01/1988 00:00:00
 Enabled scn: 0x0000.00000000 01/01/1988 00:00:00
 Thread closed scn: 0x0000.00000000 01/01/1988 00:00:00
 Log format vsn: 0x0 Disk cksum: 0x0 Calc cksum: 0x0
 Terminal Recovery Stop scn: 0x0000.00000000
 Terminal Recovery Stamp  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000

验证redo header已经异常,dump redo logfile全部提示文件头错误


现在情况已经很明显,客户的库因为使用了裸设备,online 磁盘的过程中,所有的裸设备卷已经重写了文件头,oracle的datafile header信息不在了,无法正常操作完成.我们决定使用rman备份来恢复该数据库,然后想办法处理redo

注册带库备份集
在rman恢复过程中,我们遇到一个问题,客户的库rman备份策略有问题,一周一个全备,每天一个增量备份,一次归档备份,最后一次增量备份后备份了控制文件,但是最后一次归档备份之后无控制文件,而且是归档的备份发生在增量备份之后,因为是使用了带库无catalog库,我们增量恢复之后,数据不一致需要归档,但是归档,而归档的备份未记录在还原出来的控制文件中,需要人工注册带库的备份集到控制文件中

--涉及到3个节点都有配置不同的NB_ORA_CLIENT=pysa,如果在同一个节点中还原归档日志,需要配置如下
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS  'ENV=(NB_ORA_CLIENT=pysa)';
--如果在默认节点直接分配sbt通道即可
configure default device type to sbt_tape;
--注册带库备份集
catalog device type 'sbt_tape' backuppiece 'al_21395_1_816744765';

分析redo
通过ue打开dd出来的redo文件,我们分析得到20000h(10*8192)全部为0,应该是和赛门铁克存储管理系统有关系,后面开始是aix的设备头信息

正常redo文件信息

该库redo信息对比(获得aix偏移量)

对比正常redo起点信息和经验我们定位到aix的设备头偏移量为1000h(4096),整体偏移量为21000h(10*8192+4096)
该库的redo起点为21000h,也就是说,我们需要执行的dd语句为类似语句(redo 大小为120M)

dd if=/dev/vx/rdsk/dg/redo31 bs=512 skip=264 count=245761 of=/arch/xifenfei/redo31

dd出来所有数据后,对先要已经应用了归档的库,继续尝试recover redo

SQL> recover database using backup controlfile;
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
1
ORA-00289: suggestion : /arch/1_16289_623109141.dbf
ORA-00280: change 2069348436 for thread 1 is in sequence #16289
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo12
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
2
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo23
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:07 needed for thread
3
ORA-00289: suggestion : /arch/3_3898_623109141.dbf
ORA-00280: change 2069348436 for thread 3 is in sequence #3898
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo31
ORA-00279: change 2069455373 generated at 05/30/2013 20:03:26 needed for thread
1
ORA-00289: suggestion : /arch/1_16290_623109141.dbf
ORA-00280: change 2069455373 for thread 1 is in sequence #16290
ORA-00278: log file '/arch/xifenfei/redo12' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo11
ORA-00279: change 2069475956 generated at 05/30/2013 20:04:09 needed for thread
3
ORA-00289: suggestion : /arch/3_3899_623109141.dbf
ORA-00280: change 2069475956 for thread 3 is in sequence #3899
ORA-00278: log file '/arch/xifenfei/redo31' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo33
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.

到这一步我们完整的通过dd跳过了由于赛门铁克管理磁盘导致的磁盘头损坏的块,从裸设备中复制出redo到文件系统,然后进行恢复,完整的抢救了客户的数据,减少了客户的损坏.这里温馨提示对于非常重要的系统(涉及钱),强烈建议redo多路冗余,光依靠存储容灾,备份,dg依然不够

通过基表获取segment header block

数据库不能open的时候,可以通过dul挖取相关基表(user$,obj$,ts$,tab$,seg$,file$),从而来获得segment header信息,然后通过dump该block,结合shell脚本获得extents分布脚本来获得extent分布

   SELECT NVL (u.name, 'SYS'),
          o.name,
          o.subname,
          so.object_type,
          s.type#,
          DECODE (BITAND (s.spare1, 2097408),
                  2097152, 'SECUREFILE',
                  256, 'ASSM',
                  'MSSM'),
          ts.ts#,
          ts.name,
          ts.blocksize,
          f.file#,
          s.block#,
          s.blocks * ts.blocksize,
          s.blocks,
          s.extents,
          s.iniexts * ts.blocksize,
          s.extsize * ts.blocksize,
          s.minexts,
          s.maxexts,
          DECODE (BITAND (s.spare1, 4194304), 4194304, bitmapranges, NULL),
          TO_CHAR (
             DECODE (
                BITAND (s.spare1, 2097152),
                2097152, DECODE (s.lists,
                                 0, 'NONE',
                                 1, 'AUTO',
                                 2, 'MIN',
                                 3, 'MAX',
                                 4, 'DEFAULT',
                                 'INVALID'),
                NULL)),
          DECODE (BITAND (s.spare1, 2097152), 2097152, s.groups, NULL),
          DECODE (BITAND (ts.flags, 3), 1, TO_NUMBER (NULL), s.extpct),
          DECODE (BITAND (ts.flags, 32),
                  32, TO_NUMBER (NULL),
                  DECODE (s.lists, 0, 1, s.lists)),
          DECODE (BITAND (ts.flags, 32),
                  32, TO_NUMBER (NULL),
                  DECODE (s.groups, 0, 1, s.groups)),
          s.file#,
          BITAND (s.cachehint, 3),
          BITAND (s.cachehint, 12) / 4,
          BITAND (s.cachehint, 48) / 16,
          NVL (s.spare1, 0),
          o.dataobj#
     FROM chf.user$ u,
          chf.obj$ o,
          chf.ts$ ts,
          ( SELECT DECODE (BITAND (t.property, 8192), 8192, 'NESTED TABLE', 'TABLE') OBJECT_TYPE,
          2 OBJECT_TYPE_ID,
          5 SEGMENT_TYPE_ID,
          t.obj# OBJECT_ID,
          t.file# HEADER_FILE,
          t.block# HEADER_BLOCK,
          t.ts# TS_NUMBER
     FROM chf.tab$ t) so,
          chf.seg$ s,
          chf.file$ f
    WHERE     s.file# = so.header_file
          AND s.block# = so.header_block
          AND s.ts# = so.ts_number
          AND s.ts# = ts.ts#
          AND o.obj# = so.object_id
          AND o.owner# = u.user#(+)
          AND s.type# = so.segment_type_id
          AND o.type# = so.object_type_id
          AND s.ts# = f.ts#
          AND s.file# = f.relfile#
          and o.name in('XIFENFEI','T_XIFENFEI');

shell脚本获得extents分布

比较深入看过dba_extents视图的朋友都知道,它得到extent的信息不是通过普通的存储在数据库中的基表获得,而是x$相关的表获得(x$表是数据库启动时候在内存中创建,不存在数据文件中),因为当数据库未正常启动,我们无法直接确定某个block是否在某个对象中.其实关于extent的信息都已经记录在了segment header的block中,通过dump该block记录的rdba信息,未转化为file_id和block_id,这里写shell脚本实现把segment header dump 内容转化为类似dba_extents记录,方便在某些不能open的库中分析某个异常block是否属于某个表

#! /bin/bash
dec2bin(){
  val_16=$1
  ((num=$val_16));
  val=`echo $num`
  local base=$2
  [ $val -eq 0 ] && bin=0
if [ $val -ge $base ]; then
    dec2bin $val $((base*2))
    if [ $val -ge $base ]; then
      val=$(($val-$base))
      bin=${bin}1
    else
      bin=${bin}0
    fi
  fi
  [ $base -eq 1 ] && printf  $bin
}
for i in `grep "length:" $1 |awk '{print $1 $3}'`;
do
rdba=`echo ${i:0:10}`
blocks=`echo ${i:10}`
echo -n "rdba:"$rdba"    "
bin2=`dec2bin $rdba  1`
len=`expr length $bin2`
len_gd=22
len_jg=`expr $len - $len_gd`
file_no_2=`echo ${bin2:0:$len_jg}`
((file_no=2#$file_no_2))
echo -n "file_id:"$file_no"    "
block_no_2=`echo ${bin2:$len_jg}`
((block_no=2#$block_no_2))
echo -n "block_id:"$block_no"    "
echo  "blocks:"$blocks
done;

trace文件中部分信息

  -----------------------------------------------------------------
   0x00400901  length: 7
   0x00402e10  length: 8
   0x00402e60  length: 8
   0x00402e68  length: 8
   0x00402ea0  length: 8
   0x00402f20  length: 8
   0x00402f48  length: 8
   0x00403050  length: 8
   0x00403180  length: 8
   0x00403b38  length: 8
   0x00404c48  length: 8
   0x00404c78  length: 8
   0x00404cf8  length: 8
   0x00404da8  length: 8
   0x00404db8  length: 8
   0x00404de8  length: 8
   0x00404e80  length: 128
   0x00405900  length: 128
   0x00406500  length: 128
   0x00406980  length: 128
   0x00407480  length: 128
   0x00407500  length: 128
   0x00407680  length: 128
   0x00407800  length: 128
   0x00407880  length: 128
   0x00407a00  length: 128
   0x00407a80  length: 128
   0x00407c80  length: 128
…………

执行结果

[oracle@xifenfei tmp]$ ./get_extent.sh /u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_29565.trc
rdba:0x00400901    file_id:1    block_id:2305     blocks:7
rdba:0x00402e10    file_id:1    block_id:11792     blocks:8
rdba:0x00402e60    file_id:1    block_id:11872     blocks:8
rdba:0x00402e68    file_id:1    block_id:11880     blocks:8
rdba:0x00402ea0    file_id:1    block_id:11936     blocks:8
rdba:0x00402f20    file_id:1    block_id:12064     blocks:8
rdba:0x00402f48    file_id:1    block_id:12104     blocks:8
rdba:0x00403050    file_id:1    block_id:12368     blocks:8
rdba:0x00403180    file_id:1    block_id:12672     blocks:8
rdba:0x00403b38    file_id:1    block_id:15160     blocks:8
rdba:0x00404c48    file_id:1    block_id:19528     blocks:8
rdba:0x00404c78    file_id:1    block_id:19576     blocks:8
rdba:0x00404cf8    file_id:1    block_id:19704     blocks:8
rdba:0x00404da8    file_id:1    block_id:19880     blocks:8
rdba:0x00404db8    file_id:1    block_id:19896     blocks:8
rdba:0x00404de8    file_id:1    block_id:19944     blocks:8
rdba:0x00404e80    file_id:1    block_id:20096     blocks:128
rdba:0x00405900    file_id:1    block_id:22784     blocks:128
rdba:0x00406500    file_id:1    block_id:25856     blocks:128
rdba:0x00406980    file_id:1    block_id:27008     blocks:128
rdba:0x00407480    file_id:1    block_id:29824     blocks:128
rdba:0x00407500    file_id:1    block_id:29952     blocks:128
rdba:0x00407680    file_id:1    block_id:30336     blocks:128
rdba:0x00407800    file_id:1    block_id:30720     blocks:128
…………

记录一次ORA-600 4000数据库故障恢复

ORA-600[4000]错误
一朋友数据库因为当前redo丢失,在恢复的过程中启动报ORA-600[4000]错误

SMON: enabling cache recovery
Thu May 30 16:24:17 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1458370
ORA-1092 signalled during: alter database open resetlogs...

分析trace文件

*** 2013-05-30 16:24:17.979
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
--确定是obj$对象异常,通过某种手段找到obj$的objid和dataobjid均为16,对应16进制为12
Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0xc1e.a329e76f  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0053.02a.000598bd  0x0d407e46.4f52.2f  --U-    1  fsc 0x0000.a329e772

这里比较明显obj$对象在rdba为0040007a的block上,scn为0c1e.a329e76f(13325725984623)且未提交的事务,这样的现象就决定了处理的特殊性(不是因为块延迟清理导致访问undo现象,该现象直接推进scn解决,而该情况不行)

数据文件头scn

SQL> SELECT DISTINCT CHECKPOINT# FROM V$DATAFILE_HEADER;
    CHECKPOINT_CHANGE#
-------------------------
           13324676536960

bbed查看文件头scn

   struct kcvfhckp, 160 bytes               @484
      struct kcvcpscn, 8 bytes              @484
         ub4 kscnbas                        @484      0x649c9a80
         ub2 kscnwrp                        @488      0x0c1e

这里看到的文件头scn也是为13324676536960(0c1e.649c9a80)和sql查询结果一致,也就是说数据库中的obj$的某个对象含有事务,且scn大于文件头scn(因为当前redo丢失,无法前滚,所以出现该情况),当数据库访问obj$的时候,为了事务的一致性,就需要访问undo(这里提示为83 回滚段),而undo异常,所以smon进程回滚失败,数据库无法正常启动

使用bbed提交事务

BBED> map
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122                                   Dba:0x0040007a
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[108]                              @86
 ub1 freespace[802]                         @302
 ub1 rowdata[7084]                          @1104
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0xa329e76f
      ub2 kscnwrp                           @32       0x0c1e
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0053
         ub2 kxidslt                        @46       0x002a
         ub4 kxidsqn                        @48       0x000598bd
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x0d407e46
         ub2 kubaseq                        @56       0x4f52
         ub1 kubarec                        @58       0x2f
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)<--需要提交
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0xa329e772
BBED> set count 32
        COUNT           32
BBED> set offset 60
        OFFSET          60
BBED> d
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 20010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb
 <32 bytes per line>
BBED> m /x 8001
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 80010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb
 <32 bytes per line>
BBED> sum apply
Check value for File 1, Block 122:
current = 0xafd6, required = 0xafd6

尝试open数据库ORA-600[2662]解决

Thu May 30 21:16:00 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532996],[3102],[2745973074],[4194397],[],[]
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:01 2013
Trace dumping is performing id=[cdmp_20130530211601]
Thu May 30 21:16:02 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:03 2013
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:05 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:08 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_pmon_958764.trc:
ORA-00474: SMON process terminated with error
Thu May 30 21:16:08 2013
PMON: terminating instance due to error 474

数据库在open过程中遇到大量ORA-00600[2662],这个是因为数据库中文件头的scn小于访问的数据块scn导致该问题,解决方法推荐scn,如果数据库的scn本身就很大(和时间理论scn较接近),推进过程中可能遇到如下错误,这个时候就需要选择合适的方法/合适的值来推进scn

SQL> startup pfile=/home/oracle/pfile force
ORACLE instance started.
Total System Global Area 5.5835E+10 bytes
Fixed Size                  2177056 bytes
Variable Size            3.2867E+10 bytes
Database Buffers         2.2951E+10 bytes
Redo Buffers               14598144 bytes
Database mounted.
ORA-01052: required destination LOG_ARCHIVE_DUPLEX_DEST is not specified

后面的工作因为没有redo前滚,而且该库故障时有大量事务在跑,现在无法前滚,导致大量的undo回滚段异常,index和data不一致等故障,需要做的就是屏蔽undo seg,重建undo,重建库

Viewing Information About CDBs and PDBs

在ORACLE 12C中引入了CDB和PDB的概念,实现了ORACLE数据库的可插拔,在一个CDB数据库中,有多个PDB,而每一个PDB又可以理解为一个独立的传统ORACLE 数据库,那为了能够通过一个sql查询获得整个CDB数据库的信息,ORALCE 引入了CDB_*开头的视图,该视图就是在传统的DBA_*视图基础之上增加了CON_ID,用来区分不同的PDB,从而实现了一个简单sql查询在有足够权限的情况下,可以查询所有PDB中信息
ORACLE 12C版本

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

确定是否是CDB

SQL> SELECT CDB FROM V$DATABASE;
CDB
---
YES

YES表示该数据库是CDB,如果是NO表示是NO-CDB(普通数据库)

CDB中各容器信息

SQL> COLUMN NAME FORMAT A8
SQL> SELECT NAME, CON_ID, DBID, CON_UID, GUID FROM V$CONTAINERS ORDER BY CON_ID;
NAME         CON_ID       DBID    CON_UID GUID
-------- ---------- ---------- ---------- --------------------------------
CDB$ROOT          1 1922813718          1 DB85A3D39F8E7703E0431CAAE80A8C44
PDB$SEED          2 4048821679 4048821679 D49B80694E4A449BE0430100007F906F
PDB1              3 3313918585 3313918585 D49BF37938FB4C10E0430100007F6CBD
PDB2              4 3872456618 3872456618 D49BFA33332F4C3EE0430100007FA059
SQL> select CON_NAME_TO_ID('PDB1') FROM DUAL;
CON_NAME_TO_ID('PDB1')
----------------------
                     3
SQL> SELECT CON_DBID_TO_ID(3313918585) FROM DUAL;
CON_DBID_TO_ID(3313918585)
--------------------------
                         3
SQL> SELECT CON_UID_TO_ID(3313918585) FROM DUAL;
CON_UID_TO_ID(3313918585)
-------------------------
                        3

PDB部分信息

SQL> COLUMN PDB_NAME FORMAT A15
SQL> SELECT PDB_ID, PDB_NAME, STATUS FROM DBA_PDBS ORDER BY PDB_ID;
    PDB_ID PDB_NAME        STATUS
---------- --------------- -------------
         2 PDB$SEED        NORMAL
         3 PDB1            NORMAL
         4 PDB2            NORMAL
SQL> COLUMN NAME FORMAT A15
SQL> COLUMN RESTRICTED FORMAT A10
SQL> COLUMN OPEN_TIME FORMAT A30
SQL> SELECT NAME, OPEN_MODE, RESTRICTED, OPEN_TIME FROM V$PDBS;
NAME            OPEN_MODE  RESTRICTED OPEN_TIME
--------------- ---------- ---------- ------------------------------
PDB$SEED        READ ONLY  NO         12-MAY-13 08.51.53.177 AM
PDB1            READ WRITE NO         12-MAY-13 09.35.48.646 AM
PDB2            MOUNTED               12-MAY-13 08.56.59.859 AM

CDB中查询对象信息

SQL> COLUMN PDB_NAME FORMAT A15
SQL> COLUMN OWNER FORMAT A15
SQL> COLUMN TABLE_NAME FORMAT A30
SQL> SELECT p.PDB_ID, p.PDB_NAME, t.OWNER, t.TABLE_NAME
  2  FROM DBA_PDBS p, CDB_TABLES t
  3  WHERE p.PDB_ID > 2
  4  AND T.TABLE_NAME='COL$'
  5  AND p.PDB_ID = t.CON_ID
  6  ORDER BY p.PDB_ID;
    PDB_ID PDB_NAME        OWNER           TABLE_NAME
---------- --------------- --------------- ------------------------------
         3 PDB1            SYS             COL$
         4 PDB2            SYS             COL$

查询在CDB中的PDB数据/临时文件信息

SQL> COLUMN PDB_ID FORMAT 999
SQL> COLUMN PDB_NAME FORMAT A8
SQL> COLUMN FILE_ID FORMAT 9999
SQL> COLUMN TABLESPACE_NAME FORMAT A10
SQL> COLUMN FILE_NAME FORMAT A45
SQL> SELECT p.PDB_ID, p.PDB_NAME, d.FILE_ID, d.TABLESPACE_NAME, d.FILE_NAME
  2  FROM DBA_PDBS p, CDB_DATA_FILES d
  3  WHERE p.PDB_ID = d.CON_ID
  4  ORDER BY p.PDB_ID;
PDB_ID PDB_NAME FILE_ID TABLESPACE FILE_NAME
------ -------- ------- ---------- ---------------------------------------------
     2 PDB$SEED       5 SYSTEM     +DATA/pdb$seed_system01.dbf
     2 PDB$SEED       7 SYSAUX     +DATA/pdb$seed_sysaux01.dbf
     3 PDB1           9 SYSAUX     +DATA/pdb1_pdb$seed_sysaux01.dbf
     3 PDB1          10 USERS      +DATA/cdb/pdb1_users01.dbf
     3 PDB1           8 SYSTEM     +DATA/pdb1_pdb$seed_system01.dbf
     4 PDB2          13 USERS      +DATA/cdb/pdb2_users01.dbf
     4 PDB2          12 SYSAUX     +DATA/pdb2_pdb$seed_sysaux01.dbf
     4 PDB2          11 SYSTEM     +DATA/pdb2_pdb$seed_system01.dbf
8 rows selected.
SQL> COLUMN CON_ID FORMAT 999
SQL> COLUMN FILE_ID FORMAT 9999
SQL> COLUMN TABLESPACE_NAME FORMAT A15
SQL> COLUMN FILE_NAME FORMAT A45
SQL> SELECT CON_ID, FILE_ID, TABLESPACE_NAME, FILE_NAME
  2  FROM CDB_TEMP_FILES
  3  ORDER BY CON_ID;
CON_ID FILE_ID TABLESPACE_NAME FILE_NAME
------ ------- --------------- ---------------------------------------------
     1       1 TEMP            +DATA/cdb/temp01.dbf
     2       2 TEMP            +DATA/pdbseed_temp01.dbf
     3       3 TEMP            +DATA/pdb1_temp01.dbf
     4       4 TEMP            +DATA/pdb2_temp01.dbf

查询PDB的service信息

SQL> COLUMN NETWORK_NAME FORMAT A30
SQL> COLUMN PDB FORMAT A15
SQL> COLUMN CON_ID FORMAT 999
SQL> SELECT PDB, NETWORK_NAME, CON_ID FROM CDB_SERVICES
  2  WHERE PDB IS NOT NULL AND
  3  CON_ID > 2
  4  ORDER BY PDB;
PDB             NETWORK_NAME                   CON_ID
--------------- ------------------------------ ------
PDB1            pdb1                                3
PDB2            pdb2                                4

PDB中可以修改参数

SELECT NAME FROM V$SYSTEM_PARAMETER
WHERE ISPDB_MODIFIABLE = 'TRUE'
ORDER BY NAME;

查看PDB历史信息

SQL> COLUMN DB_NAME FORMAT A10
SQL> COLUMN CON_ID FORMAT 999
SQL> COLUMN PDB_NAME FORMAT A15
SQL> COLUMN OPERATION FORMAT A16
SQL> COLUMN OP_TIMESTAMP FORMAT A10
SQL> COLUMN CLONED_FROM_PDB_NAME FORMAT A15
SQL> SELECT DB_NAME, CON_ID, PDB_NAME, OPERATION, OP_TIMESTAMP, CLONED_FROM_PDB_NAME
  2  FROM CDB_PDB_HISTORY
  3  WHERE CON_ID > 2
  4  ORDER BY CON_ID;
DB_NAME    CON_ID PDB_NAME        OPERATION        OP_TIMESTA CLONED_FROM_PDB
---------- ------ --------------- ---------------- ---------- ---------------
SEEDDATA        3 PDB$SEED        UNPLUG           29-APR-13
CDB             3 PDB1            CREATE           01-FEB-13  PDB$SEED
CDB             3 PDB$SEED        PLUG             01-FEB-13  PDB$SEED
SEEDDATA        4 PDB$SEED        UNPLUG           29-APR-13
CDB             4 PDB2            CREATE           01-FEB-13  PDB$SEED
CDB             4 PDB$SEED        PLUG             01-FEB-13  PDB$SEED
6 rows selected.

ORACLE 12C PDB 维护基础介绍

CDB和PDB是ORACLE 12C一个很亮的新特性,由于他们的引入导致传统的ORACLE数据库管理理念不少发生了改变,这里列举了部分最基本的cdb和pdb管理方式
cdb和pdb关系图

ORACLE 12C版本

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

启动关闭pdb

SQL> startup
ORACLE instance started.
Total System Global Area  597098496 bytes
Fixed Size                  2291072 bytes
Variable Size             272632448 bytes
Database Buffers          314572800 bytes
Redo Buffers                7602176 bytes
Database mounted.
Database opened.
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           MOUNTED
         4 3872456618 PDB2                           MOUNTED
SQL> alter PLUGGABLE database pdb1 open;
Pluggable database altered.
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           READ WRITE
         4 3872456618 PDB2                           MOUNTED
SQL> alter PLUGGABLE database pdb1 close;
Pluggable database altered.
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           MOUNTED
         4 3872456618 PDB2                           MOUNTED
SQL> alter PLUGGABLE database all open;
Pluggable database altered.
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           READ WRITE
         4 3872456618 PDB2                           READ WRITE
SQL> alter PLUGGABLE database all close;
Pluggable database altered.
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           MOUNTED
         4 3872456618 PDB2                           MOUNTED
SQL> alter session set container=pdb1;
Session altered.
SQL> startup
Pluggable Database opened.
SQL>  select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         3 3313918585 PDB1                           READ WRITE

pdb的管理可以在cdb中进行也可以在pdb中进行,如果是cdb中进行,需要PLUGGABLE关键字,如果是pdb中直接和普通数据库一样

登录pdb

[oracle@xifenfei ~]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 12-MAY-2013 08:07:02
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=xifenfei)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                11-MAY-2013 18:30:54
Uptime                    0 days 13 hr. 36 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/grid/product/12.1/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xifenfei/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xifenfei)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=xifenfei)(PORT=5500))
(Security=(my_wallet_directory=/u01/oracle/12.1/db_1/admin/cdb/xdb_wallet))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "cdb" has 1 instance(s).
  Instance "cdb", status READY, has 1 handler(s) for this service...
Service "cdbXDB" has 1 instance(s).
  Instance "cdb", status READY, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
  Instance "cdb", status READY, has 1 handler(s) for this service...
Service "pdb2" has 1 instance(s).
  Instance "cdb", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@xifenfei ~]$ tnsping pdb1
TNS Ping Utility for Linux: Version 12.1.0.1.0 - Production on 12-MAY-2013 08:07:09
Copyright (c) 1997, 2013, Oracle.  All rights reserved.
Used parameter files:
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = xifenfei)
(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = pdb1)))
OK (20 msec)
[oracle@xifenfei ~]$ sqlplus sys/xifenfei@pdb1 as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sun May 12 08:08:02 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
SQL> show con_name;
CON_NAME
------------------------------
PDB1
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sun May 12 08:09:14 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
SQL> alter session set container=pdb1;
Session altered.
SQL> show con_name;
CON_NAME
------------------------------
PDB1

pdb可以通过alter session container进入也可以直接通过tns方式登录

创建用户

SQL> show con_name;
CON_NAME
------------------------------
CDB$ROOT
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           READ WRITE
         4 3872456618 PDB2                           MOUNTED
SQL> create user xff identified by xifenfei;
create user xff identified by xifenfei
            *
ERROR at line 1:
ORA-65096: invalid common user or role name
SQL> !oerr ora 65096
65096, 00000, "invalid common user or role name"
// *Cause:  An attempt was made to create a common user or role with a name
//          that wass not valid for common users or roles.  In addition to
//          the usual rules for user and role names, common user and role
//          names must start with C## or c## and consist only of ASCII
//          characters.
// *Action: Specify a valid common user or role name.
//
SQL> create user c##xff identified by xifenfei;
User created.
SQL> SELECT USERNAME,CON_ID,USER_ID FROM CDB_USERS WHERE USERNAME='C##XFF';
USERNAME       CON_ID    USER_ID
---------- ---------- ----------
C##XFF              1        103
C##XFF              3        104
SQL> alter session set container=pdb1;
Session altered.
SQL> show con_name
CON_NAME
------------------------------
PDB1
SQL> create user xff identified by xifenfei;
User created.
SQL> create user c##abc identified by xifenfei;
create user c##abc identified by xifenfei
            *
ERROR at line 1:
ORA-65094: invalid local user or role name

创建用户默认的是container=all,在cdb中只能创建全局用户(c##开头),会在cdb和所有的pdb中创建该用户(但是pdb中的全局用户需要另外授权才能够在pdb中访问)。在pdb中只能创建的用户为本地用户

用户授权

SQL> grant connect to c##xff;
Grant succeeded.
SQL> select GRANTEE,con_id from cdb_ROLE_PRIVS where GRANTED_ROLE='CONNECT' AND GRANTEE='C##XFF';
GRANTEE                            CON_ID
------------------------------ ----------
C##XFF                                  1
SQL> grant resource to c##xff container=all;
Grant succeeded.
SQL>  select GRANTEE,con_id from cdb_ROLE_PRIVS where GRANTED_ROLE='RESOURCE' AND  GRANTEE='C##XFF';
GRANTEE                            CON_ID
------------------------------ ----------
C##XFF                                  1
C##XFF                                  3

用户授权默认情况下是只会给当前container,在cdb中也可以指定container=all,对所有open的pdb且存在该用户都进行授权

修改参数

SQL> alter system set open_cursors=500 container=all;
System altered.
SQL> conn sys/xifenfei@pdb1 as sysdba
Connected.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500
SQL>  alter system set open_cursors=100;
 alter system set open_cursors=100
*
ERROR at line 1:
ORA-01219: database or pluggable database not open: queries allowed on fixed
tables or views only
SQL> alter database open;
Database altered.
SQL> alter system set open_cursors=100;
System altered.
SQL>  show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> conn / as sysdba
Connected.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500

这里可以看到在cdb中修改,pdb会继承进去;如果在pdb中修改会覆盖pdb从cdb中继承的参数含义