11G RAC TO 11G RAC ADG SWITCHOVER

11G RAC TO 11G RAC ADG切换过程
主库准备工作

SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         2 PRIMARY          READ WRITE
         1 PRIMARY          READ WRITE
[oracle@q9db02 ~]$ srvctl stop instance -d q9db -i q9db2
SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         1 PRIMARY          READ WRITE
SQL> CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;
Restore point created.

备库准备工作

SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         2 PHYSICAL STANDBY READ ONLY WITH APPLY
         1 PHYSICAL STANDBY READ ONLY WITH APPL
[oracle@q9adg02 ~]$ srvctl stop instance -d q9db_adg -i q9db2
SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         1 PHYSICAL STANDBY READ ONLY WITH APPL
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SQL> CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;
Restore point created.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

主库切换日志,观察备库

--主库
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
--备库
[oracle@q9adg01 trace]$ tail -f alert_q9db1.log
Tue Jun 25 15:35:27 2013
RFS[10]: Selected log 52 for thread 1 sequence 4777 dbid 844605368 branch 817913807
Tue Jun 25 15:35:28 2013
Archived Log entry 4889 added for thread 1 sequence 4776 ID 0x3545ffea dest 1:
Tue Jun 25 15:35:28 2013
Media Recovery Waiting for thread 1 sequence 4777 (in transit)
Tue Jun 25 15:35:28 2013
RFS[11]: Selected log 72 for thread 2 sequence 1630 dbid 844605368 branch 817913807
Recovery of Online Redo Log: Thread 1 Group 52 Seq 4777 Reading mem 0
  Mem# 0: +DATA/q9db_adg/onlinelog/group_52.1564.818724635
Media Recovery Waiting for thread 2 sequence 1630 (in transit)
Recovery of Online Redo Log: Thread 2 Group 72 Seq 1630 Reading mem 0
  Mem# 0: +DATA/q9db_adg/onlinelog/group_72.1575.818724653
Tue Jun 25 15:35:30 2013
Archived Log entry 4890 added for thread 2 sequence 1629 ID 0x3545ffea dest 1:

几乎同步进行表示主备日志传输应用正常

主库切换

SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO STANDBY
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
Database altered.

备库切换

SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
Database altered.
SQL> ALTER DATABASE OPEN;
Database altered.

继续处理主库

SQL> shutdown immediate
ORA-01092: ORACLE instance terminated. Disconnection forced
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Data Mining
and Real Application Testing options
[oracle@q9db01 ogg]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Jun 25 14:13:58 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.6034E+11 bytes
Fixed Size                  2236968 bytes
Variable Size            2.5770E+10 bytes
Database Buffers         1.3422E+11 bytes
Redo Buffers              352468992 bytes
Database mounted.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE cancel;
Database altered.
SQL> alter database open;
Database altered.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

清理快照

--主库
SQL> DROP RESTORE POINT SWITCHOVER_START_GRP;
Restore point dropped.
--备库
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.6034E+11 bytes
Fixed Size                  2236968 bytes
Variable Size            2.7380E+10 bytes
Database Buffers         1.3261E+11 bytes
Redo Buffers              352468992 bytes
Database mounted.
SQL> DROP RESTORE POINT SWITCHOVER_START_GRP;
Restore point dropped.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SQL> alter database open;
Database altered.
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

启动主备另外节点

--主库
[oracle@q9db01 ~]$ srvctl start instance -d q9db -i q9db2
--备库
[oracle@q9adg02 ~]$ srvctl start instance -d q9db_adg -i q9db2

补充说明:如果出现日志切换暂时不能传输

备库执行(因为重启动态监听没有马上别识别)
alter system register;
主库执行
alter system set log_archive_dest_state_2=enable;

在11GR2 GI上配置第二个监听

有客户因为归档日志量每天很大,为了不影响业务,需要配置一个单独的万兆网络来专门的传输归档日志到DG库,这里就涉及到在11G(11203 Linux) RAC中增加一个监听用来使用专门的网络.这里提供在主库配置第二个监听的整体操作过程,主要涉及配置解析,增加网络,增加vip,配置监听,配置listener_networks
网卡情况

[oracle@q9db01 admin]$ /sbin/ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 90:E2:BA:1E:14:34
          inet addr:192.168.5.60  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe1e:1434/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3932094203 errors:0 dropped:0 overruns:0 frame:0
          TX packets:176073749 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5752649063554 (5.2 TiB)  TX bytes:31298228144 (29.1 GiB)
[grid@q9db02 ~]$ /sbin/ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 90:E2:BA:1E:13:5C
          inet addr:192.168.5.61  Bcast:192.168.5.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe1e:135c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22861938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187447459 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2603356463 (2.4 GiB)  TX bytes:276672018723 (257.6 GiB)

配置hosts文件

#public dg ip
192.168.5.60     q9db01-dg
192.168.5.61     q9db02-dg
192.168.5.64    q9db01-dg-vip
192.168.5.65    q9db02-dg-vip

CRS中配置

--增加网络资源
[root@q9db01 ~]# srvctl add network -k 2 -S 192.168.5.0/255.255.255.0/eth2 -w static -v
--启动网络资源
[root@q9db01 ~]# crsctl start res ora.net2.network
--增加vip资源
[root@q9db01 ~]# srvctl add vip -n q9db01 -A 192.168.5.64/255.255.255.0 -k 2
[root@q9db01 ~]# srvctl add vip -n q9db02 -A 192.168.5.65/255.255.255.0 -k 2
--启动vip资源
[root@q9db01 ~]# srvctl start vip -i q9db01-dg-vip
[root@q9db01 ~]# srvctl start vip -i q9db02-dg-vip
--netca创建监听
[root@q9db01 ~]# su - grid
[grid@q9db01 ~]$ export DISPLAY=172.18.50.150:0.0
[grid@q9db01 ~]$ netca
------选择Subnet 2(选择网络192.168.5.60网段),选择非1521端口---------
--查看资源状态
[grid@q9db01 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.DATA.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.LISTENER.lsnr
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.LISTENER_DG.lsnr
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.OCR_VOTE.dg
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.asm
               ONLINE  ONLINE       q9db01                   Started
               ONLINE  ONLINE       q9db02                   Started
ora.gsd
               OFFLINE OFFLINE      q9db01
               OFFLINE OFFLINE      q9db02
ora.net1.network
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.net2.network
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.ons
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
ora.registry.acfs
               ONLINE  ONLINE       q9db01
               ONLINE  ONLINE       q9db02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       q9db01
ora.cvu
      1        ONLINE  ONLINE       q9db01
ora.oc4j
      1        ONLINE  ONLINE       q9db01
ora.q9db.db
      1        ONLINE  ONLINE       q9db01                   Open
      2        ONLINE  ONLINE       q9db02                   Open
ora.q9db01-dg-vip.vip
      1        ONLINE  ONLINE       q9db01
ora.q9db01.vip
      1        ONLINE  ONLINE       q9db01
ora.q9db02-dg-vip.vip
      1        ONLINE  ONLINE       q9db02
ora.q9db02.vip
      1        ONLINE  ONLINE       q9db02
ora.scan1.vip
      1        ONLINE  ONLINE       q9db01
--查看监听状态
[grid@q9db01 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 14:40:24
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 14:38:43
Uptime                    0 days 0 hr. 1 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db01/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.64)(PORT=1522)))
The listener supports no services
The command completed successfully

配置listener_networks

--在RDBMS的tnsnames.ora中配置
Q9DB01_LOCAL_NET1 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-vip )(PORT = 1521)))
Q9DB01_LOCAL_NET2 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip )(PORT = 1522)))
Q9DB02_LOCAL_NET1 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-vip )(PORT = 1521)))
Q9DB02_LOCAL_NET2 =(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip )(PORT = 1522)))
Q9DB_REMOTE_NET2 =(DESCRIPTION_LIST =(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip )
(PORT = 1522)))(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip )(PORT = 1522))))
--配置listener_networks
----节点1
SQL> ALTER SYSTEM SET listener_networks='((NAME=network1)(LOCAL_LISTENER=Q9DB01_LOCAL_NET1)
     (REMOTE_LISTENER=q9dbscan:1521))','((NAME=network2)(LOCAL_LISTENER=Q9DB01_LOCAL_NET2)
     (REMOTE_LISTENER=Q9DB_REMOTE_NET2))'SCOPE=BOTH SID='q9db1';
System altered.
----节点2
SQL> ALTER SYSTEM SET listener_networks='((NAME=network1)(LOCAL_LISTENER=Q9DB02_LOCAL_NET1)
     (REMOTE_LISTENER=q9db-scan:1521))','((NAME=network2)(LOCAL_LISTENER=Q9DB02_LOCAL_NET2)
     (REMOTE_LISTENER=Q9DB_REMOTE_NET2))'SCOPE=BOTH SID='q9db2';
System altered.

查看监听状态

--节点1
[grid@q9db01 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 17:12:45
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 16:47:03
Uptime                    0 days 0 hr. 25 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db01/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.64)(PORT=1522)))
Services Summary...
Service "q9db" has 2 instance(s).
  Instance "q9db1", status READY, has 2 handler(s) for this service...
  Instance "q9db2", status READY, has 1 handler(s) for this service...
The command completed successfully
--节点2
[grid@q9db02 ~]$ lsnrctl status listener_dg
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 19-JUN-2013 17:12:02
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_DG)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DG
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-JUN-2013 16:52:24
Uptime                    0 days 0 hr. 19 min. 37 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/q9db02/listener_dg/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_DG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.5.65)(PORT=1522)))
Services Summary...
Service "q9db" has 2 instance(s).
  Instance "q9db1", status READY, has 1 handler(s) for this service...
  Instance "q9db2", status READY, has 2 handler(s) for this service...
The command completed successfully

测试新监听

--RDBMS目录中tns配置
q9db1dg=
(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = q9db01-dg-vip)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = q9db1)
    )
 )
q9db2dg=
(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = q9db02-dg-vip)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = q9db2)
    )
 )
--验证结果
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> conn test/test@q9db1dg
Connected.
SQL> show parameter instance_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      q9db1
SQL> conn test/test@q9db2dg
Connected.
SQL>  show parameter instance_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      q9db2

到这里,可以正常的访问两个库,监听已经配置完成,数据库之间也可以使用特定网络

ORACLE 12C RMAN 功能增强

在ORACLE 12C中对rman的功能有了不少增强,在以前的文章中写过RMAN RECOVER TABLE功能,这里另外补充rman增强的两个小功能(sql语句和数据文件分割)
数据库版本

SQL>  select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

rman对sql语句支持增强

[oracle@xifenfei tmp]$ rman target /
Recovery Manager: Release 12.1.0.1.0 - Production on Sat Jun 1 14:07:50 2013
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
connected to target database: CDB (DBID=1922813718)
RMAN> select sysdate from dual;
using target database control file instead of recovery catalog
SYSDATE
---------
01-JUN-13
RMAN> alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
Statement processed
RMAN>  select sysdate from dual;
SYSDATE
-------------------
2013-06-01 14:16:48
RMAN> desc v$log
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP#                                             NUMBER
 THREAD#                                            NUMBER
 SEQUENCE#                                          NUMBER
 BYTES                                              NUMBER
 BLOCKSIZE                                          NUMBER
 MEMBERS                                            NUMBER
 ARCHIVED                                           VARCHAR2(3)
 STATUS                                             VARCHAR2(16)
 FIRST_CHANGE#                                      NUMBER
 FIRST_TIME                                         DATE
 NEXT_CHANGE#                                       NUMBER
 NEXT_TIME                                          DATE
 CON_ID                                             NUMBER

这里看到rman只是sql语句中的select和desc用法

rman分割数据文件增强

RMAN>  CONFIGURE DEVICE TYPE DISK PARALLELISM 3;
old RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET;
new RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 3 BACKUP TYPE TO BACKUPSET;
new RMAN configuration parameters are successfully stored
RMAN> backup incremental level 1 section size 30M datafile 1 format '/tmp/system_%U.rman';
Starting backup at 01-JUN-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=27 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=269 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=24 device type=DISK
no parent backup or copy of datafile 1 found
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
backing up blocks 1 through 3840
channel ORA_DISK_1: starting piece 1 at 01-JUN-13
channel ORA_DISK_2: starting incremental level 1 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
……………………
backing up blocks 96001 through 99840
channel ORA_DISK_3: starting piece 26 at 01-JUN-13
channel ORA_DISK_1: finished piece 24 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_24_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
backing up blocks 99841 through 101120
channel ORA_DISK_1: starting piece 27 at 01-JUN-13
channel ORA_DISK_2: finished piece 25 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_25_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:07
channel ORA_DISK_3: finished piece 26 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_26_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_3: backup set complete, elapsed time: 00:00:06
channel ORA_DISK_1: finished piece 27 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_27_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 01-JUN-13

备份文件情况

[oracle@xifenfei tmp]$ ll -ltr system*
-rw-r----- 1 oracle dba 14761984 Jun  1 14:45 system_02ob3pg1_1_1.rman
-rw-r----- 1 oracle dba  9535488 Jun  1 14:45 system_02ob3pg1_2_1.rman
-rw-r----- 1 oracle dba 16973824 Jun  1 14:45 system_02ob3pg1_4_1.rman
-rw-r----- 1 oracle dba 18284544 Jun  1 14:45 system_02ob3pg1_3_1.rman
-rw-r----- 1 oracle dba 12804096 Jun  1 14:45 system_02ob3pg1_5_1.rman
-rw-r----- 1 oracle dba 29163520 Jun  1 14:45 system_02ob3pg1_6_1.rman
-rw-r----- 1 oracle dba 31326208 Jun  1 14:46 system_02ob3pg1_7_1.rman
-rw-r----- 1 oracle dba 30851072 Jun  1 14:46 system_02ob3pg1_8_1.rman
-rw-r----- 1 oracle dba 30801920 Jun  1 14:46 system_02ob3pg1_9_1.rman
-rw-r----- 1 oracle dba 23977984 Jun  1 14:46 system_02ob3pg1_11_1.rman
-rw-r----- 1 oracle dba 28508160 Jun  1 14:46 system_02ob3pg1_10_1.rman
-rw-r----- 1 oracle dba 30277632 Jun  1 14:46 system_02ob3pg1_12_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:46 system_02ob3pg1_13_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_14_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_15_1.rman
-rw-r----- 1 oracle dba 30507008 Jun  1 14:47 system_02ob3pg1_17_1.rman
-rw-r----- 1 oracle dba 30834688 Jun  1 14:47 system_02ob3pg1_16_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_18_1.rman
-rw-r----- 1 oracle dba 30244864 Jun  1 14:47 system_02ob3pg1_19_1.rman
-rw-r----- 1 oracle dba 29016064 Jun  1 14:47 system_02ob3pg1_20_1.rman
-rw-r----- 1 oracle dba 29212672 Jun  1 14:47 system_02ob3pg1_21_1.rman
-rw-r----- 1 oracle dba 30728192 Jun  1 14:47 system_02ob3pg1_22_1.rman
-rw-r----- 1 oracle dba 29384704 Jun  1 14:47 system_02ob3pg1_23_1.rman
-rw-r----- 1 oracle dba 26566656 Jun  1 14:47 system_02ob3pg1_24_1.rman
-rw-r----- 1 oracle dba 24928256 Jun  1 14:48 system_02ob3pg1_25_1.rman
-rw-r----- 1 oracle dba 19324928 Jun  1 14:48 system_02ob3pg1_26_1.rman
-rw-r----- 1 oracle dba  6291456 Jun  1 14:48 system_02ob3pg1_27_1.rman

在12C之前的版本,ORACLE 11GR2只是对于全备的备份集备份(非增量,非copy备份方式)方式支持数据文件分割备份功能,对于11.2之前的版本均不支持该功能.在12C中rman可以支持对于全备,增量备份,copy备份全部支持分割数据文件备份(CONTROLFILE,SPFILE不支持)

root 用户操作 ORACLE 数据库导致悲剧

接到同事请求,说客户的linux redhat 5.8平台部署的11.2.0.3 RAC 节点2挂掉了,报磁盘IO异常,数据库hang住

Fri Jun 14 12:01:22 2013
Thread 2 advanced to log sequence 369 (LGWR switch)
  Current log# 49 seq# 369 mem# 0: +DATA/q9db/onlinelog/group_49.861.817830099
Fri Jun 14 12:01:22 2013
Archived Log entry 89300 added for thread 2 sequence 368 ID 0x35324053 dest 1:
Fri Jun 14 14:26:18 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_11788.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side
Fri Jun 14 14:31:17 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_13767.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 441 logical extent 0 of file 625
  in group [2.3857217523] from disk DATA_0001
  allocation unit 377890 reason error; if possible, will try another mirror side

在12点钟数据库运行正常,无任何错误,突然到了14多出现ORA-15025/ORA-27041,并且重启ORACLE 数据库恢复正常。该错误很明显是数据库无权限访问ASM DISK,检查ASM实例日志

Thu Jun 13 19:01:21 2013
ASMB started with pid=25, OS id=25066
Thu Jun 13 19:01:22 2013
NOTE: client +ASM2:+ASM registered, osid 25068, mbr 0x0
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:24 2013
WARNING: failed to online diskgroup resource ora.OCR_VOTE.dg (unable to communicate with CRSD/OHASD)
Thu Jun 13 19:01:57 2013
NOTE: client q9db2:q9db registered, osid 25732, mbr 0x1
Thu Jun 13 19:02:31 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.8.8.33)
 (PORT=1521))))' SCOPE=MEMORY SID='+ASM2';
Fri Jun 14 14:53:09 2013
SQL> ALTER DISKGROUP OCR_VOTE DISMOUNT  /* asm agent *//* {2:61929:97} */
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP ARCH DISMOUNT  /* asm agent *//* {2:61929:97} */
Fri Jun 14 14:53:10 2013
SQL> ALTER DISKGROUP DATA DISMOUNT  /* asm agent *//* {2:61929:97} */

这里可以明显的看到,ASM实例在该时间点无任何错误,证明一切运行正常,查看系统日志,在该故障点,message中无任何记录,查看asm disk权限

[oracle@q9db02 trace]$ ll /dev/mapper/
total 0
crw------- 1 root root    10, 60 Jun  9 11:08 control
brw-rw---- 1 grid asmdba 253, 15 Jun 14 16:20 q9datalun1
brw-rw---- 1 grid asmdba 253, 16 Jun 14 16:20 q9datalun2
brw-rw---- 1 grid asmdba 253, 17 Jun 14 16:20 q9datalun3
brw-rw---- 1 grid asmdba 253, 18 Jun 14 16:19 q9datalun4
brw-rw---- 1 grid asmdba 253, 19 Jun 14 16:20 q9datalun5
brw-rw---- 1 grid asmdba 253, 20 Jun 14 16:20 q9datalun6
brw-rw---- 1 grid asmdba 253, 21 Jun 14 16:19 q9datalun7
brw-rw---- 1 grid asmdba 253,  4 Jun 14 16:20 q9datalun8
brw-rw---- 1 grid asmdba 253,  5 Jun 14 16:20 q9votelun1

所有文件权限没有任何问题,和当初部署之时完全相同而且运行了一段时间都正常,部署之时权限

[oracle@q9db02 trace]$ more /etc/rc.local
chown grid:asmdba /dev/mapper/q9votelun1
chmod 660 /dev/mapper/q9votelun1
chown grid:asmdba /dev/mapper/q9datalun1
chmod 660 /dev/mapper/q9datalun1
chown grid:asmdba /dev/mapper/q9datalun2
chmod 660 /dev/mapper/q9datalun2
chown grid:asmdba /dev/mapper/q9datalun3
chmod 660 /dev/mapper/q9datalun3
chown grid:asmdba /dev/mapper/q9datalun4
chmod 660 /dev/mapper/q9datalun4
chown grid:asmdba /dev/mapper/q9datalun5
chmod 660 /dev/mapper/q9datalun5
chown grid:asmdba /dev/mapper/q9datalun6
chmod 660 /dev/mapper/q9datalun6
chown grid:asmdba /dev/mapper/q9datalun7
chmod 660 /dev/mapper/q9datalun7
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8
chown grid:asmdba /dev/mapper/q9datalun8
chmod 660 /dev/mapper/q9datalun8

因为这里权限没有任何改变,而且asm disk权限正确,系统日志无任何日志,证明该问题不是因为ASM DISK权限改变导致,那我怀疑是人做了不该做的操作,比喻临时性修改了ASM DISK权限,然后有修改回来了,或者是不正常的用户操作了数据库,而这些操作更加可能是root用户操作,分析root用户操作记录

--history部分记录
  803  su  oracle
  804  exit
  805  cd /tmp
  806  ls
  807  cd sysbench/
  808  cd bin/
  809  ls
  810  ORACLE_SID=q9db2
  811  export ORACLE_BASE
  812  export ORACLE_HOME
  813  ./sysbench --test=oltp --oltp-table-name=sysbench --oltp-table-size=1 --oracle-db=Q9DB
       --oracle-user=sysbench --oracle-password=sysbench --db-driver=oracle  prepare
  814  syssql
  815  sqlplus system/sysbench@q9db02
  816  sqlplus system/q9db@q9db02
  817  echo $ORACLE_HOME
  818  cd $ORACLE_HOME/network/
  819  vi admin/tnsnames.ora
  820  sqlplus system/NEWQ9DB
  821   echo $ORACLE_HOME
  822  vi ~/.bash_profile
  823   echo $ORACLE_SID
  824  ps -ef | grep smon
  825  sqlplus system/NEWQ9DB
  826  exit

这里很明显的看到,由于SA想使用sysbench做系统基线测试,使用了root用户登录数据库并进行了相关操作,从而出现了该问题,因为ASM DISK 所有者是grid:asmdba,权限是660,root用户无法对ASM DISK进行读写操作,从而出现了上述错误。让同事协助SA重现上述操作,果然出现完全相同的错误,而且退出root session,数据库恢复正常

Fri Jun 14 15:44:24 2013
Archived Log entry 89330 added for thread 2 sequence 389 ID 0x35324053 dest 1:
Fri Jun 14 15:50:42 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 473 logical extent 0 of file 625
  in group [2.3857045540] from disk DATA_0001
  allocation unit 377894 reason error; if possible, will try another mirror side
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db2/trace/q9db2_ora_29404.trc:
ORA-15025: could not open disk "/dev/mapper/q9datalun4"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATA/q9db/datafile/system.625.817825255'
ORA-15081: failed to submit an I/O operation to a disk
WARNING: failed to read mirror side 1 of virtual extent 652 logical extent 0 of file 625
  in group [2.3857045540] from disk DATA_0003
  allocation unit 377939 reason error; if possible, will try another mirror side
Fri Jun 14 15:55:58 2013
Thread 2 advanced to log sequence 391 (LGWR switch)
  Current log# 41 seq# 391 mem# 0: +DATA/q9db/onlinelog/group_41.853.817830085
Fri Jun 14 15:55:58 2013
Archived Log entry 89331 added for thread 2 sequence 390 ID 0x35324053 dest 1:
Thread 2 advanced to log sequence 392 (LGWR switch)
  Current log# 42 seq# 392 mem# 0: +DATA/q9db/onlinelog/group_42.854.817830087

在ASM ORACLE RAC环境中,使用root操作oracle 数据库导致该错误,强烈建议:操作oracle数据库,请使用oracle数据库安装用户(最少也是同一个所属组用户)运行,超级用户root对于oracle来说也不是万能的

ORACLE 12C CDB中PDB参数管理机制

在ORACLE 12C中参数文件只是记录了cdb的参数信息,没有记录任何的pdb的信息,那ORACLE是如何管理使得各个pdb有自己的参数,这里通过试验的出来ORACLE 12C CDB环境中是通过参数文件结合PDB_SPFILE$来实现参数管理
数据库版本

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

pdb信息

SQL>  select PDB_NAME,CON_UID,pdb_id,status from dba_pdbs;
PDB_NAME      CON_UID     PDB_ID STATUS
---------- ---------- ---------- -------------
PDB1       3313918585          3 NORMAL
PDB$SEED   4048821679          2 NORMAL
PDB2       3872456618          4 NORMAL
SQL> select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           READ WRITE
         4 3872456618 PDB2                           MOUNTED

CDB$ROOT中修改参数

--指定container=all
SQL> show con_name
CON_NAME
------------------------------
CDB$ROOT
SQL> alter system set open_cursors=500 container=all;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500
SQL> alter session set container=pdb1;
Session altered.
SQL> show con_name
CON_NAME
------------------------------
PDB1
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     500
--在CDB$ROOT中修改不指定container参数表示全部pdb生效
SQL> alter session set container=CDB$ROOT;
Session altered.
SQL> alter system set open_cursors=100;
System altered.
SQL>  show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter session set container=pdb1;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
--指定container=current
SQL> alter system set open_cursors=120 container=current;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     120
SQL> alter session set container=pdb2 ;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     120

这里可以看出来,在ROOT中修改参数,默认情况和指定container=all/current均是所有open的pdb都生效.
这里有个疑问ORACLE的参数文件只是记录的cdb的sid的参数,并未记录各个pdb的参数,那是如何实现cdb中各个pdb参数不一致的呢?继续分析

修改pdb参数做10046

SQL> show con_name;
CON_NAME
------------------------------
PDB1
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_18377.trc
SQL> alter system set sessions=100;
System altered.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
--继续修改pdb参数
SQL> alter session set container=pdb1;
Session altered.
SQL>  oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL>  oradebug TRACEFILE_NAME
/u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_20275.trc
SQL> alter system set sessions=101;
System altered.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.

分析trace文件

--第一次修改pdb参数值
insert into pdb_spfile$(db_uniq_name, pdb_uid, sid, name, value$, comment$)  values(:1,:2,:3,:4,:5,:6)
END OF STMT
PARSE #140085118752824:c=3999,e=3397,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=99767937623
BINDS #140085118752824:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa5842  bln=32  avl=03  flg=09
  value="cdb"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f681bbb2170  bln=22  avl=06  flg=05
  value=3313918585
 Bind#2
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa46f8  bln=32  avl=01  flg=09
  value="*"
 Bind#3
  oacdty=01 mxl=32(08) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0bc220d8  bln=32  avl=08  flg=09
  value="sessions"
 Bind#4
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7fffcfaa474c  bln=32  avl=03  flg=09
  value="100"
 Bind#5
  oacdty=01 mxl=32(00) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=00000000  bln=32  avl=00  flg=09
--第二次修改pdb参数值(相同参数)
update pdb_spfile$ set value$=:5, comment$=:6  where name=:1 and pdb_uid=:2 and db_uniq_name=:3 and sid=:4
BINDS #140603847818408:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6477dcc  bln=32  avl=03  flg=09
  value="101"
 Bind#1
  oacdty=01 mxl=32(00) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=00000000  bln=32  avl=00  flg=09
 Bind#2
  oacdty=01 mxl=32(08) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0bc220d8  bln=32  avl=08  flg=09
  value="sessions"
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fe0e2638320  bln=22  avl=06  flg=05
  value=3313918585
 Bind#4
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6478ec2  bln=32  avl=03  flg=09
  value="cdb"
 Bind#5
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7ffff6477d78  bln=32  avl=01  flg=09
  value="*"

通过这里我们发现在独立修改pdb参数之时,其本质是在pdb_spfile$基表中插入或者修改相关记录(第一次修改插入,后续修改是更新)

关于pdb_spfile$基表分析

SQL> SHOW CON_NAME;
CON_NAME
------------------------------
CDB$ROOT
SQL> COL OWNER FOR A10
SQL> select con_id,owner,object_type from cdb_objects where object_name='PDB_SPFILE$';
    CON_ID OWNER      OBJECT_TYPE
---------- ---------- -----------------------
         2 SYS        TABLE
         1 SYS        TABLE
         3 SYS        TABLE
SQL> COL DB_UNIQ_NAME FOR A10
SQL> COL NAME FOR A15
SQL> COL VALUE$ FOR A10
SQL> SELECT DB_UNIQ_NAME,PDB_UID,NAME,VALUE$ FROM PDB_SPFILE$;
DB_UNIQ_NA    PDB_UID NAME            VALUE$
---------- ---------- --------------- ----------
cdb        3313918585 sessions        101
SQL> ALTER SESSION SET CONTAINER=pdb1;
Session altered.
SQL>  SELECT DB_UNIQ_NAME,PDB_UID,NAME,VALUE$ FROM PDB_SPFILE$;
no rows selected

证明pdb中不同于root的参数是记录在root的PDB_SPFILE$基表中.
整个CDB的工作原理是如果在PDB_SPFILE$中无相关参数记录,则继承cdb的参数文件中值,如果PDB_SPFILE$中有记录则使用该值覆盖cdb参数文件值.

删除PDB_SPFILE$验证

SQL> SHOW CON_NAME;
CON_NAME
------------------------------
CDB$ROOT
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL>  select con_id,dbid,NAME,OPEN_MODE from v$pdbs;
    CON_ID       DBID NAME                           OPEN_MODE
---------- ---------- ------------------------------ ----------
         2 4048821679 PDB$SEED                       READ ONLY
         3 3313918585 PDB1                           MOUNTED
         4 3872456618 PDB2                           READ WRITE
SQL> alter session set container=pdb2;
Session altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter system set open_cursors=110;
System altered.
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     110
SQL> conn / as sysdba
Connected.
SQL> select value$ from pdb_spfile$ where name='open_cursors';
VALUE$
--------------------------------------------------------------------------------
110
SQL> delete from  pdb_spfile$ where name='open_cursors';
1 row deleted.
SQL> commit;
Commit complete.
SQL> startup
ORACLE instance started.
Total System Global Area  597098496 bytes
Fixed Size                  2291072 bytes
Variable Size             272632448 bytes
Database Buffers          314572800 bytes
Redo Buffers                7602176 bytes
Database mounted.
Database opened.
SQL> select value$ from pdb_spfile$ where name='open_cursors';
no rows selected
SQL> show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100
SQL> alter session set container=pdb2 ;
Session altered.
SQL> alter database open;
Database altered.
SQL>  show parameter open_cursors;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     100

删除PDB_SPFILE$中相关记录,pdb的参数值会自动继续继承cdb中参数值
总结说明:通过上述的一些列试验证明cdb中参数关系,在cdb中修改,会默认所有pdb均自动继承;如果在pdb中修改值会覆盖cdb参数,而且只对当前pdb生效,并记录在PDB_SPFILE$

ORACLE 12C ASM 新特性:共享密码文件

在ORACLE 12C之前大家都知道密码文件是存放在?/dbs或者?/database中,如果要修改修改sysdba权限的用户密码时候,会去修改密码文件,而在rac数据库的sys密码文件是存在各个节点中,这个时候修改sysdba权限的密码就需要在两个节点都要做同样的操作,而对于数据库来说本身是只要在一个节点上修改即可,因为密码是记录在user$中,就是因为密码文件非共享且在各个节点中都有,因此需要在各个节点均要执行修改密码命令,确保密码文件被正常修改。因为rac 密码文件非共享的机制存在,导致修改sysdba权限密码繁琐,有些时候甚至有节点忘记修改,导致需要使用密码文件操作数据库的时候不能正常进行,DG传输日志异常等故障。在ORACLE 12C中为了解决这个问题,引入了密码文件可以存入ASM新特性,从而使得密码文件存储在ASM中实现所有节点共享,从而解决该问题.
ASM存储密码文件前提条件 COMPATIBLE.ASM>= 12.1
查询ASM信息

SQL>  select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0
SQL> select NAME,COMPATIBILITY from v$asm_diskgroup;
NAME                           COMPATIBILITY
------------------------------ ------------------------------------------------------------
DATA                           12.1.0.0.0

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

这里db的password file为空,即表示使用默认值,也就是为$ORACLE_HOME/dbs/orapwxifenfei

创建密码文件存储在ASM中

--创建db新密码文件
[oracle@xifenfei ~]$ orapwd file='+data/CDB/orapwdxifenfei' dbuniquename='cdb'
Enter password for SYS:
----输入sys用户密码
--创建asm新密码文件
orapwd file='+data/ASM/orapwasm' asm=y
----asm=y 表示创建的密码文件为asm的
--使用老密码文件创建db/asm新密码文件
orapwd input_file='/oraclegrid/dbs/orapwasm' file='+data/ASM/orapwasm' [asm=y]
----input_file 表示使用老的密码文件创建新的存储在ASM中的密码文件

查看ASM中密码文件

ASMCMD> showversion
ASM version         : 12.1.0.1.0
ASMCMD> pwd
+data/cdb
ASMCMD>  ls -l orapwdxifenfei
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   MAY 31 19:00:00  N    orapwdxifenfei => +DATA/CDB/PASSWORD/pwdcdb.290.816897265

配置crs中password file项

[grid@xifenfei ~]$ srvctl modify database -db cdb -pwfile  +data/CDB/orapwdxifenfei

查询crs中关于db配置

[grid@xifenfei ~]$  srvctl config database -d cdb
Database unique name: cdb
Database name: cdb
Oracle home: /u01/app/oracle/product/12.1/db_1
Oracle user: oracle
Spfile: +DATA/cdb/spfilecdb.ora
Password file: +data/CDB/orapwdxifenfei
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Database instance: cdb
Disk Groups: DATA
Services:

至此数据库启动使用密码ASM中的密码文件完成,补充说明,该方式配置在ASM中的密码文件,只能是通过crs方式启动db才会生效,如果手工使用sqlplus启动数据库不会使用该密码文件,还是使用默认密码文件。这里也就提醒大家操作规范:在RAC环境(包含单节点的GI环境)中,对数据库的启动关闭操作强烈建议使用crs相关命令来完成,而不推荐使用sqlplus命令

跳过rman坏块恢复

在有些情况下,我们仅有一份rman备份,而这个时候rman 备份有出现坏块,使得我们的还原/恢复工作无法继续下去,导致数据大量丢失.我们可以通过设置event 19548/19549来跳过坏块,最大程度抢救数据
rman备份数据文件

C:\Users\XIFENFEI>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 20:31:19 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: XIFENFEI (DBID=1422012639)
RMAN> backup tablespace users format 'f:/users_bak.rman';
Starting backup at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=197 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
channel ORA_DISK_1: starting piece 1 at 06-JUN-13
channel ORA_DISK_1: finished piece 1 at 06-JUN-13
piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 06-JUN-13

切换归档日志

SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            E:\oracle\product\11.2.0\dbhome_1\RDBMS
Oldest online log sequence     95
Next log sequence to archive   97
Current log sequence           97

重命名数据文件

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
--------------------------------------
e:\oracle\oradata\XIFENFEI>move USERS01.DBF USERS01_bak.DBF
移动了         1 个文件。
--------------------------------------
SQL> startup
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'

破坏备份集
破坏前


破坏后


这里很明显,我通过ue把rman备份集中的T修改为了A,肯定破坏了文件,使之出现坏块

rman还原数据文件

C:\Users\XIFENFEI>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 21:02:41 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: XIFENFEI (DBID=1422012639, not open)
RMAN> restore datafile 4;
Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: ORA-19870: error while restoring backup piece F:\USERS_BAK.R
MAN
ORA-19612: datafile 4 not restored due to missing or corrupt data
failover to previous backup
creating datafile file number=4 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
Finished restore at 06-JUN-13

这里可以清晰的看到rman报ORA-19612错误,restore 失败,alert日志为:

Thu Jun 06 21:02:31 2013
ALTER DATABASE OPEN
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_dbw0_7400.trc:
ORA-01157: ????/?????? 4 - ??? DBWR ????
ORA-01110: ???? 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-27041: ??????
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_ora_4272.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Thu Jun 06 21:02:33 2013
Checker run found 1 new persistent data failures
Thu Jun 06 21:03:23 2013
Corrupt block 101 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=3
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.

rman备份集有坏块,导致rman还原无法正常进行下去,还原后的数据文件大小


观察已经正常还原出来数据文件情况

SQL> select CHECKPOINT_CHANGE#,file# from v$datafile_header;
CHECKPOINT_CHANGE#      FILE#
------------------ ----------
           1571582          1
           1571582          2
           1571582          3
             18379          4
           1571582          5
           1571582          6
           1571582          7
SQL> recover database datafile 4 ;
ORA-00274: illegal recovery option DATAFILE
SQL> recover datafile 4;
ORA-00279: change 18379 generated at 01/20/2013 17:13:56 needed for thread 1
ORA-00289: suggestion :
E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000001_0805223583.0001
ORA-00280: change 18379 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

rman只是还原了很小的一部分文件,做恢复提示需要从归档日志seq 1开始(某些情况可能需要其他归档,总之不是正常情况),证明rman还原异常

设置event事件还原

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='e:/pfile.txt' mount;
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
SQL> show parameter event;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      19548 trace name context forev
                                                 er, 19549 trace name context f
                                                 orever
Event 19548:This will attempt to restore content of the corrupted block if it is possible.
Event 19549:This will suppress erroring out during restore

rman还原数据文件

RMAN> restore datafile 4;
Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 06-JUN-13

这里证明数据库rman有坏块通过rman还原成功,alert日志提示如下

Thu Jun 06 21:29:53 2013
WARNING: The block that appears to be block number 100
         in file 4 is corrupt in backup piece F:\USERS_BAK.RMAN.
         Such blocks would usually be formatted as empty
         in the restored file, but event 19548 has been
         set to include the block as-is in the restored
         file.
Corrupt block 102 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
…………
Corrupt block 258 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
WARNING: some data in the backup of file 4 was missing
         or corrupt.  Event 19549 has been set to allow
         the file to be restored anyway.
           backup header block count: 5369
           backup actual block count: 5212
              backup header checksum: -218250743
              backup actual checksum: 1442665538
Full restore complete of datafile 4 E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF.  Elapsed time: 0:00:25
  checkpoint is 1570136
  last deallocation scn is 1508457

这里rman还原依然遇到很多坏块,但是均跳过坏块,还是完整的恢复出来的数据文件(大小)


rman还原数据文件

RMAN> recover datafile 4;
Starting recover at 06-JUN-13
using channel ORA_DISK_1
starting media recovery
archived log for thread 1 with sequence 94 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_0805223583.0001
archived log for thread 1 with sequence 95 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000095_0805223583.0001
archived log for thread 1 with sequence 96 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000096_0805223583.0001
archived log file name=E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_080
5223583.0001 thread=1 sequence=94
media recovery complete, elapsed time: 00:00:00
Finished recover at 06-JUN-13

这里可以明显的看到在recover过程中数据库应用的是备份后的所有归档,数据文件是正常被还原出来(坏块除外)

查询对象

SQL> alter database open;
Database altered.
SQL> conn test/test
Connected.
SQL> select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
STB101                         TABLE
SQL> select count(*) from stb101;
select count(*) from stb101
                     *
ERROR at line 1:
ORA-08103: object no longer exists

dbv检查坏块

e:\oracle\oradata\XIFENFEI>dbv file=USERS01.DBF
DBVERIFY: Release 11.2.0.3.0 - Production on Thu Jun 6 23:59:49 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
Page 100 is marked corrupt
Corrupt block relative dba: 0x01000064 (file 4, block 100)
Bad check value found during dbv:
Data in bad block:
 type: 30 format: 2 rdba: 0x01000064
 last change scn: 0x0000.00004890 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x48901e01
 check value in block header: 0x8311
 computed block checksum: 0x20
DBVERIFY - Verification complete
Total Pages Examined         : 12320
Total Pages Processed (Data) : 4952
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 7069
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 298

证明设置了event之后,rman确实跳过了备份集中的坏块,而且是直接还原了坏块内容,证明了event 19548和19549作用

补充说明
在非特殊情况下强烈不建议设置相关event跳过rman中的坏块来还原/恢复数据库,这样将对数据的丢失,甚至数据库是否可以正常open不好评估,rman备份重要,确保rman备份可用也很重要.

记录因磁盘头被重写,抢救redo恢复经历

客户使用赛门铁克做同城异地容灾部署extent rac,因某种情况导致主备容灾不同步,然后在主库中进行了若干操作,导致主库所有裸设备丢失,然后进行了一些列的操作,主库识别了裸设备,但是oracle出现异常
数据库裸设备异常

Fri May 31 22:07:39 2013
ORA-00202: control file: '/dev/rcontrol2'
ORA-27047: unable to read the header block of file
Additional information: 2
ORA-205 signalled during: ALTER DATABASE   MOUNT...

使用备份还原控制文件后,查询数据文件头v$datafile_header.error全部为”WRONG FILE TYPE”,使用bbed去查看,10个block以内全部是0,证明数据库文件头也损坏。因为客户的数据库虽然有rman备份,但涉及到memory,对redo的信息也很敏感,所以希望能够在他们当前的情况下评估redo是否可以应用,确保他们的数据不丢失.验证文件头

DATA FILE #1:
  (name #4) /dev/rsystem
creation size=128000 block size=8192 status=0xe head=4 tail=4 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:22469 scn: 0x0000.7b4f9d86 05/29/2013 22:09:50
 Stop scn: 0xffff.ffffffff 05/15/2013 00:08:31
 Creation Checkpointed at scn:  0x0000.00000009 05/20/2007 21:52:41
 thread:1 rba:(0x1.3.10)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Offline scn: 0x0000.00000000 prev_range: 0
 Online Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED
File header version cannot be determined due to corruption
Dump may be suspect
 V10 STYLE FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
Tablespace #0 -   rel_fn:0
Creation   at   scn: 0x0000.00000000 01/01/1988 00:00:00
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x0 scn: 0x0000.00000000 reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 prev reset logs count:0x0 scn: 0x0000.00000000 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 recovered at 01/01/1988 00:00:00
 status:0x0 root dba:0x00000000 chkpt cnt: 0 ctl cnt:0
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
Backup Checkpointed at scn:  0x0000.00000000
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
External cache id: 0x0 0x0 0x0 0x0
Absolute fuzzy scn: 0x0000.00000000
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp  01/01/1988 00:00:00

这里可以看出来文件头完全损坏,dump redo header

LOG FILE #1:
  (name #1) /dev/rredo11
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x3c000 seq: 0x00003f9c hws: 0x2 bsz: 512 nab: 0x1763c flg: 0x1 dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.7b458379
 Low scn: 0x0000.7b4583ad 05/29/2013 01:32:39
 Next scn: 0x0000.7b4fa0e5 05/29/2013 22:17:59
 FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
 descrip:""
 thread: 0 nab: 0x0 seq: 0x00000000 hws: 0x0 eot: 0 dis: 0
 reset logs count: 0x0 scn: 0x0000.00000000
 Low scn: 0x0000.00000000 01/01/1988 00:00:00
 Next scn: 0x0000.00000000 01/01/1988 00:00:00
 Enabled scn: 0x0000.00000000 01/01/1988 00:00:00
 Thread closed scn: 0x0000.00000000 01/01/1988 00:00:00
 Log format vsn: 0x0 Disk cksum: 0x0 Calc cksum: 0x0
 Terminal Recovery Stop scn: 0x0000.00000000
 Terminal Recovery Stamp  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000

验证redo header已经异常,dump redo logfile全部提示文件头错误


现在情况已经很明显,客户的库因为使用了裸设备,online 磁盘的过程中,所有的裸设备卷已经重写了文件头,oracle的datafile header信息不在了,无法正常操作完成.我们决定使用rman备份来恢复该数据库,然后想办法处理redo

注册带库备份集
在rman恢复过程中,我们遇到一个问题,客户的库rman备份策略有问题,一周一个全备,每天一个增量备份,一次归档备份,最后一次增量备份后备份了控制文件,但是最后一次归档备份之后无控制文件,而且是归档的备份发生在增量备份之后,因为是使用了带库无catalog库,我们增量恢复之后,数据不一致需要归档,但是归档,而归档的备份未记录在还原出来的控制文件中,需要人工注册带库的备份集到控制文件中

--涉及到3个节点都有配置不同的NB_ORA_CLIENT=pysa,如果在同一个节点中还原归档日志,需要配置如下
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS  'ENV=(NB_ORA_CLIENT=pysa)';
--如果在默认节点直接分配sbt通道即可
configure default device type to sbt_tape;
--注册带库备份集
catalog device type 'sbt_tape' backuppiece 'al_21395_1_816744765';

分析redo
通过ue打开dd出来的redo文件,我们分析得到20000h(10*8192)全部为0,应该是和赛门铁克存储管理系统有关系,后面开始是aix的设备头信息

正常redo文件信息

该库redo信息对比(获得aix偏移量)

对比正常redo起点信息和经验我们定位到aix的设备头偏移量为1000h(4096),整体偏移量为21000h(10*8192+4096)
该库的redo起点为21000h,也就是说,我们需要执行的dd语句为类似语句(redo 大小为120M)

dd if=/dev/vx/rdsk/dg/redo31 bs=512 skip=264 count=245761 of=/arch/xifenfei/redo31

dd出来所有数据后,对先要已经应用了归档的库,继续尝试recover redo

SQL> recover database using backup controlfile;
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
1
ORA-00289: suggestion : /arch/1_16289_623109141.dbf
ORA-00280: change 2069348436 for thread 1 is in sequence #16289
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo12
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
2
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo23
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:07 needed for thread
3
ORA-00289: suggestion : /arch/3_3898_623109141.dbf
ORA-00280: change 2069348436 for thread 3 is in sequence #3898
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo31
ORA-00279: change 2069455373 generated at 05/30/2013 20:03:26 needed for thread
1
ORA-00289: suggestion : /arch/1_16290_623109141.dbf
ORA-00280: change 2069455373 for thread 1 is in sequence #16290
ORA-00278: log file '/arch/xifenfei/redo12' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo11
ORA-00279: change 2069475956 generated at 05/30/2013 20:04:09 needed for thread
3
ORA-00289: suggestion : /arch/3_3899_623109141.dbf
ORA-00280: change 2069475956 for thread 3 is in sequence #3899
ORA-00278: log file '/arch/xifenfei/redo31' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo33
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.

到这一步我们完整的通过dd跳过了由于赛门铁克管理磁盘导致的磁盘头损坏的块,从裸设备中复制出redo到文件系统,然后进行恢复,完整的抢救了客户的数据,减少了客户的损坏.这里温馨提示对于非常重要的系统(涉及钱),强烈建议redo多路冗余,光依靠存储容灾,备份,dg依然不够

通过基表获取segment header block

数据库不能open的时候,可以通过dul挖取相关基表(user$,obj$,ts$,tab$,seg$,file$),从而来获得segment header信息,然后通过dump该block,结合shell脚本获得extents分布脚本来获得extent分布

   SELECT NVL (u.name, 'SYS'),
          o.name,
          o.subname,
          so.object_type,
          s.type#,
          DECODE (BITAND (s.spare1, 2097408),
                  2097152, 'SECUREFILE',
                  256, 'ASSM',
                  'MSSM'),
          ts.ts#,
          ts.name,
          ts.blocksize,
          f.file#,
          s.block#,
          s.blocks * ts.blocksize,
          s.blocks,
          s.extents,
          s.iniexts * ts.blocksize,
          s.extsize * ts.blocksize,
          s.minexts,
          s.maxexts,
          DECODE (BITAND (s.spare1, 4194304), 4194304, bitmapranges, NULL),
          TO_CHAR (
             DECODE (
                BITAND (s.spare1, 2097152),
                2097152, DECODE (s.lists,
                                 0, 'NONE',
                                 1, 'AUTO',
                                 2, 'MIN',
                                 3, 'MAX',
                                 4, 'DEFAULT',
                                 'INVALID'),
                NULL)),
          DECODE (BITAND (s.spare1, 2097152), 2097152, s.groups, NULL),
          DECODE (BITAND (ts.flags, 3), 1, TO_NUMBER (NULL), s.extpct),
          DECODE (BITAND (ts.flags, 32),
                  32, TO_NUMBER (NULL),
                  DECODE (s.lists, 0, 1, s.lists)),
          DECODE (BITAND (ts.flags, 32),
                  32, TO_NUMBER (NULL),
                  DECODE (s.groups, 0, 1, s.groups)),
          s.file#,
          BITAND (s.cachehint, 3),
          BITAND (s.cachehint, 12) / 4,
          BITAND (s.cachehint, 48) / 16,
          NVL (s.spare1, 0),
          o.dataobj#
     FROM chf.user$ u,
          chf.obj$ o,
          chf.ts$ ts,
          ( SELECT DECODE (BITAND (t.property, 8192), 8192, 'NESTED TABLE', 'TABLE') OBJECT_TYPE,
          2 OBJECT_TYPE_ID,
          5 SEGMENT_TYPE_ID,
          t.obj# OBJECT_ID,
          t.file# HEADER_FILE,
          t.block# HEADER_BLOCK,
          t.ts# TS_NUMBER
     FROM chf.tab$ t) so,
          chf.seg$ s,
          chf.file$ f
    WHERE     s.file# = so.header_file
          AND s.block# = so.header_block
          AND s.ts# = so.ts_number
          AND s.ts# = ts.ts#
          AND o.obj# = so.object_id
          AND o.owner# = u.user#(+)
          AND s.type# = so.segment_type_id
          AND o.type# = so.object_type_id
          AND s.ts# = f.ts#
          AND s.file# = f.relfile#
          and o.name in('XIFENFEI','T_XIFENFEI');

shell脚本获得extents分布

比较深入看过dba_extents视图的朋友都知道,它得到extent的信息不是通过普通的存储在数据库中的基表获得,而是x$相关的表获得(x$表是数据库启动时候在内存中创建,不存在数据文件中),因为当数据库未正常启动,我们无法直接确定某个block是否在某个对象中.其实关于extent的信息都已经记录在了segment header的block中,通过dump该block记录的rdba信息,未转化为file_id和block_id,这里写shell脚本实现把segment header dump 内容转化为类似dba_extents记录,方便在某些不能open的库中分析某个异常block是否属于某个表

#! /bin/bash
dec2bin(){
  val_16=$1
  ((num=$val_16));
  val=`echo $num`
  local base=$2
  [ $val -eq 0 ] && bin=0
if [ $val -ge $base ]; then
    dec2bin $val $((base*2))
    if [ $val -ge $base ]; then
      val=$(($val-$base))
      bin=${bin}1
    else
      bin=${bin}0
    fi
  fi
  [ $base -eq 1 ] && printf  $bin
}
for i in `grep "length:" $1 |awk '{print $1 $3}'`;
do
rdba=`echo ${i:0:10}`
blocks=`echo ${i:10}`
echo -n "rdba:"$rdba"    "
bin2=`dec2bin $rdba  1`
len=`expr length $bin2`
len_gd=22
len_jg=`expr $len - $len_gd`
file_no_2=`echo ${bin2:0:$len_jg}`
((file_no=2#$file_no_2))
echo -n "file_id:"$file_no"    "
block_no_2=`echo ${bin2:$len_jg}`
((block_no=2#$block_no_2))
echo -n "block_id:"$block_no"    "
echo  "blocks:"$blocks
done;

trace文件中部分信息

  -----------------------------------------------------------------
   0x00400901  length: 7
   0x00402e10  length: 8
   0x00402e60  length: 8
   0x00402e68  length: 8
   0x00402ea0  length: 8
   0x00402f20  length: 8
   0x00402f48  length: 8
   0x00403050  length: 8
   0x00403180  length: 8
   0x00403b38  length: 8
   0x00404c48  length: 8
   0x00404c78  length: 8
   0x00404cf8  length: 8
   0x00404da8  length: 8
   0x00404db8  length: 8
   0x00404de8  length: 8
   0x00404e80  length: 128
   0x00405900  length: 128
   0x00406500  length: 128
   0x00406980  length: 128
   0x00407480  length: 128
   0x00407500  length: 128
   0x00407680  length: 128
   0x00407800  length: 128
   0x00407880  length: 128
   0x00407a00  length: 128
   0x00407a80  length: 128
   0x00407c80  length: 128
…………

执行结果

[oracle@xifenfei tmp]$ ./get_extent.sh /u01/app/oracle/diag/rdbms/cdb/cdb/trace/cdb_ora_29565.trc
rdba:0x00400901    file_id:1    block_id:2305     blocks:7
rdba:0x00402e10    file_id:1    block_id:11792     blocks:8
rdba:0x00402e60    file_id:1    block_id:11872     blocks:8
rdba:0x00402e68    file_id:1    block_id:11880     blocks:8
rdba:0x00402ea0    file_id:1    block_id:11936     blocks:8
rdba:0x00402f20    file_id:1    block_id:12064     blocks:8
rdba:0x00402f48    file_id:1    block_id:12104     blocks:8
rdba:0x00403050    file_id:1    block_id:12368     blocks:8
rdba:0x00403180    file_id:1    block_id:12672     blocks:8
rdba:0x00403b38    file_id:1    block_id:15160     blocks:8
rdba:0x00404c48    file_id:1    block_id:19528     blocks:8
rdba:0x00404c78    file_id:1    block_id:19576     blocks:8
rdba:0x00404cf8    file_id:1    block_id:19704     blocks:8
rdba:0x00404da8    file_id:1    block_id:19880     blocks:8
rdba:0x00404db8    file_id:1    block_id:19896     blocks:8
rdba:0x00404de8    file_id:1    block_id:19944     blocks:8
rdba:0x00404e80    file_id:1    block_id:20096     blocks:128
rdba:0x00405900    file_id:1    block_id:22784     blocks:128
rdba:0x00406500    file_id:1    block_id:25856     blocks:128
rdba:0x00406980    file_id:1    block_id:27008     blocks:128
rdba:0x00407480    file_id:1    block_id:29824     blocks:128
rdba:0x00407500    file_id:1    block_id:29952     blocks:128
rdba:0x00407680    file_id:1    block_id:30336     blocks:128
rdba:0x00407800    file_id:1    block_id:30720     blocks:128
…………