Database SOS – 第78页 – Professional Database Recovery Support@+8617813235971

数据文件自扩展引起—enq: HW – contention

客户反馈数据库比较慢,严重影响业务运行,让我们进行分析
enq: HW – contention等待明显
登录上去查看发现大量的enq: HW – contention等相关等待.

确认相关对象

SQL> select p3 from v$session_wait where event = 'enq: HW - contention';
   P3
----------
34083635
SQL> select dbms_utility.data_block_address_block(34083635) RDBA_FILE,
  2  dbms_utility.data_block_address_file(34083635) RDBA_BLOCK from dual;
 RDBA_FILE RDBA_BLOCK
---------- ----------
    529203          8
SQL>  select owner, segment_type, segment_name,tablespace_name
  2      from dba_extents
  3      where file_id = 8
  4     and 529203 between block_id and block_id + blocks - 1;
   OWNER   SEGMENT_TYPE        SEGMENT_NAME 	    TABLESPACE_NAME
---------- ------------------- -------------------  -------------------
XXXX       INDEX               T_TRADEITEM_85        USERS

确认对应sql

分析表空间使用情况

增加数据文件

再次查询表空间使用情况

查询等待事件

通过这里的分析,由于USERS表空间的空闲表空间已经被完全使用,后续使用需要扩展,因而引起了enq: HW – contention等待,以及由于该等待引起了其他相关等待,当users表空间进行扩展之后,立马恢复正常.实际的例子证明,在实际生产中特别是高并发生产库中,数据文件自扩展会严重影响数据库性能.

恢复没有控制文件备份的rman数据文件备份

最近有朋友咨询,只有rman备份数据文件,无控制文件备份的问题,这里进行分析验证
只有数据文件的rman备份
这里只有数据文件的备份，没有任何控制文件的备份

C:\Documents and Settings\Administrator>e:
E:\>dir
 驱动器 E 中的卷是 新加卷
 卷的序列号是 68FB-8527
 E:\ 的目录
2018-03-29  10:53     6,163,873,792 FULL_20180329_01SV08N2_1_1.RMAN
2018-03-29  10:54     1,050,558,464 FULL_20180329_02SV0918_1_1.RMAN
2018-03-29  10:56     2,167,414,784 FULL_20180329_03SV092B_1_1.RMAN
2018-03-29  11:07     1,034,625,024 FULL_20180329_05SV09QQ_1_1.RMAN
2018-03-29  11:07         1,376,256 FULL_20180329_06SV09RJ_1_1.RMAN
2018-03-29  11:08         3,104,768 FULL_20180329_07SV09TP_1_1.RMAN

还原system01.dbf

SQL> startup nomount
ORACLE 例程已经启动。
Total System Global Area 2137886720 bytes
Fixed Size                  2177456 bytes
Variable Size            1224738384 bytes
Database Buffers          905969664 bytes
Redo Buffers                5001216 bytes
SQL> @restore_datafile.sql system01.dbf
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\system01.dbf
PL/SQL 过程已成功完成。

重建控制文件

SQL> startup nomount pfile=e:/pfile.txt
ORACLE 例程已经启动。
Total System Global Area 2137886720 bytes
Fixed Size                  2177456 bytes
Variable Size            1224738384 bytes
Database Buffers          905969664 bytes
Redo Buffers                5001216 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 50
  3      MAXLOGMEMBERS 5
  4      MAXDATAFILES 100
  5      MAXINSTANCES 1
  6      MAXLOGHISTORY 226
  7  LOGFILE
  8    GROUP 1 'D:\oradata\orcl\redo01.log'  SIZE 100M,
  9    GROUP 2 'D:\oradata\orcl\redo02.log'  SIZE 100M,
 10    GROUP 3 'D:\oradata\orcl\redo03.log'  SIZE 100M
 11  DATAFILE
 12    'D:\oradata\orcl\system01.dbf'
 13  CHARACTER SET ZHS16GBK
 14  ;
控制文件已创建。

注册备份集

E:\>rman target /
恢复管理器: Release 11.2.0.1.0 - Production on 星期四 3月 29 15:46:17 2018
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
已连接到目标数据库: ORCL (DBID=1449113534, 未打开)
RMAN> catalog start with 'e:/*.rman';
使用目标数据库控制文件替代恢复目录
搜索与样式 e:/*.rman 匹配的所有文件
数据库未知文件的列表
=====================================
文件名: E:\FULL_20180329_01SV08N2_1_1.RMAN
文件名: E:\FULL_20180329_02SV0918_1_1.RMAN
文件名: E:\FULL_20180329_03SV092B_1_1.RMAN
文件名: E:\FULL_20180329_05SV09QQ_1_1.RMAN
文件名: E:\FULL_20180329_06SV09RJ_1_1.RMAN
文件名: E:\FULL_20180329_07SV09TP_1_1.RMAN
是否确实要将上述文件列入目录 (输入 YES 或 NO)? yes
正在编制文件目录...
目录编制完毕
已列入目录的文件的列表
=======================
文件名: E:\FULL_20180329_01SV08N2_1_1.RMAN
文件名: E:\FULL_20180329_02SV0918_1_1.RMAN
文件名: E:\FULL_20180329_03SV092B_1_1.RMAN
文件名: E:\FULL_20180329_05SV09QQ_1_1.RMAN
文件名: E:\FULL_20180329_06SV09RJ_1_1.RMAN
文件名: E:\FULL_20180329_07SV09TP_1_1.RMAN
RMAN> list backup;
备份集列表
===================
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
1       Full    5.74G      DISK        00:00:00     29-3月 -18
        BP 关键字: 1   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T104802
段名:E:\FULL_20180329_01SV08N2_1_1.RMAN
  备份集 1 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  5       Full 96384627   29-3月 -18
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
2       Full    1001.88M   DISK        00:00:00     29-3月 -18
        BP 关键字: 2   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T104802
段名:E:\FULL_20180329_02SV0918_1_1.RMAN
  备份集 2 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  1       Full 96384627   29-3月 -18 D:\ORADATA\ORCL\SYSTEM01.DBF
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
3       Full    2.02G      DISK        00:00:00     29-3月 -18
        BP 关键字: 3   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T104802
段名:E:\FULL_20180329_03SV092B_1_1.RMAN
  备份集 3 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  6       Full 96384627   29-3月 -18
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
4       Full    986.69M    DISK        00:00:00     29-3月 -18
        BP 关键字: 4   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T110706
段名:E:\FULL_20180329_05SV09QQ_1_1.RMAN
  备份集 4 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  2       Full 96384627   29-3月 -18
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
5       Full    1.30M      DISK        00:00:00     29-3月 -18
        BP 关键字: 5   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T110706
段名:E:\FULL_20180329_06SV09RJ_1_1.RMAN
  备份集 5 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  4       Full 96384627   29-3月 -18
BS 关键字  类型 LV 大小       设备类型 经过时间 完成时间
------- ---- -- ---------- ----------- ------------ ----------
6       Full    2.95M      DISK        00:00:00     29-3月 -18
        BP 关键字: 6   状态: AVAILABLE  已压缩: NO  标记: TAG20180329T110841
段名:E:\FULL_20180329_07SV09TP_1_1.RMAN
  备份集 6 中的数据文件列表
  文件 LV 类型 Ckp SCN    Ckp 时间   名称
  ---- -- ---- ---------- ---------- ----
  3       Full 96384627   29-3月 -18

还原所有数据文件

SQL> @restore_datafile all_file
Restoring All Data Files :
--------------------------
Attempting To Restore :D:\oradata\orcl\1.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\1.dbf
Attempting To Restore :D:\oradata\orcl\2.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_03SV092B_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_05SV09QQ_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\2.dbf
Attempting To Restore :D:\oradata\orcl\3.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_03SV092B_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_05SV09QQ_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_06SV09RJ_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_07SV09TP_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\3.dbf
Attempting To Restore :D:\oradata\orcl\4.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_03SV092B_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_05SV09QQ_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_06SV09RJ_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\4.dbf
Attempting To Restore :D:\oradata\orcl\5.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\5.dbf
Attempting To Restore :D:\oradata\orcl\6.dbf
------------------------
Trying : E:/FULL_20180329_01SV08N2_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_02SV0918_1_1.RMAN From : DISK
Data File: Not Found
------------------------
Trying : E:/FULL_20180329_03SV092B_1_1.RMAN From : DISK
Data File : Found
------------------------
RESTORED Data File To: D:\oradata\orcl\6.dbf
PL/SQL 过程已成功完成。

重建控制文件并open数据库

SQL> CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 50
  3      MAXLOGMEMBERS 5
  4      MAXDATAFILES 100
  5      MAXINSTANCES 1
  6      MAXLOGHISTORY 226
  7  LOGFILE
  8    GROUP 1 'D:\oradata\orcl\redo01.log'  SIZE 100M,
  9    GROUP 2 'D:\oradata\orcl\redo02.log'  SIZE 100M,
 10    GROUP 3 'D:\oradata\orcl\redo03.log'  SIZE 100M
 11  DATAFILE
 12    'D:\oradata\orcl\1.dbf',
 13    'D:\oradata\orcl\2.dbf',
 14    'D:\oradata\orcl\3.dbf',
 15    'D:\oradata\orcl\4.dbf',
 16    'D:\oradata\orcl\5.dbf',
 17    'D:\oradata\orcl\6.dbf'
 18  CHARACTER SET ZHS16GBK
 19  ;
控制文件已创建。
SQL> alter database open resetlogs;
数据库已更改。

这里本质就是通过oracle内部包,直接还原数据文件,这个模拟的是基础的情况,在实际的恢复中,由于只有数据文件的备份,可能文件不一致,还需要通过一些非常规方法对数据库进行强制打开

dataguard配合flashback实现主备任意切换(failover和switchover)

有客户使用本地和公有云通过vpn搭建数据库容灾,其中有一个需求,当本地环境出现问题云端容灾库接管业务,当本地环境恢复之后,本地继续做为主库,云端作为备库.我这里使用oracle dataguard结合flashback模拟实现客户需求(也可以在failover之后通过重新搭建dg实现类似需求,具体需要看客户的实际场景:数据量,带宽,恢复时间,技术能力等)
正常dg同步的主备环境
这里ora11g主机是主库,ora10g主机是备库

--数据库版本
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
---主库(ora11g)
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ WRITE                     PRIMARY                        ora11g                         YES
SQL> create  table xff.t_xifenfei as select * from dba_objects;
Table created.
SQL> select count(*) from xff.t_xifenfei;
  COUNT(*)
----------
     86348
---备库(ora10g)
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ ONLY WITH APPLY           PHYSICAL STANDBY               ora10g                         YES
SQL> select count(*) from xff.t_xifenfei;
  COUNT(*)
----------
     86348

模拟主库(ora11g)故障,直接激活备库(ora10g)
模拟dg备库(ora10g)直接failover操作

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SQL> ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;
Database altered.

Sat Mar 24 16:05:40 2017
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_pr00_13428.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 1470499
Sat Mar 24 16:05:40 2017
MRP0: Background Media Recovery process shutdown (ora11g)
Managed Standby Recovery Canceled (ora11g)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Sat Mar 24 16:05:50 2017
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE
ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE (ora11g)
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
CLOSE: all sessions shutdown successfully.
Sat Mar 24 16:05:50 2017
SMON: disabling cache recovery
Killing 3 processes with pids 13372,13393,13388 (all RFS) in order to
    disallow current and future RFS connections. Requested by OS process 13350
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
RESETLOGS after incomplete recovery UNTIL CHANGE 1470499
Archived Log entry 9 added for thread 1 sequence 14 ID 0xfd5a5fc1 dest 1:
Resetting resetlogs activation ID 4250558401 (0xfd5a5fc1)
Online log /u01/app/oracle/oradata/ora11g/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/oradata/ora11g/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/oradata/ora11g/redo03.log: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 1470497
Sat Mar 24 16:05:52 2017
Setting recovery target incarnation to 4
ACTIVATE STANDBY: Complete - Database mounted as primary
Completed: ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE

模拟新主库（ora10g）业务操作

SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
MOUNTED                        PRIMARY                        ora10g                         YES
SQL> alter database open;
Database altered.
SQL>  select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ WRITE                     PRIMARY                        ora10g                         YES
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
/
System altered.
SQL>
System altered.
SQL>
SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     5
Next log sequence to archive   7
Current log sequence           7
SQL> delete from xff.t_xifenfei;
86348 rows deleted.
SQL> commit;
Commit complete.
SQL> select count(*) from xff.t_xifenfei;
  COUNT(*)
----------
         0

原主库(ora11g)配置为新备库

---原主库(ora10g)
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 3056513024 bytes
Fixed Size                  2257152 bytes
Variable Size             704646912 bytes
Database Buffers         2332033024 bytes
Redo Buffers               17575936 bytes
Database mounted.
SQL> Flashback database to scn 1470490;
Flashback complete.
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
MOUNTED                        PRIMARY                        ora11g                         YES
--现在主库(ora10g)
SQL> alter database create standby controlfile as '/tmp/ctl.3';
Database altered.
[oracle@ora10g ~]$ scp /tmp/ctl.3 192.168.222.11:/u01/app/oracle/oradata/ora11g/control01.ctl
oracle@192.168.222.11's password:
ctl.3
--现在备库(ora11g)
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 3056513024 bytes
Fixed Size                  2257152 bytes
Variable Size             704646912 bytes
Database Buffers         2332033024 bytes
Redo Buffers               17575936 bytes
Database mounted.
SQL> col open_mode for a30
SQL> col database_role for a30
SQL> col HOST_NAME for a30
SQL> col flashback_on for a10
SQL> set lines 150
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
MOUNTED                        PHYSICAL STANDBY               ora11g                         NO
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION;
Database altered.

Sat Mar 24 16:10:09 2017
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION
Attempt to start background Managed Standby Recovery process (ora11g)
Sat Mar 24 16:10:09 2017
MRP0 started with pid=27, OS id=27086
MRP0: Background Managed Standby Recovery process started (ora11g)
 started logmerger process
Sat Mar 24 16:10:14 2017
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 4 slaves
Media Recovery start incarnation depth : 1, target inc# : 4, irscn : 1470499
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Clearing online redo logfile 1 /u01/app/oracle/oradata/ora11g/redo01.log
Clearing online log 1 of thread 1 sequence number 7
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /u01/app/oracle/oradata/ora11g/redo02.log
Clearing online log 2 of thread 1 sequence number 8
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /u01/app/oracle/oradata/ora11g/redo03.log
Clearing online log 3 of thread 1 sequence number 9
Clearing online redo logfile 3 complete
Media Recovery Waiting for thread 1 sequence 13 branch(resetlogs_id) 854379205
Sat Mar 24 16:10:16 2017
RFS[3]: Assigned to RFS process 27098
RFS[3]: Opened log for thread 1 sequence 13 dbid 48871109 branch 854379205
Archived Log entry 2 added for thread 1 sequence 13 rlc 854379205 ID 0xfd5a5fc1 dest 2:
Sat Mar 24 16:10:16 2017
RFS[4]: Assigned to RFS process 27100
RFS[4]: Opened log for thread 1 sequence 14 dbid 48871109 branch 854379205
Archived Log entry 3 added for thread 1 sequence 14 rlc 854379205 ID 0xfd5a5fc1 dest 2:
RFS[3]: Opened log for thread 1 sequence 1 dbid 48871109 branch 947798452
RFS[4]: Opened log for thread 1 sequence 3 dbid 48871109 branch 947798452
Sat Mar 24 16:10:16 2017
RFS[5]: Assigned to RFS process 27070
RFS[5]: Opened log for thread 1 sequence 2 dbid 48871109 branch 947798452
Archived Log entry 4 added for thread 1 sequence 1 rlc 947798452 ID 0x2ec1ed0 dest 2:
Archived Log entry 5 added for thread 1 sequence 3 rlc 947798452 ID 0x2ec1ed0 dest 2:
RFS[3]: Opened log for thread 1 sequence 4 dbid 48871109 branch 947798452
RFS[4]: Opened log for thread 1 sequence 5 dbid 48871109 branch 947798452
Archived Log entry 6 added for thread 1 sequence 5 rlc 947798452 ID 0x2ec1ed0 dest 2:
Archived Log entry 7 added for thread 1 sequence 4 rlc 947798452 ID 0x2ec1ed0 dest 2:
Archived Log entry 8 added for thread 1 sequence 2 rlc 947798452 ID 0x2ec1ed0 dest 2:
RFS[4]: Opened log for thread 1 sequence 6 dbid 48871109 branch 947798452
Archived Log entry 9 added for thread 1 sequence 6 rlc 947798452 ID 0x2ec1ed0 dest 2:
RFS[3]: Opened log for thread 1 sequence 7 dbid 48871109 branch 947798452
Archived Log entry 10 added for thread 1 sequence 7 rlc 947798452 ID 0x2ec1ed0 dest 2:
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_13_9xmyh8cs_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_14_9xmyh8dr_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_1_9xmyh8fk_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_2_9xmyh8fo_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_3_9xmyh8fm_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_4_9xmyh8g3_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_5_9xmyh8g4_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_6_9xmyh8gs_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_7_9xmyh8hl_.arc
Media Recovery Log /u01/app/oracle/fast_recovery_area/ORA11GDG/archivelog/2017_03_24/o1_mf_1_8_9xmy5d3f_.arc
Media Recovery Waiting for thread 1 sequence 9 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 9 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/ora11g/std_redo10.log

SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE cancel;
Database altered.
SQL> alter database open;
Database altered.
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION;
Database altered.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE cancel;
Database altered.
SQL> alter database flashback on;
Database altered.
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ ONLY                      PHYSICAL STANDBY               ora11g                         YES
SQL> select count(*) from xff.t_xifenfei;
  COUNT(*)
----------
         0
--现在主库
SQL> drop table xff.t_xifenfei ;
Table dropped.
SQL> select count(*) from xff.t_xifenfei;
select count(*) from xff.t_xifenfei
                         *
ERROR at line 1:
ORA-00942: table or view does not exist
--现在备库
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION;
Database altered.
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ ONLY WITH APPLY           PHYSICAL STANDBY               ora11g                         YES
SQL> select count(*) from xff.t_xifenfei;
select count(*) from xff.t_xifenfei
                         *
ERROR at line 1:
ORA-00942: table or view does not exist

通过上述一系列操作,以前假设故障的主库,现在变成了被failover激活的主库的备库,也就是说数据库主备关系由ora11g主库—>ora10g备库变成了ora10g主库—>ora11g备库

switchover实现主备库互换

---现在主库(ora10g)
SQL>  ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
Database altered.
--现在备库(ora11g)
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
Database altered.
SQL>  select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
MOUNTED                        PRIMARY                        ora11g                         YES
SQL> alter database open;
Database altered.
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ WRITE                     PRIMARY                        ora11g                         YES
--最新备库(ora10g)
SQL> startup
ORACLE instance started.
Total System Global Area 3056513024 bytes
Fixed Size                  2257152 bytes
Variable Size             687869696 bytes
Database Buffers         2348810240 bytes
Redo Buffers               17575936 bytes
Database mounted.
Database opened.
SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT  LOGFILE DISCONNECT FROM SESSION;
Database altered.
SQL> select open_mode,database_role,(select HOST_NAME from v$instance) HOST_NAME,flashback_on from v$database;
OPEN_MODE                      DATABASE_ROLE                  HOST_NAME                      FLASHBACK_
------------------------------ ------------------------------ ------------------------------ ----------
READ ONLY WITH APPLY           PHYSICAL STANDBY               ora10g                         YES
--最新主库(ora11g)
SQL> alter system switch logfile;
System altered.
SQL> create table t_xifenfei as select * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     86347
--最新备库(ora10g)
SQL>  select count(*) from t_xifenfei;
  COUNT(*)
----------
     86347

通过switchover主备库再次互换由ora10g主库—>ora11g备库变成了ora11g主库—>ora10g备库,再次恢复到最初的状态.这个不是唯一的方法,可以通过重建dg,也能够实现类似需求.

比特币加密勒索间隔加密

最近我们在一个客户的oracle恢复case中发现比特币文件系统勒索加密比较特殊,和大家做一个分享
文件加密后缀名为:.$ILICONE

文件加密特点分析

DUL> dump datafile 5 block 1
Block Header:
block type=0x0b (file header)
block format=0xa2 (oracle 10)
block rdba=0x01400001 (file#=5, block#=1)
scn=0x0000.00000000, seq=1, tail=0x00000b01
block checksum value=0x6e7d=28285, flag=4
File Header:
Db Id=0xe1891cca=3783859402, Db Name=XIFENFEI, Root Dba=0x0
Software vsn=0x0, Compatibility Vsn=0xa200300, File Size=0x3ffffe=4194302 Blocks
File Type=0x3 (data file), File Number=5, Block Size=8192
Tablespace #7 - OA rel_fn:5
DUL> dump datafile 5 block 2
Block Header:
block type=0x63 (unknown)
block format=0x57 (unknown)
block rdba=0xc6538298 (file#=793, block#=1278616)
scn=0xe0ab.fdc4d8d0, seq=225, tail=0xa7b5cab5
block checksum value=0xfaa1=64161, flag=165
corrupted block.
DUL> dump datafile 5 block 3
Block Header:
block type=0x1e (LMT space map block)
block format=0xa2 (oracle 10)
block rdba=0x01400003 (file#=5, block#=3)
scn=0x0000.00246fbe, seq=1, tail=0x6fbe1e01
block checksum value=0xe495=58517, flag=4
DUL> dump datafile 5 block 4
Block Header:
block type=0x83 (unknown)
block format=0xa3 (unknown)
block rdba=0x17e4c9e4 (file#=95, block#=2410980)
scn=0xe3b2.fc505eea, seq=101, tail=0x6e2f1004
block checksum value=0x7f2e=32558, flag=196
corrupted block.
DUL> dump datafile 5 block 5
Block Header:
block type=0x1e (LMT space map block)
block format=0xa2 (oracle 10)
block rdba=0x01400005 (file#=5, block#=5)
scn=0x0000.00264875, seq=1, tail=0x48751e01
block checksum value=0xb25e=45662, flag=4
DUL> dump datafile 5 block 6
Block Header:
block type=0x68 (unknown)
block format=0x35 (unknown)
block rdba=0x7011e0e3 (file#=448, block#=1171683)
scn=0x47bf.9f2df54a, seq=207, tail=0x69ae0a91
block checksum value=0x49f8=18936, flag=174
corrupted block.

通过这里初步分析,确认加密是间隔方式加密,在数据库中表现明显的是每相隔8k进行加密,而且这里是偶数block被加密
确认加密文件结束位置

DUL> dump datafile 5 block 962818 header
Block Header:
block type=0x4d (unknown)
block format=0xde (unknown)
block rdba=0x0bab780d (file#=46, block#=2848781)
scn=0x056b.2c695f6b, seq=223, tail=0x2399e0cb
block checksum value=0x9706=38662, flag=212
corrupted block.
DUL> dump datafile 5 block 962820 header
Block Header:
block type=0x00 (blank block)
block format=0xa2 (oracle 10)
block rdba=0x014eb104 (file#=5, block#=962820)
scn=0x0000.00000000, seq=1, tail=0x00000001
block checksum value=0x174a=5962, flag=5

通过这里可以发现,对于一个32G的文件,一直被加密到block 962818,也就是7.34G(962818*8k),这里间隔加密,而且加密深度特别深,在以往的比特币文件系统加密中比较少见.
再次提醒
1. 不要把数据库暴露在外网
2. 相对linux而言,win更容易受到黑客的攻击
3. 数据库一定要做好备份,条件允许的情况下,配置数据实时同步到其他机器还是有必要的

Using mlock ulimits for SHM_HUGETLB is deprecated

Oracle 数据库运行在linux 6中,启用大页之后,我们经常在/var/log/messages里面会看到类似这样的记录:Mar 11 12:12:33 i-q2ghx82t kernel: oracle (3677): Using mlock ulimits for SHM_HUGETLB is deprecated,我这里的环境也有重现该问题
环境说明

--系统配置
[root@i-q2ghx82t ~]# more /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m
[root@i-q2ghx82t ~]# ulimit  -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128331
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128331
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[oracle@i-q2ghx82t ~]$ cat /proc/meminfo|grep Hu
AnonHugePages:         0 kB
HugePages_Total:   10752
HugePages_Free:    10752
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
[root@i-q2ghx82t ~]# more  /proc/sys/vm/hugetlb_shm_group
[root@i-q2ghx82t ~]# id oracle
uid=1000(oracle) gid=1000(oinstall) groups=1000(oinstall),1001(dba),1002(oper),1005(asmdba)
--数据库参数
use_large_pages=only

在本环境中数据库启动正常,大页也可以正常使用,但是在系统日志中有类似Mar 11 12:12:33 i-q2ghx82t kernel: oracle (3677): Using mlock ulimits for SHM_HUGETLB is deprecated 这样的告警.通过分析,是由于少配置了hugetlb_shm_group参数导致(vm.hugetlb_shm_group 参数设置为有权使用 HugePages 的操作系统组。默认情况下，此参数设置为 0，从而允许所有组使用 HugePages。可以将此参数设置为 Oracle 数据库进程所属的操作系统组，如 oinstall),在本系统中在sysctl.conf中增加vm.hugetlb_shm_group=1000,然后重启系统(测试中,如果只是重启数据库,非系统重启后第一次重启数据库,不会出现该告警),系统日志没有出现相关告警.
在Linux 6中配置大页建议加上对应的hugetlb_shm_group参数

分布式存储crash导致ORACLE坏块原因分析

oracle运行在分布式存储中，我们在实验室模拟所有存储节点故障(整个存储断电),然后加电，数据库启动没有坏块，但是有客户在自己的环境做测试，发现数据库启动有大量坏块(几次测试问题依旧)。
客户异常环境报错汇总
模拟存储故障后，数据库报错信息,主要体现犹豫存储模拟故障,数据库读写io失败ORA-15081，ORA-27072，ORA-27061

Thread 2 cannot allocate new log, sequence 488
Checkpoint not complete
  Current log# 3 seq# 487 mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559
  Current log# 3 seq# 487 mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559
Thu Mar 08 16:17:08 2018
WARNING: Read Failed. group:2 disk:0 AU:6925 offset:49152 size:16384
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of
file 313 in group [2.1286527068] from disk DATADG_0000  allocation unit 6925 reason error;
if possible, will try another mirror side
Thu Mar 08 16:17:08 2018
WARNING: Read Failed. group:2 disk:0 AU:6921 offset:16384 size:16384
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0
of file 313 in group [2.1286527068] from disk DATADG_0000  allocation unit 6921 reason error;
 if possible, will try another mirror side
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lmon_21091.trc:
ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 56729696
Additional information: -1
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lgwr_21111.trc:
ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 56696864
Additional information: -1
Thu Mar 08 16:17:08 2018
WARNING: Write Failed. group:1 disk:0 AU:56970 offset:65536 size:16384
WARNING: Write Failed. group:2 disk:0 AU:6921 offset:65536 size:16384
…………
WARNING: failed to write mirror side 1 of virtual extent 147 logical extent
0 of file 335 in group 2 on disk 0 allocation unit 11789
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 8192
WARNING: failed to write mirror side 1 of virtual extent 147 logical extent
0 of file 335 in group 2 on disk 0 allocation unit 11789
KCF: read, write or open error, block=0xddec online=1
        file=18 '+DATADG/testdb/datafile/soe.333.970157833'
        error=15081 txt: ''
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:
Thu Mar 08 16:17:09 2018
System state dump requested by (instance=2, osid=21113 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_diag_21081_20180308161709.trc
Thu Mar 08 16:17:09 2018
ORA-1092 : opitsk aborting process
Thu Mar 08 16:17:12 2018
ORA-1092 : opitsk aborting process
Instance terminated by CKPT, pid = 21113

asm日志报错信息，主要也是体现在io的读写错误上Linux-x86_64 Error: 5: Input/output error

Thu Mar 08 16:17:20 2018
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_9636.trc:
ORA-17503: ksfdopn:2 Failed to open file +DATADG/testdb/spfiletestdb.ora
ORA-15001: diskgroup "DATADG" does not exist or is not mounted
Thu Mar 08 16:17:37 2018
WARNING: Write Failed. group:3 disk:0 AU:1 offset:4190208 size:4096
Thu Mar 08 16:17:37 2018
WARNING: Write Failed. group:3 disk:0 AU:22 offset:143360 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180504
Additional information: -1
WARNING: Hbeat write to PST disk 0.3916310704 in group 3 failed. [4]
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0
of file 255 in group 3 on disk 0 allocation unit 22
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180504
Additional information: -1
Thu Mar 08 16:17:37 2018
NOTE: process _b001_+asm2 (9546) initiating offline of
disk 0.3916310704 (OCRDG_0000) with mask 0x7e in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe96e28b0, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 27, osid 9546
ERROR: no read quorum in group: required 1, found 0 disks
Thu Mar 08 16:17:37 2018
NOTE: cache dismounting (not clean) group 3/0x4CAED85D (OCRDG)
WARNING: Offline for disk OCRDG_0000 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 9505, image: oracle@dbtest02 (B000)
Thu Mar 08 16:17:37 2018
NOTE: halting all I/Os to diskgroup 3 (OCRDG)
Thu Mar 08 16:17:37 2018
NOTE: LGWR doing non-clean dismount of group 3 (OCRDG)
NOTE: LGWR sync ABA=11.6 last written ABA 11.6
WARNING: Write Failed. group:3 disk:0 AU:22 offset:102400 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180424
Additional information: -1
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent
0 of file 255 in group 3 on disk 0 allocation unit 22
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180424
Additional information: -1
WARNING: Write Failed. group:3 disk:0 AU:22 offset:147456 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
…………
Thu Mar 08 16:17:37 2018
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)

存储恢复之后，asm mount磁盘组没有报任何错误

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         ARCHDG
         DATADG
NOTE: Diskgroup used for Voting files is:
         OCRDG
Diskgroup with spfile:OCRDG
Diskgroup used for OCR is:OCRDG
NOTE: cache registered group ARCHDG number=1 incarn=0xb81889ff
NOTE: cache began mount (first) of group ARCHDG number=1 incarn=0xb81889ff
NOTE: cache registered group DATADG number=2 incarn=0xb8288a00
NOTE: cache began mount (first) of group DATADG number=2 incarn=0xb8288a00
NOTE: cache registered group OCRDG number=3 incarn=0xb8288a01
NOTE: cache began mount (first) of group OCRDG number=3 incarn=0xb8288a01
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (/dev/oracleasm/disks/ARCHDISK01)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATADISK01)
NOTE: Assigning number (3,0) to disk (/dev/oracleasm/disks/OCRDISK01)
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 5 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 1: ARCHDG_0000 path:/dev/oracleasm/disks/ARCHDISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0xB81889FF (ARCHDG)
* allocate domain 1, invalid = TRUE
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=13.6287 group=1 (ARCHDG)
NOTE: starting recovery of thread=2 ckpt=12.7985 group=1 (ARCHDG)
NOTE: advancing ckpt for group 1 (ARCHDG) thread=1 ckpt=13.6287
NOTE: advancing ckpt for group 1 (ARCHDG) thread=2 ckpt=12.7985
NOTE: cache recovered group 1 to fcn 0.122934
NOTE: redo buffer size is 256 blocks (1053184 bytes)
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (ARCHDG)
Process LGWR (pid 3287) is running at high priority QoS for Exadata I/O
NOTE: LGWR found thread 1 closed at ABA 13.6286
NOTE: LGWR mounted thread 1 for diskgroup 1 (ARCHDG)
NOTE: LGWR opening thread 1 at fcn 0.122934 ABA 14.6287
NOTE: cache mounting group 1/0xB81889FF (ARCHDG) succeeded
NOTE: cache ending mount (success) of group ARCHDG number=1 incarn=0xb81889ff
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 7 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/oracleasm/disks/DATADISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0xB8288A00 (DATADG)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=12.2323 group=2 (DATADG)
NOTE: starting recovery of thread=2 ckpt=11.2681 group=2 (DATADG)
NOTE: advancing ckpt for group 2 (DATADG) thread=1 ckpt=12.2326
NOTE: advancing ckpt for group 2 (DATADG) thread=2 ckpt=11.2687
NOTE: cache recovered group 2 to fcn 0.21395
NOTE: redo buffer size is 512 blocks (2101760 bytes)
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 12.2325
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.21395 ABA 13.2326
NOTE: cache mounting group 2/0xB8288A00 (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=2 incarn=0xb8288a00
NOTE: GMON heartbeating for grp 3
GMON querying group 3 at 9 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 3: OCRDG_0000 path:/dev/oracleasm/disks/OCRDISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0xB8288A01 (OCRDG)
* allocate domain 3, invalid = TRUE
Thu Mar 08 17:00:24 2018
NOTE: attached to recovery domain 3
NOTE: starting recovery of thread=1 ckpt=13.55 group=3 (OCRDG)
NOTE: starting recovery of thread=2 ckpt=11.7 group=3 (OCRDG)
NOTE: advancing ckpt for group 3 (OCRDG) thread=1 ckpt=13.55
NOTE: advancing ckpt for group 3 (OCRDG) thread=2 ckpt=11.7
NOTE: cache recovered group 3 to fcn 0.851
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Thu Mar 08 17:00:24 2018
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCRDG)
NOTE: LGWR found thread 1 closed at ABA 13.54
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCRDG)
NOTE: LGWR opening thread 1 at fcn 0.851 ABA 14.55
NOTE: cache mounting group 3/0xB8288A01 (OCRDG) succeeded
NOTE: cache ending mount (success) of group OCRDG number=3 incarn=0xb8288a01
Thu Mar 08 17:00:24 2018
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup ARCHDG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATADG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCRDG was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */

数据库虽然启动成功了,但是大量的坏块生产ORA-01578，ORA-01110

ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */
This instance was first to open
Beginning crash recovery of 2 threads
 parallel recovery started with 7 processes
Started redo scan
Completed redo scan
 read 107566 KB redo, 10569 data blocks need recovery
Started redo application at
 Thread 1: logseq 767, block 101851
 Thread 2: logseq 486, block 91861
Recovery of Online Redo Log: Thread 1 Group 1 Seq 767 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_1.314.970157463
  Mem# 1: +ARCHDG/testdb/onlinelog/group_1.614.970157463
Recovery of Online Redo Log: Thread 2 Group 4 Seq 486 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561
  Mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561
Recovery of Online Redo Log: Thread 1 Group 2 Seq 768 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_2.315.970157463
  Mem# 1: +ARCHDG/testdb/onlinelog/group_2.615.970157463
Recovery of Online Redo Log: Thread 2 Group 3 Seq 487 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559
  Mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559
Thu Mar 08 17:01:11 2018
Hex dump of (file 12, block 126469) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p001_4090.trc
Corrupt block relative dba: 0x0301ee05 (file 12, block 126469)
Fractured block found during crash/instance recovery
Data in bad block:
 type: 32 format: 2 rdba: 0x0301ee05
 last change scn: 0x0000.00446cbb seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xeac22001
 check value in block header: 0xbd0c
 computed block checksum: 0x8679
Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301ee05 (file 12, block 126469)
Reread (file 12, block 126469) found same corrupt data (no logical check)
Completed redo application of 84.56MB
Completed crash recovery at
 Thread 1: logseq 768, block 102395, scn 4586960
 Thread 2: logseq 487, block 101664, scn 4587050
 10569 data blocks read, 10569 data blocks written, 107566 redo k-bytes read
ARCH: STARTING ARCH PROCESSES
Thu Mar 08 17:01:12 2018
ARC0 started with pid=43, OS id=4151
Thu Mar 08 17:01:13 2018
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thu Mar 08 17:01:13 2018
ARC1 started with pid=44, OS id=4153
Thu Mar 08 17:01:13 2018
ARC2 started with pid=45, OS id=4155
ARC1: Archival started
ARC2: Archival started
Thu Mar 08 17:01:13 2018
ARC3 started with pid=46, OS id=4157
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 advanced to log sequence 769 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Thu Mar 08 17:01:13 2018
Thread 2 advanced to log sequence 488 (thread open)
Thread 2 opened at log sequence 488
  Current log# 4 seq# 488 mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561
  Current log# 4 seq# 488 mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Mar 08 17:01:13 2018
SMON: enabling cache recovery
Archived Log entry 1249 added for thread 2 sequence 487 ID 0xa41f5c11 dest 1:
[4046] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:4294332240 end:4294332460 diff:220 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
Starting background process GTX0
Thu Mar 08 17:01:14 2018
GTX0 started with pid=55, OS id=4181
Starting background process RCBG
Thu Mar 08 17:01:14 2018
RCBG started with pid=57, OS id=4185
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu Mar 08 17:01:14 2018
QMNC started with pid=61, OS id=4195
Thu Mar 08 17:01:15 2018
Hex dump of (file 9, block 43523) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc
Corrupt block relative dba: 0x0240aa03 (file 9, block 43523)
Fractured block found during buffer read
Data in bad block:
 type: 32 format: 2 rdba: 0x0240aa03
 last change scn: 0x0000.0044ccc4 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xd94f2001
 check value in block header: 0xb09c
 computed block checksum: 0x158b
Reading datafile '+DATADG/testdb/datafile/soe.324.970157821' for corruption at rdba: 0x0240aa03 (file 9, block 43523)
Reread (file 9, block 43523) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc  (incident=216418):
ORA-01578: ORACLE data block corrupted (file # 9, block # 43523)
ORA-01110: data file 9: '+DATADG/testdb/datafile/soe.324.970157821'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216418/testdb2_p009_4175_i216418.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Mar 08 17:01:15 2018
Hex dump of (file 14, block 71173) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc
Corrupt block relative dba: 0x03811605 (file 14, block 71173)
Fractured block found during buffer read
Data in bad block:
 type: 32 format: 2 rdba: 0x03811605
 last change scn: 0x0000.0043916d seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x93662001
 check value in block header: 0xad98
 computed block checksum: 0x20b
Reading datafile '+DATADG/testdb/datafile/soe.329.970157831' for corruption at rdba: 0x03811605 (file 14, block 71173)
Reread (file 14, block 71173) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc  (incident=216306):
ORA-01578: ORACLE data block corrupted (file # 14, block # 71173)
ORA-01110: data file 14: '+DATADG/testdb/datafile/soe.329.970157831'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216306/testdb2_p003_4094_i216306.trc
Hex dump of (file 12, block 112385) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc
Corrupt block relative dba: 0x0301b701 (file 12, block 112385)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x0301b701
 last change scn: 0x0000.003cb146 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1d030601
 check value in block header: 0x1e62
 computed block checksum: 0x7a47
Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301b701 (file 12, block 112385)
Reread (file 12, block 112385) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc  (incident=216170):
ORA-01578: ORACLE data block corrupted (file # 12, block # 112385)
ORA-01110: data file 12: '+DATADG/testdb/datafile/soe.327.970157825'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216170/testdb2_smon_4005_i216170.trc
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc  (incident=216307):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216307/testdb2_p003_4094_i216307.trc
Completed: ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */

rman 检查坏块信息，发现有大量坏块，而且主要坏块类型是FRACTURED(折断)

SQL> select * from v$database_block_corruption;
     FILE#     BLOCK#	  BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
	12     126469	       1		  0 FRACTURED
	12     112385	       1		  0 FRACTURED
	14	71173	       1		  0 FRACTURED
	 9	43523	       1		  0 FRACTURED
	15	44801	       1		  0 FRACTURED
	15	44855	       1		  0 FRACTURED
	15	44862	       1		  0 FRACTURED
	11	56252	       1		  0 FRACTURED
	12     116481	       1		  0 CHECKSUM
	12     116535	       1		  0 FRACTURED
	15	46903	       1		  0 FRACTURED
	12     108289	       1		  0 FRACTURED
	12     108343	       1		  0 FRACTURED
	 8	95872	       1		  0 FRACTURED
	 8	99712	       1		  0 FRACTURED
	 8     102783	       1		  0 FRACTURED
	 8     104974	       1		  0 FRACTURED
	 8     105856	       1		  0 FRACTURED
	 8     105998	       1		  0 FRACTURED
	 8     109120	       1		  0 FRACTURED
	 8     110016	       1		  0 FRACTURED
	 8     110092	       1		  0 FRACTURED
	 8     116226	       1		  0 FRACTURED
	 8     116231	       1		  0 FRACTURED
	 8     119295	       1		  0 FRACTURED
	 8     130559	       1		  0 FRACTURED
	12     113163	       1		  0 FRACTURED
	12     118283	       1		  0 FRACTURED
	12     119553	       1		  0 FRACTURED
	12     119608	       1		  0 FRACTURED
	12     120576	       1		  0 FRACTURED
	12     120632	       1		  0 FRACTURED
	12     120639	       1		  0 FRACTURED
	12     123407	       1		  0 FRACTURED
	 9	38411	       1		  0 FRACTURED
	 9	67647	       1		  0 FRACTURED
	 9     109068	       1		  0 FRACTURED
	 9     109070	       1		  0 FRACTURED
	11	99839	       1		  0 FRACTURED
	11     104960	       1		  0 FRACTURED
	13	63214	       1		  0 FRACTURED
	13	65097	       1		  0 FRACTURED
	13	71183	       1		  0 FRACTURED
	13	77319	       1		  0 FRACTURED
	13     103104	       1		  0 FRACTURED
	13     107072	       1		  0 FRACTURED
	13     110082	       1		  0 FRACTURED
	13     111106	       1		  0 FRACTURED
	13     114752	       1		  0 FRACTURED
	14	72196	       1		  0 FRACTURED
	14	75275	       1		  0 FRACTURED
	 7	76293	       1		  0 FRACTURED
	 7	76296	       1		  0 FRACTURED
	 7	76415	       1		  0 FRACTURED
	 7	76864	       1		  0 FRACTURED
	15	49976	       1		  0 FRACTURED
	15	81413	       1		  0 FRACTURED
	 2	61512	       1		  0 FRACTURED
	19	35338	       1		  0 FRACTURED
	19	42687	       1		  0 FRACTURED
60 rows selected.

对比实验室和客户环境配置
实验室环境

[oracle@i-q2ghx82t ~]$ more /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m
SQL> select path from v$asm_disk;
PATH
----------------------------------------
/dev/asm-disk011
/dev/asm-disk001
/dev/asm-disk014
/dev/asm-disk002
/dev/asm-disk003
/dev/asm-disk015
/dev/asm-disk012
/dev/asm-disk013
8 rows selected.
[root@i-q2ghx82t ~]# fdisk -l|grep vd
Disk /dev/vda: 107.4 GB, 107374182400 bytes
/dev/vda1   *           1       13055   104856576   83  Linux
Disk /dev/vdb: 34.4 GB, 34359738368 bytes
Disk /dev/vdc: 107.4 GB, 107374182400 bytes
Disk /dev/vdd: 268.4 GB, 268435456000 bytes
/dev/vdd1               1       32635   262140606   83  Linux
Disk /dev/vde: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdf: 53.7 GB, 53687091200 bytes
/dev/vdf1               1      104025    52428568+  83  Linux
Disk /dev/vdg: 107.4 GB, 107374182400 bytes
Disk /dev/vdh: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdi: 107.4 GB, 107374182400 bytes
Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdk: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdl: 1073.7 GB, 1073741824000 bytes
[oracle@i-q2ghx82t ~]$ ls -l /dev/asm-*
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk001 -> vdg
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk002 -> vdc
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk003 -> vdi
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk011 -> vde
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk012 -> vdh
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk013 -> vdl
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk014 -> vdj
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk015 -> vdk
[root@i-q2ghx82t ~]# fdisk -l /dev/vdj
Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes
16 heads, 63 sectors/track, 2080507 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

客户环境

[oracle@dbtest02 ~]$ more /etc/issue
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel \r on an \m
SQL> select path from v$asm_disk;
PATH
----------------------------------------
/dev/oracleasm/disks/ARCHDISK01
/dev/oracleasm/disks/OCRDISK01
/dev/oracleasm/disks/DATADISK01
[root@dbtest02 ~]# oracleasm listdisks
ARCHDISK01
DATADISK01
OCRDISK01
[root@dbtest02 ~]# oracleasm querydisk -p DATADISK01
Disk "DATADISK01" is a valid ASM disk
/dev/vdc1: LABEL="DATADISK01" TYPE="oracleasm"
[root@dbtest02 ~]# oracleasm querydisk -p ARCHDISK01
Disk "ARCHDISK01" is a valid ASM disk
/dev/vde1: LABEL="ARCHDISK01" TYPE="oracleasm"
[root@dbtest02 ~]# oracleasm querydisk -p OCRDISK01
Disk "OCRDISK01" is a valid ASM disk
/dev/vdd1: LABEL="OCRDISK01" TYPE="oracleasm"
[root@dbtest02 ~]# fdisk -l|grep vd
Disk /dev/vda: 53.7 GB, 53687091200 bytes
/dev/vda1               1        6528    52427776   83  Linux
Disk /dev/vdb: 17.2 GB, 17179869184 bytes
Disk /dev/vdc: 1073.7 GB, 1073741824000 bytes
/dev/vdc1               1     2080507  1048575496+  83  Linux
Disk /dev/vdd: 107.4 GB, 107374182400 bytes
/dev/vdd1               1      208050   104857168+  83  Linux
Disk /dev/vde: 536.9 GB, 536870912000 bytes
/dev/vde1               1     1040253   524287480+  83  Linux
Disk /dev/vdf: 859.0 GB, 858993459200 bytes
/dev/vdf1               1     1664406   838860592+  83  Linux
[root@dbtest02 ~]# fdisk -l /dev/vdf
Disk /dev/vdf: 859.0 GB, 858993459200 bytes
16 heads, 63 sectors/track, 1664406 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x1bdbf439
   Device Boot      Start         End      Blocks   Id  System
/dev/vdf1               1     1664406   838860592+  83  Linux
Partition 1 does not start on physical sector boundary.

这里比较明显:1)实验室环境使用的是udev,而客户环境使用的是asmlib.2)客户环境使用了分区,而实验室环境直接使用裸盘.

尝试在实验室环境使用asmlib

[root@i-q2ghx82t tmp]# ls -l *oracleasm*
-rw-r--r-- 1 root root 36288 Mar 10 13:09 kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm
-rw-r--r-- 1 root root 17168 Mar 10 13:09 oracleasmlib-2.0.12-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 74984 Mar 10 13:09 oracleasm-support-2.1.8-1.el6.x86_64.rpm
[root@i-q2ghx82t tmp]# rpm -ivh kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm
warning: kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing...                ########################################### [100%]
   1:kmod-oracleasm         ########################################### [100%]
[root@i-q2ghx82t tmp]# rpm -ivh oracleasm*.rpm
warning: oracleasmlib-2.0.12-1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing...                ########################################### [100%]
   1:oracleasm-support      ########################################### [ 50%]
   2:oracleasmlib           ########################################### [100%]
[root@i-q2ghx82t tmp]#
[root@i-q2ghx82t tmp]# rpm -aq|grep oracleasm
kmod-oracleasm-2.0.8-13.el6_8.x86_64
oracleasmlib-2.0.12-1.el6.x86_64
oracleasm-support-2.1.8-1.el6.x86_64
[root@i-q2ghx82t dev]# oracleasm createdisk ASMVDL /dev/vdl
Device "/dev/vdl" is not a partition

测试比较明显，对于没有分区的分布式存储中的盘,无法直接被asmlib创建asm disk.如果要使用asmlib,需要对磁盘进行分区.

存储特征
通过咨询分布式存储开发,确认几点:
0. 整个存储是基于网络技术实现分布式存储内部数据同步
1. 整个分布式存储使用ssd卡,默认三副本
2. 三副本中有两个副本写入成功才会反馈给应用(数据库),io成功
3. 整个分布式存储，没有引入任何的cache.
4. 存储的最小io单元是4k,由于ssd卡决定的
5. 分布式存储划分出来的lun是自动实现快对齐的

故障原因
由于存储最小单元是4k,但是在os层面物理扇区为4k，逻辑扇区为512，客户为了使用asmlib,对磁盘进行了默认分区,而没有考虑块对齐.会使得在存储整体crash的时候,很多block由于没有块对齐,很可能被进行了拆分(本来一个4k的io，存储上对对应的一个原子io完成，但是现在这个os层面的4k被拆分成了多个io的可能性，使得os层面的4k的io无法有了原则性)，进而使得在数据库主机,存储突发异常的时候,有可能发生坏块.

处理建议
1. 在linux 6开始,对于asmlib,建议使用udev
2. 如果坚持asmlib,分区的时候考虑块对齐,不然会出现较大的性能下降,而且还会引起坏块的风险

ORA-12754: Feature 'startup' is disabled due to missing capability 'Runtime Environment'

ORACLE 18C启动报ORA-12754
从http://edelivery.oracle.com网上可以下载oracle 18c for exadata,在普通的linux平台可以正常安装,但是无法正常启动实例(dbca无法创建库),报错信息为:ORA-12754: Feature ‘startup’ is disabled due to missing capability ‘Runtime Environment’.

由于普通的linux而非真实oracle一体机,而该版本中做了限制,因此无法启动数据库.根据oracle的作风,对于引进的新特性,oracle一般都提供隐含参数或者event控制,通过分析确定有_exadata_feature_on参数进行控制.
修改脚本创建库参数
由于我们无法在dbca图形化的界面中增加新参数,因此可以通过生成创建库脚本,然后通过修改参数文件,增加_exadata_feature_on=true

[oracle@xifenfei scripts]$ ls -ltr *.ora
-rw-r----- 1 oracle oinstall 2252 Feb 28 09:22 init.ora
-rw-r----- 1 oracle oinstall 2264 Feb 28 09:28 initxffdbTempOMF.ora
-rw-r----- 1 oracle oinstall 2386 Feb 28 09:29 initxffdbTemp.ora

然后重新执行xffdb.sh脚本创建库

[oracle@xifenfei scripts]$ sh xffdb.sh
You should Add this entry in the /etc/oratab: xffdb:/u02/soft:Y
SQL*Plus: Release 18.0.0.0.0 Production on Wed Feb 28 09:29:48 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2017, Oracle.  All rights reserved.
Enter new password for SYS:
Enter new password for SYSTEM:
Enter password for SYS:
Connected to an idle instance.
SQL> spool /u02/app/oracle/admin/xffdb/scripts/CloneRmanRestore.log append
SQL> startup mount pfile="/u02/app/oracle/admin/xffdb/scripts/initxffdbTempOMF.ora";
ORACLE instance started.
Total System Global Area 2432694552 bytes
Fixed Size                  8898840 bytes
Variable Size             654311424 bytes
Database Buffers         1761607680 bytes
Redo Buffers                7876608 bytes
Database mounted.
SQL> execute dbms_backup_restore.resetCfileSection(dbms_backup_restore.RTYP_DFILE_COPY);
PL/SQL procedure successfully completed.
SQL> execute dbms_backup_restore.resetCfileSection(13);
PL/SQL procedure successfully completed.
SQL> host /u02/soft/bin/rman @/u02/app/oracle/admin/xffdb/scripts/rmanRestoreDatafiles.sql &&sysPassword;
Recovery Manager: Release 18.0.0.0.0 - Production on Wed Feb 28 09:30:08 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
RMAN> connect target *
2>
3> CATALOG START WITH   '/u02/soft/assistants/dbca/templates//Seed_Database.dfb'  NOPROMPT  ;
4>
5> RUN {
6>
7> set newname for datafile 1 to  '/u02/app/oracle/oradata/XFFDB/system01.dbf' ;
8>
9> set newname for datafile 3 to  '/u02/app/oracle/oradata/XFFDB/sysaux01.dbf' ;
10>
11> set newname for datafile 4 to  '/u02/app/oracle/oradata/XFFDB/undotbs01.dbf' ;
12>
13> set newname for datafile 7 to  '/u02/app/oracle/oradata/XFFDB/users01.dbf' ;
14>
15> restore datafile 1;
16>
17> restore datafile 3;
18>
19> restore datafile 4;
20>
21> restore datafile 7; }
22>
connected to target database: SEEDDATA (DBID=1214140, not open)
……………………
SQL> set echo on
SQL> spool /u02/app/oracle/admin/xffdb/scripts/postPDBCreation.log append
SQL> select TABLESPACE_NAME from cdb_tablespaces a,dba_pdbs b where a.con_id=b.con_id and UPPER(b.pdb_name)=UPPER('pdb');
TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
SQL> connect "SYS"/"&&sysPassword" as SYSDBA
Connected.
SQL> alter session set container=pdb;
Session altered.
SQL> set echo on
SQL> spool /u02/app/oracle/admin/xffdb/scripts/postPDBCreation.log append
SQL> Select count(*) from dba_registry where comp_id = 'DV' and status='VALID';
  COUNT(*)
----------
         1
SQL> alter session set container=CDB$ROOT;
Session altered.
SQL> exit;
Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
[oracle@xifenfei scripts]$

登录oracle 18c

[oracle@xifenfei scripts]$ sqlplus / as sysdba
SQL*Plus: Release 18.0.0.0.0 Production on Wed Feb 28 09:40:08 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2017, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
SQL> set lines 150
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
------------------------------------------------------------------------------------------------------
BANNER_LEGACY                                                                        CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production                    0
SQL> col name for a52
SQL> col value for a24
SQL> col description for a50
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _exadata_feature_on
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_exadata_feature_on%')
NAME                                                 VALUE                    DESCRIPTION
---------------------------------------------------- ------------------------ -------------------------
_exadata_feature_on                                  TRUE                     Exadata Feature On
SQL> show parameter name;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cdb_cluster_name                     string
cell_offloadgroup_name               string
db_file_name_convert                 string
db_name                              string      xffdb
db_unique_name                       string      xffdb
global_names                         boolean     FALSE
instance_name                        string      xffdb
lock_name_space                      string
log_file_name_convert                string
pdb_file_name_convert                string
processor_group_name                 string
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      xffdb
SQL>

OSD-04016: 异步 I/O 请求排队时出错

有某客户由于硬件故障，导致数据库无法启动，让我们介入处理
数据库启动报错

Mon Feb 26 17:28:24 2018
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 2054 KB redo, 509 data blocks need recovery
Started redo application at
 Thread 1: logseq 41341, block 54
Recovery of Online Redo Log: Thread 1 Group 1 Seq 41341 Reading mem 0
  Mem# 0: E:\ORADATA\ORCL\REDO01.LOG
Completed redo application of 1.77MB
KCF: read, write or open error, block=0x16439 online=1
        file=1 'E:\ORADATA\ORCL\SYSTEM01.DBF'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_dbw0_4928.trc:
ORA-01243: system tablespace file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 91193)
ORA-01110: data file 1: 'E:\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
DBW0 (ospid: 4928): terminating the instance due to error 1243
Mon Feb 26 17:28:29 2018
Instance terminated by DBW0, pid = 4928

这里错误比较明显，由于io错误，在数据库实例恢复之时，写回block正好在该损坏位置从而使得数据库无法正常实例恢复,进而无法open.

dbv验证文件
通过专业工具对system文件进行了重构system文件,然后dbv检查结果如下

DBVERIFY: Release 11.2.0.1.0 - Production on 星期二 2月 27 17:20:14 2018
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\OK\SYSTEM01.DBF
页 91156 标记为损坏
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689
DBVERIFY - 验证完成
检查的页总数: 215040
处理的页总数 (数据): 178112
失败的页总数 (数据): 0
处理的页总数 (索引): 19070
失败的页总数 (索引): 0
处理的页总数 (其他): 3118
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 14739
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 1638554501 (0.1638554501)

这里比较明显重构出来的system文件只有block 91156坏块,这里注意和没有处理之前的坏块不一样

通过dump分析坏块所属对象

Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91156 maxblk 91156
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0xe1d9e3e7 (903/1696743)
scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c tail: 0x71f50602
frmt: 0x06 chkval: 0x8195 type: 0xe5=unknown
Hex dump of corrupt header 4 = CORRUPT
Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91155 maxblk 91155
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416413 (1/91155)
scn: 0x0000.613ad8d6 seq: 0x01 flg: 0x06 tail: 0xd8d60601
frmt: 0x02 chkval: 0xcc5f type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x00416413
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.613ad8ce  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.000.0016fda0  0x00c0130e.6f38.01  CB--    0  scn 0x0000.3e6ed294
0x02   0x0009.008.0023e862  0x00c002cf.a217.13  --U-    1  fsc 0x0000.613ad8d6
Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91157 maxblk 91157
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416415 (1/91157)
scn: 0x0000.6193dc0c seq: 0x01 flg: 0x06 tail: 0xdc0c0601
frmt: 0x02 chkval: 0x8c21 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x00416415
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.6193dc04  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00c.001ee103  0x00c008a5.8dbd.02  C---    0  scn 0x0000.57303e03
0x02   0x000b.00f.00053100  0x00c008af.1563.09  --U-    1  fsc 0x0000.6193dc0c

这里比较明显，可以确定坏块为index，object_id=0x25=37,通过查询其他库，确定为i_obj2(obj$的index)

使用该文件启动数据库

SQL> alter database rename file 'E:\ORADATA\ORCL\SYSTEM01.DBF' to 'd:\orcl\SYSTEM01.DBF';
数据库已更改。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
数据库已更改。

检查alert日志
发现smon进程由于坏块的存储，出现大量报错,需要处理,不然数据库一段时间后就会crash.

Tue Feb 27 20:31:23 2018
QMNC started with pid=27, OS id=4652
Completed: alter database open
Tue Feb 27 20:31:25 2018
Starting background process CJQ0
Tue Feb 27 20:31:25 2018
CJQ0 started with pid=30, OS id=2172
Tue Feb 27 20:31:25 2018
db_recovery_file_dest_size of 3912 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Hex dump of (file 1, block 91156) in trace file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during multiblock buffer read
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689
Reading datafile 'D:\ORCL\SYSTEM01.DBF' for corruption at rdba: 0x00416414 (file 1, block 91156)
Reread (file 1, block 91156) found same corrupt data
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_smon_3992.trc  (incident=77085):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77085\orcl_smon_3992_i77085.trc
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc  (incident=77221):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77221\orcl_cjq0_2172_i77221.trc

重建i_obj2 index，参考：http://www.xifenfei.com/?p=5566

18c新特性：alter system cancel sql

根据18c官方描述cancel sql功能是在18c中引起,但是实测发现在oracle 12.2中已经有了cancel sql功能，可以实现终止掉某个sql的当前sql正在执行的sql语句,而不是传统的直接kill某个会话.ALTER SYSTEM CANCEL SQL语句有四个参数分别为：

--会话1
SQL> set lines 150
SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0
PL/SQL Release 12.2.0.1.0 - Production                                                    0
CORE    12.2.0.1.0      Production                                                        0
TNS for Linux: Version 12.2.0.1.0 - Production                                            0
NLSRTL Version 12.2.0.1.0 - Production                                                    0
SQL> select sid, serial# from v$session where sid in
  2  (select  sid from v$mystat where rownum=1);
       SID    SERIAL#
---------- ----------
       278       4019
SQL> create table t_xifenfei tablespace users as select * from dba_source;
Table created.
SQL> insert into t_xifenfei select * from t_xifenfei;
274132 rows created.                    <<===没有提交
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    548264
SQL> insert into t_xifenfei select * from t_xifenfei;
548264 rows created.     <<===没有提交
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
   1096528
SQL> insert into t_xifenfei select * from t_xifenfei;
--会话2
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    274132
SQL> alter system cancel sql '278,4019';
System altered.
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    274132
--会话1
SQL> insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
   1096528

这里可以看到会话1的最后一个insert被cancel，但是前面两个没有提交的insert没有被回滚/提交，看到了cancel sql的功能的实现.

Oracle中的主要字母缩写含义

ACD = Active Change Directory
ACFS = ASM Cluster File System
ADDM = Automatic Database Diagnostic Monitor
ADR = Automatic Diagnostic Repository
ADVM = ASM Dynamic Volume Manager
AIO = Asynchronous I/O
AMDU = ASM Metadata Dump Utility
AMM = Automatic Memory Management
ARC = Archive process
ASH = Active Session History
ASM = Automatic Storage Management
ASMB = ASM Background process
ASMCA = ASM Configuration Assistant
ASMCMD = ASM CoMmanD line utility
ASMLIB = ASM LIBrary tool
ASMM = Automatic Shared Memory Management
ASMSNMP = ASM Simple Network Management Protocol
AT = Allocation Table
ATA = Advanced Technology Attachment
AU = Allocation Unit
AWR = Automatic Workload Repository
BH = Block Header
BMC = Baseboard Management Controller
BS = Block Size
CBO = Cost-Based Optimizer
CF = Control File
CFS = Cluster FileSystem
CHM = Cluster Health Monitor
CIO = Concurrent I/O
CKPT = ChecKPoinT process
CLUVFY = CLUster VeriFy utility
COD = Continuing Operation Directory
CPU = Central Processing Unit or Critical Patch Update
CRM = Customer Relationship Management
CRS = Cluster Ready Services
CSS = Cluster Synchronization Services
CSV = Comma-Separated Values
CVM = Cluster Volume Manager
DB = DataBase
DBCA = DataBase Configuration Assistant
DBFS = DataBase File System
DBM = DataBase Machine
DBMS = DataBase Management Systems
DBPERF = DataBase PERFormance
DBV = DataBase Verification tool
DBW = DataBase Writer process
DD = Disk Directory or Data Description tool
DES = Database Excelleration Systems Inc.
DG = DiskGroup
DH = Disk Header
DIO = Direct I/O
DISM = Dynamic Intimate Shared Memory
DM = Device Mapper
DNFS = Direct Network File System
DRAM = Dynamic Random Access Memory
DSS = Decision Support System
DUL = Data UnLoader
DW = Data Warehouse
EIDE = Enhanced Integrated Drive Electronics
ERP = Enterprise Resource Planning
ETA = Estimated Time of Arrival
ETL = Extract Transform Load
FD = File Directory
FG = FailGroup
FRA = Flash Recovery Area or Fast Recovery Area
FS = FileSystem
FST = Free Segments Table
FTS = Full Table Scan
GC = Grid Control
GI = Grid Infrastructure
GUI = Graphical User Interface
HA = High Availability
HARD = Hardware Assisted Resilient Data
HB = Heart Beat
HBA = Host Bus Adapter
IDE = Integrated Drive Electronics
IIS = Internet Information Services
INST = INSTance
IO = Input/Output
IOPS = IO Per Second
IOT = Index Organized Table
IP = Internet Protocol
IPMI = Intelligent Platform Management Interface
JET = Joint Escalation Team
JFS = Journaled FileSystem
KB = KiloByte
KFED = Kernel Files metadata EDitor
KFOD = Kernel Files Osm disk[group] Discovery
LGWR = redo LoG WRiter process
LLT = Veritas Low Latency Transport protocol
LSNR = LiSteNeR
LUN = Logical Unit Number
LVM = Logical Volume Manager
MAA = Maximum Availability Architecture
MB = MegaByte
NAS = Network Attached Storage
NetApp = Network Appliance
NETCA = NETwork Configuration Assistant
NFS = Network FileSystem
NIC = Network Interface Controller
OCFS = Oracle Cluster FileSystem
OCR = Oracle Cluster Registry
ODM = Oracle Disk Manager
ODS = Operational Data Store
OEL = Oracle Enterprise Linux
OEM = Oracle Enterprise Manager
OID = Oracle Internet Directory
OLAP = OnLine Analytical Processing
OLTP = OnLine Transaction Processing
OMS = Oracle Management Service
OPATCH = Oracle PATCHing utility
OS = Operating System
OSCP = Oracle Storage Compatibility Program
OSW = OS Watcher
OSWFW = OS Watcher For Windows
OTN = Oracle Technology Network
OUI = Oracle Universal Installer
PB = PetaByte
PFILE = Parameter FILE
PGA = Program Global Area
PID = Proces ID
PL/SQL = Procedural Language/Structured Query Language
POC = Proof Of Concept
PROCWATCHER = PROCess WATCHER
PST = Partnership Status Table
PSU = Patch Set Update
PX = Parrallel eXecution
RAC = Real Application Cluster
RAID = Redundant Array of Independent Disks
RAM = Random Access Memory
RBAL = ReBALance process
RCA = Root Cause Analysis
RDA = Remote Diagnostic Agent
RDBMS = Relational DataBase Management System
RHEL = RedHat Enterprise Linux
RM = Resource Manager
RMAN = Recovery MANager
RPM = Resource Package Manager
SAN = Storage Area Network
SAS = Serial Attached SCSI
SATA = Serial Advanced Technology Attachment
SCAN = Single Client Access Name
SCSI = Small Computer System Interface
SGA = System Global Area
SLA = Service Level Agreement
SMF = Service Management Facility
SPFILE = Server Parameter FILE
SQL = Structured Query Language
SRDF = EMC Symmetrix Remote Data Facility
SSD = Solid State Disk
SVCTM = average SerViCe TiMe
SVM = Solaris Volume Manager
TB = TeraByte
TCP = Transmission Control Protocol
TDE = Transparent Data Encryption
TKPROF = Transient Kernel PROFile
TNS = Transparent Network Substrate
UDEV = Unix DEVice manager
UDP = User Datagram Protocol
UFG = Umbilicus ForeGround process
UFS = User FileSystem
VBG = Volume BackGround process
VCS = Veritas Cluster Server
VD = Voting Disk
VDBG = Volume Driver BackGround process
VIP = Virtual Internet Portocol
VLDB = Very Large DataBase
VM = Virtual Machine
VMB = Volume Membership BackGround process
VxFS = Veritas File System
VxVM = Veritas Volume Manager
XDB = XML DataBase