分布式存储crash导致ORACLE坏块原因分析

oracle运行在分布式存储中,我们在实验室模拟所有存储节点故障(整个存储断电),然后加电,数据库启动没有坏块,但是有客户在自己的环境做测试,发现数据库启动有大量坏块(几次测试问题依旧)。
客户异常环境报错汇总
模拟存储故障后,数据库报错信息,主要体现犹豫存储模拟故障,数据库读写io失败ORA-15081,ORA-27072,ORA-27061

Thread 2 cannot allocate new log, sequence 488
Checkpoint not complete
  Current log# 3 seq# 487 mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559
  Current log# 3 seq# 487 mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559
Thu Mar 08 16:17:08 2018
WARNING: Read Failed. group:2 disk:0 AU:6925 offset:49152 size:16384
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of
file 313 in group [2.1286527068] from disk DATADG_0000  allocation unit 6925 reason error;
if possible, will try another mirror side
Thu Mar 08 16:17:08 2018
WARNING: Read Failed. group:2 disk:0 AU:6921 offset:16384 size:16384
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0
of file 313 in group [2.1286527068] from disk DATADG_0000  allocation unit 6921 reason error;
 if possible, will try another mirror side
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lmon_21091.trc:
ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 56729696
Additional information: -1
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_lgwr_21111.trc:
ORA-00202: control file: '+DATADG/testdb/controlfile/current.313.970157461'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 56696864
Additional information: -1
Thu Mar 08 16:17:08 2018
WARNING: Write Failed. group:1 disk:0 AU:56970 offset:65536 size:16384
WARNING: Write Failed. group:2 disk:0 AU:6921 offset:65536 size:16384
…………
WARNING: failed to write mirror side 1 of virtual extent 147 logical extent
0 of file 335 in group 2 on disk 0 allocation unit 11789
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 8192
WARNING: failed to write mirror side 1 of virtual extent 147 logical extent
0 of file 335 in group 2 on disk 0 allocation unit 11789
KCF: read, write or open error, block=0xddec online=1
        file=18 '+DATADG/testdb/datafile/soe.333.970157833'
        error=15081 txt: ''
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_dbw0_21109.trc:
Thu Mar 08 16:17:09 2018
System state dump requested by (instance=2, osid=21113 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_diag_21081_20180308161709.trc
Thu Mar 08 16:17:09 2018
ORA-1092 : opitsk aborting process
Thu Mar 08 16:17:12 2018
ORA-1092 : opitsk aborting process
Instance terminated by CKPT, pid = 21113

asm日志报错信息,主要也是体现在io的读写错误上Linux-x86_64 Error: 5: Input/output error

Thu Mar 08 16:17:20 2018
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_9636.trc:
ORA-17503: ksfdopn:2 Failed to open file +DATADG/testdb/spfiletestdb.ora
ORA-15001: diskgroup "DATADG" does not exist or is not mounted
Thu Mar 08 16:17:37 2018
WARNING: Write Failed. group:3 disk:0 AU:1 offset:4190208 size:4096
Thu Mar 08 16:17:37 2018
WARNING: Write Failed. group:3 disk:0 AU:22 offset:143360 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180504
Additional information: -1
WARNING: Hbeat write to PST disk 0.3916310704 in group 3 failed. [4]
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0
of file 255 in group 3 on disk 0 allocation unit 22
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180504
Additional information: -1
Thu Mar 08 16:17:37 2018
NOTE: process _b001_+asm2 (9546) initiating offline of
disk 0.3916310704 (OCRDG_0000) with mask 0x7e in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe96e28b0, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 27, osid 9546
ERROR: no read quorum in group: required 1, found 0 disks
Thu Mar 08 16:17:37 2018
NOTE: cache dismounting (not clean) group 3/0x4CAED85D (OCRDG)
WARNING: Offline for disk OCRDG_0000 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 9505, image: oracle@dbtest02 (B000)
Thu Mar 08 16:17:37 2018
NOTE: halting all I/Os to diskgroup 3 (OCRDG)
Thu Mar 08 16:17:37 2018
NOTE: LGWR doing non-clean dismount of group 3 (OCRDG)
NOTE: LGWR sync ABA=11.6 last written ABA 11.6
WARNING: Write Failed. group:3 disk:0 AU:22 offset:102400 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180424
Additional information: -1
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent
0 of file 255 in group 3 on disk 0 allocation unit 22
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 180424
Additional information: -1
WARNING: Write Failed. group:3 disk:0 AU:22 offset:147456 size:4096
Errors in file /oracle/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_20679.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
…………
Thu Mar 08 16:17:37 2018
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)
ERROR: -9(Error 27061, OS Error (Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 512)
)

存储恢复之后,asm mount磁盘组没有报任何错误

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         ARCHDG
         DATADG
NOTE: Diskgroup used for Voting files is:
         OCRDG
Diskgroup with spfile:OCRDG
Diskgroup used for OCR is:OCRDG
NOTE: cache registered group ARCHDG number=1 incarn=0xb81889ff
NOTE: cache began mount (first) of group ARCHDG number=1 incarn=0xb81889ff
NOTE: cache registered group DATADG number=2 incarn=0xb8288a00
NOTE: cache began mount (first) of group DATADG number=2 incarn=0xb8288a00
NOTE: cache registered group OCRDG number=3 incarn=0xb8288a01
NOTE: cache began mount (first) of group OCRDG number=3 incarn=0xb8288a01
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (/dev/oracleasm/disks/ARCHDISK01)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATADISK01)
NOTE: Assigning number (3,0) to disk (/dev/oracleasm/disks/OCRDISK01)
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 5 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 1: ARCHDG_0000 path:/dev/oracleasm/disks/ARCHDISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0xB81889FF (ARCHDG)
* allocate domain 1, invalid = TRUE
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=13.6287 group=1 (ARCHDG)
NOTE: starting recovery of thread=2 ckpt=12.7985 group=1 (ARCHDG)
NOTE: advancing ckpt for group 1 (ARCHDG) thread=1 ckpt=13.6287
NOTE: advancing ckpt for group 1 (ARCHDG) thread=2 ckpt=12.7985
NOTE: cache recovered group 1 to fcn 0.122934
NOTE: redo buffer size is 256 blocks (1053184 bytes)
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (ARCHDG)
Process LGWR (pid 3287) is running at high priority QoS for Exadata I/O
NOTE: LGWR found thread 1 closed at ABA 13.6286
NOTE: LGWR mounted thread 1 for diskgroup 1 (ARCHDG)
NOTE: LGWR opening thread 1 at fcn 0.122934 ABA 14.6287
NOTE: cache mounting group 1/0xB81889FF (ARCHDG) succeeded
NOTE: cache ending mount (success) of group ARCHDG number=1 incarn=0xb81889ff
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 7 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/oracleasm/disks/DATADISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0xB8288A00 (DATADG)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=12.2323 group=2 (DATADG)
NOTE: starting recovery of thread=2 ckpt=11.2681 group=2 (DATADG)
NOTE: advancing ckpt for group 2 (DATADG) thread=1 ckpt=12.2326
NOTE: advancing ckpt for group 2 (DATADG) thread=2 ckpt=11.2687
NOTE: cache recovered group 2 to fcn 0.21395
NOTE: redo buffer size is 512 blocks (2101760 bytes)
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 12.2325
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.21395 ABA 13.2326
NOTE: cache mounting group 2/0xB8288A00 (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=2 incarn=0xb8288a00
NOTE: GMON heartbeating for grp 3
GMON querying group 3 at 9 for pid 23, osid 3303
NOTE: cache opening disk 0 of grp 3: OCRDG_0000 path:/dev/oracleasm/disks/OCRDISK01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0xB8288A01 (OCRDG)
* allocate domain 3, invalid = TRUE
Thu Mar 08 17:00:24 2018
NOTE: attached to recovery domain 3
NOTE: starting recovery of thread=1 ckpt=13.55 group=3 (OCRDG)
NOTE: starting recovery of thread=2 ckpt=11.7 group=3 (OCRDG)
NOTE: advancing ckpt for group 3 (OCRDG) thread=1 ckpt=13.55
NOTE: advancing ckpt for group 3 (OCRDG) thread=2 ckpt=11.7
NOTE: cache recovered group 3 to fcn 0.851
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Thu Mar 08 17:00:24 2018
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCRDG)
NOTE: LGWR found thread 1 closed at ABA 13.54
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCRDG)
NOTE: LGWR opening thread 1 at fcn 0.851 ABA 14.55
NOTE: cache mounting group 3/0xB8288A01 (OCRDG) succeeded
NOTE: cache ending mount (success) of group OCRDG number=3 incarn=0xb8288a01
Thu Mar 08 17:00:24 2018
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup ARCHDG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATADG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCRDG was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */

数据库虽然启动成功了,但是大量的坏块生产ORA-01578,ORA-01110

ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */
This instance was first to open
Beginning crash recovery of 2 threads
 parallel recovery started with 7 processes
Started redo scan
Completed redo scan
 read 107566 KB redo, 10569 data blocks need recovery
Started redo application at
 Thread 1: logseq 767, block 101851
 Thread 2: logseq 486, block 91861
Recovery of Online Redo Log: Thread 1 Group 1 Seq 767 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_1.314.970157463
  Mem# 1: +ARCHDG/testdb/onlinelog/group_1.614.970157463
Recovery of Online Redo Log: Thread 2 Group 4 Seq 486 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561
  Mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561
Recovery of Online Redo Log: Thread 1 Group 2 Seq 768 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_2.315.970157463
  Mem# 1: +ARCHDG/testdb/onlinelog/group_2.615.970157463
Recovery of Online Redo Log: Thread 2 Group 3 Seq 487 Reading mem 0
  Mem# 0: +DATADG/testdb/onlinelog/group_3.318.970157559
  Mem# 1: +ARCHDG/testdb/onlinelog/group_3.616.970157559
Thu Mar 08 17:01:11 2018
Hex dump of (file 12, block 126469) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p001_4090.trc
Corrupt block relative dba: 0x0301ee05 (file 12, block 126469)
Fractured block found during crash/instance recovery
Data in bad block:
 type: 32 format: 2 rdba: 0x0301ee05
 last change scn: 0x0000.00446cbb seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xeac22001
 check value in block header: 0xbd0c
 computed block checksum: 0x8679
Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301ee05 (file 12, block 126469)
Reread (file 12, block 126469) found same corrupt data (no logical check)
Completed redo application of 84.56MB
Completed crash recovery at
 Thread 1: logseq 768, block 102395, scn 4586960
 Thread 2: logseq 487, block 101664, scn 4587050
 10569 data blocks read, 10569 data blocks written, 107566 redo k-bytes read
ARCH: STARTING ARCH PROCESSES
Thu Mar 08 17:01:12 2018
ARC0 started with pid=43, OS id=4151
Thu Mar 08 17:01:13 2018
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thu Mar 08 17:01:13 2018
ARC1 started with pid=44, OS id=4153
Thu Mar 08 17:01:13 2018
ARC2 started with pid=45, OS id=4155
ARC1: Archival started
ARC2: Archival started
Thu Mar 08 17:01:13 2018
ARC3 started with pid=46, OS id=4157
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 advanced to log sequence 769 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Thu Mar 08 17:01:13 2018
Thread 2 advanced to log sequence 488 (thread open)
Thread 2 opened at log sequence 488
  Current log# 4 seq# 488 mem# 0: +DATADG/testdb/onlinelog/group_4.319.970157561
  Current log# 4 seq# 488 mem# 1: +ARCHDG/testdb/onlinelog/group_4.617.970157561
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Mar 08 17:01:13 2018
SMON: enabling cache recovery
Archived Log entry 1249 added for thread 2 sequence 487 ID 0xa41f5c11 dest 1:
[4046] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:4294332240 end:4294332460 diff:220 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
Starting background process GTX0
Thu Mar 08 17:01:14 2018
GTX0 started with pid=55, OS id=4181
Starting background process RCBG
Thu Mar 08 17:01:14 2018
RCBG started with pid=57, OS id=4185
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu Mar 08 17:01:14 2018
QMNC started with pid=61, OS id=4195
Thu Mar 08 17:01:15 2018
Hex dump of (file 9, block 43523) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc
Corrupt block relative dba: 0x0240aa03 (file 9, block 43523)
Fractured block found during buffer read
Data in bad block:
 type: 32 format: 2 rdba: 0x0240aa03
 last change scn: 0x0000.0044ccc4 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xd94f2001
 check value in block header: 0xb09c
 computed block checksum: 0x158b
Reading datafile '+DATADG/testdb/datafile/soe.324.970157821' for corruption at rdba: 0x0240aa03 (file 9, block 43523)
Reread (file 9, block 43523) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p009_4175.trc  (incident=216418):
ORA-01578: ORACLE data block corrupted (file # 9, block # 43523)
ORA-01110: data file 9: '+DATADG/testdb/datafile/soe.324.970157821'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216418/testdb2_p009_4175_i216418.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Mar 08 17:01:15 2018
Hex dump of (file 14, block 71173) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc
Corrupt block relative dba: 0x03811605 (file 14, block 71173)
Fractured block found during buffer read
Data in bad block:
 type: 32 format: 2 rdba: 0x03811605
 last change scn: 0x0000.0043916d seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x93662001
 check value in block header: 0xad98
 computed block checksum: 0x20b
Reading datafile '+DATADG/testdb/datafile/soe.329.970157831' for corruption at rdba: 0x03811605 (file 14, block 71173)
Reread (file 14, block 71173) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc  (incident=216306):
ORA-01578: ORACLE data block corrupted (file # 14, block # 71173)
ORA-01110: data file 14: '+DATADG/testdb/datafile/soe.329.970157831'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216306/testdb2_p003_4094_i216306.trc
Hex dump of (file 12, block 112385) in trace file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc
Corrupt block relative dba: 0x0301b701 (file 12, block 112385)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x0301b701
 last change scn: 0x0000.003cb146 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1d030601
 check value in block header: 0x1e62
 computed block checksum: 0x7a47
Reading datafile '+DATADG/testdb/datafile/soe.327.970157825' for corruption at rdba: 0x0301b701 (file 12, block 112385)
Reread (file 12, block 112385) found same corrupt data (no logical check)
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_smon_4005.trc  (incident=216170):
ORA-01578: ORACLE data block corrupted (file # 12, block # 112385)
ORA-01110: data file 12: '+DATADG/testdb/datafile/soe.327.970157825'
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216170/testdb2_smon_4005_i216170.trc
Errors in file /oracle/app/db/diag/rdbms/testdb/testdb2/trace/testdb2_p003_4094.trc  (incident=216307):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /oracle/app/db/diag/rdbms/testdb/testdb2/incident/incdir_216307/testdb2_p003_4094_i216307.trc
Completed: ALTER DATABASE OPEN /* db agent *//* {2:40834:2} */

rman 检查坏块信息,发现有大量坏块,而且主要坏块类型是FRACTURED(折断)

SQL> select * from v$database_block_corruption;
     FILE#     BLOCK#	  BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
	12     126469	       1		  0 FRACTURED
	12     112385	       1		  0 FRACTURED
	14	71173	       1		  0 FRACTURED
	 9	43523	       1		  0 FRACTURED
	15	44801	       1		  0 FRACTURED
	15	44855	       1		  0 FRACTURED
	15	44862	       1		  0 FRACTURED
	11	56252	       1		  0 FRACTURED
	12     116481	       1		  0 CHECKSUM
	12     116535	       1		  0 FRACTURED
	15	46903	       1		  0 FRACTURED
	12     108289	       1		  0 FRACTURED
	12     108343	       1		  0 FRACTURED
	 8	95872	       1		  0 FRACTURED
	 8	99712	       1		  0 FRACTURED
	 8     102783	       1		  0 FRACTURED
	 8     104974	       1		  0 FRACTURED
	 8     105856	       1		  0 FRACTURED
	 8     105998	       1		  0 FRACTURED
	 8     109120	       1		  0 FRACTURED
	 8     110016	       1		  0 FRACTURED
	 8     110092	       1		  0 FRACTURED
	 8     116226	       1		  0 FRACTURED
	 8     116231	       1		  0 FRACTURED
	 8     119295	       1		  0 FRACTURED
	 8     130559	       1		  0 FRACTURED
	12     113163	       1		  0 FRACTURED
	12     118283	       1		  0 FRACTURED
	12     119553	       1		  0 FRACTURED
	12     119608	       1		  0 FRACTURED
	12     120576	       1		  0 FRACTURED
	12     120632	       1		  0 FRACTURED
	12     120639	       1		  0 FRACTURED
	12     123407	       1		  0 FRACTURED
	 9	38411	       1		  0 FRACTURED
	 9	67647	       1		  0 FRACTURED
	 9     109068	       1		  0 FRACTURED
	 9     109070	       1		  0 FRACTURED
	11	99839	       1		  0 FRACTURED
	11     104960	       1		  0 FRACTURED
	13	63214	       1		  0 FRACTURED
	13	65097	       1		  0 FRACTURED
	13	71183	       1		  0 FRACTURED
	13	77319	       1		  0 FRACTURED
	13     103104	       1		  0 FRACTURED
	13     107072	       1		  0 FRACTURED
	13     110082	       1		  0 FRACTURED
	13     111106	       1		  0 FRACTURED
	13     114752	       1		  0 FRACTURED
	14	72196	       1		  0 FRACTURED
	14	75275	       1		  0 FRACTURED
	 7	76293	       1		  0 FRACTURED
	 7	76296	       1		  0 FRACTURED
	 7	76415	       1		  0 FRACTURED
	 7	76864	       1		  0 FRACTURED
	15	49976	       1		  0 FRACTURED
	15	81413	       1		  0 FRACTURED
	 2	61512	       1		  0 FRACTURED
	19	35338	       1		  0 FRACTURED
	19	42687	       1		  0 FRACTURED
60 rows selected.

对比实验室和客户环境配置
实验室环境

[oracle@i-q2ghx82t ~]$ more /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m
SQL> select path from v$asm_disk;
PATH
----------------------------------------
/dev/asm-disk011
/dev/asm-disk001
/dev/asm-disk014
/dev/asm-disk002
/dev/asm-disk003
/dev/asm-disk015
/dev/asm-disk012
/dev/asm-disk013
8 rows selected.
[root@i-q2ghx82t ~]# fdisk -l|grep vd
Disk /dev/vda: 107.4 GB, 107374182400 bytes
/dev/vda1   *           1       13055   104856576   83  Linux
Disk /dev/vdb: 34.4 GB, 34359738368 bytes
Disk /dev/vdc: 107.4 GB, 107374182400 bytes
Disk /dev/vdd: 268.4 GB, 268435456000 bytes
/dev/vdd1               1       32635   262140606   83  Linux
Disk /dev/vde: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdf: 53.7 GB, 53687091200 bytes
/dev/vdf1               1      104025    52428568+  83  Linux
Disk /dev/vdg: 107.4 GB, 107374182400 bytes
Disk /dev/vdh: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdi: 107.4 GB, 107374182400 bytes
Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdk: 1073.7 GB, 1073741824000 bytes
Disk /dev/vdl: 1073.7 GB, 1073741824000 bytes
[oracle@i-q2ghx82t ~]$ ls -l /dev/asm-*
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk001 -> vdg
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk002 -> vdc
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk003 -> vdi
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk011 -> vde
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk012 -> vdh
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk013 -> vdl
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk014 -> vdj
lrwxrwxrwx 1 root root 3 Mar 10 17:05 /dev/asm-disk015 -> vdk
[root@i-q2ghx82t ~]# fdisk -l /dev/vdj
Disk /dev/vdj: 1073.7 GB, 1073741824000 bytes
16 heads, 63 sectors/track, 2080507 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

客户环境

[oracle@dbtest02 ~]$ more /etc/issue
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel \r on an \m
SQL> select path from v$asm_disk;
PATH
----------------------------------------
/dev/oracleasm/disks/ARCHDISK01
/dev/oracleasm/disks/OCRDISK01
/dev/oracleasm/disks/DATADISK01
[root@dbtest02 ~]# oracleasm listdisks
ARCHDISK01
DATADISK01
OCRDISK01
[root@dbtest02 ~]# oracleasm querydisk -p DATADISK01
Disk "DATADISK01" is a valid ASM disk
/dev/vdc1: LABEL="DATADISK01" TYPE="oracleasm"
[root@dbtest02 ~]# oracleasm querydisk -p ARCHDISK01
Disk "ARCHDISK01" is a valid ASM disk
/dev/vde1: LABEL="ARCHDISK01" TYPE="oracleasm"
[root@dbtest02 ~]# oracleasm querydisk -p OCRDISK01
Disk "OCRDISK01" is a valid ASM disk
/dev/vdd1: LABEL="OCRDISK01" TYPE="oracleasm"
[root@dbtest02 ~]# fdisk -l|grep vd
Disk /dev/vda: 53.7 GB, 53687091200 bytes
/dev/vda1               1        6528    52427776   83  Linux
Disk /dev/vdb: 17.2 GB, 17179869184 bytes
Disk /dev/vdc: 1073.7 GB, 1073741824000 bytes
/dev/vdc1               1     2080507  1048575496+  83  Linux
Disk /dev/vdd: 107.4 GB, 107374182400 bytes
/dev/vdd1               1      208050   104857168+  83  Linux
Disk /dev/vde: 536.9 GB, 536870912000 bytes
/dev/vde1               1     1040253   524287480+  83  Linux
Disk /dev/vdf: 859.0 GB, 858993459200 bytes
/dev/vdf1               1     1664406   838860592+  83  Linux
[root@dbtest02 ~]# fdisk -l /dev/vdf
Disk /dev/vdf: 859.0 GB, 858993459200 bytes
16 heads, 63 sectors/track, 1664406 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x1bdbf439
   Device Boot      Start         End      Blocks   Id  System
/dev/vdf1               1     1664406   838860592+  83  Linux
Partition 1 does not start on physical sector boundary.

这里比较明显:1)实验室环境使用的是udev,而客户环境使用的是asmlib.2)客户环境使用了分区,而实验室环境直接使用裸盘.

尝试在实验室环境使用asmlib

[root@i-q2ghx82t tmp]# ls -l *oracleasm*
-rw-r--r-- 1 root root 36288 Mar 10 13:09 kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm
-rw-r--r-- 1 root root 17168 Mar 10 13:09 oracleasmlib-2.0.12-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 74984 Mar 10 13:09 oracleasm-support-2.1.8-1.el6.x86_64.rpm
[root@i-q2ghx82t tmp]# rpm -ivh kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm
warning: kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing...                ########################################### [100%]
   1:kmod-oracleasm         ########################################### [100%]
[root@i-q2ghx82t tmp]# rpm -ivh oracleasm*.rpm
warning: oracleasmlib-2.0.12-1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing...                ########################################### [100%]
   1:oracleasm-support      ########################################### [ 50%]
   2:oracleasmlib           ########################################### [100%]
[root@i-q2ghx82t tmp]#
[root@i-q2ghx82t tmp]# rpm -aq|grep oracleasm
kmod-oracleasm-2.0.8-13.el6_8.x86_64
oracleasmlib-2.0.12-1.el6.x86_64
oracleasm-support-2.1.8-1.el6.x86_64
[root@i-q2ghx82t dev]# oracleasm createdisk ASMVDL /dev/vdl
Device "/dev/vdl" is not a partition

测试比较明显,对于没有分区的分布式存储中的盘,无法直接被asmlib创建asm disk.如果要使用asmlib,需要对磁盘进行分区.

存储特征
通过咨询分布式存储开发,确认几点:
0. 整个存储是基于网络技术实现分布式存储内部数据同步
1. 整个分布式存储使用ssd卡,默认三副本
2. 三副本中有两个副本写入成功才会反馈给应用(数据库),io成功
3. 整个分布式存储,没有引入任何的cache.
4. 存储的最小io单元是4k,由于ssd卡决定的
5. 分布式存储划分出来的lun是自动实现快对齐的

故障原因
由于存储最小单元是4k,但是在os层面物理扇区为4k,逻辑扇区为512,客户为了使用asmlib,对磁盘进行了默认分区,而没有考虑块对齐.会使得在存储整体crash的时候,很多block由于没有块对齐,很可能被进行了拆分(本来一个4k的io,存储上对对应的一个原子io完成,但是现在这个os层面的4k被拆分成了多个io的可能性,使得os层面的4k的io无法有了原则性),进而使得在数据库主机,存储突发异常的时候,有可能发生坏块.

处理建议
1. 在linux 6开始,对于asmlib,建议使用udev
2. 如果坚持asmlib,分区的时候考虑块对齐,不然会出现较大的性能下降,而且还会引起坏块的风险

ORA-12754: Feature 'startup' is disabled due to missing capability 'Runtime Environment'

ORACLE 18C启动报ORA-12754
从http://edelivery.oracle.com网上可以下载oracle 18c for exadata,在普通的linux平台可以正常安装,但是无法正常启动实例(dbca无法创建库),报错信息为:ORA-12754: Feature ‘startup’ is disabled due to missing capability ‘Runtime Environment’.
dbca-12754


由于普通的linux而非真实oracle一体机,而该版本中做了限制,因此无法启动数据库.根据oracle的作风,对于引进的新特性,oracle一般都提供隐含参数或者event控制,通过分析确定有_exadata_feature_on参数进行控制.
修改脚本创建库参数
由于我们无法在dbca图形化的界面中增加新参数,因此可以通过生成创建库脚本,然后通过修改参数文件,增加_exadata_feature_on=true

[oracle@xifenfei scripts]$ ls -ltr *.ora
-rw-r----- 1 oracle oinstall 2252 Feb 28 09:22 init.ora
-rw-r----- 1 oracle oinstall 2264 Feb 28 09:28 initxffdbTempOMF.ora
-rw-r----- 1 oracle oinstall 2386 Feb 28 09:29 initxffdbTemp.ora

然后重新执行xffdb.sh脚本创建库

[oracle@xifenfei scripts]$ sh xffdb.sh
You should Add this entry in the /etc/oratab: xffdb:/u02/soft:Y
SQL*Plus: Release 18.0.0.0.0 Production on Wed Feb 28 09:29:48 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2017, Oracle.  All rights reserved.
Enter new password for SYS:
Enter new password for SYSTEM:
Enter password for SYS:
Connected to an idle instance.
SQL> spool /u02/app/oracle/admin/xffdb/scripts/CloneRmanRestore.log append
SQL> startup mount pfile="/u02/app/oracle/admin/xffdb/scripts/initxffdbTempOMF.ora";
ORACLE instance started.
Total System Global Area 2432694552 bytes
Fixed Size                  8898840 bytes
Variable Size             654311424 bytes
Database Buffers         1761607680 bytes
Redo Buffers                7876608 bytes
Database mounted.
SQL> execute dbms_backup_restore.resetCfileSection(dbms_backup_restore.RTYP_DFILE_COPY);
PL/SQL procedure successfully completed.
SQL> execute dbms_backup_restore.resetCfileSection(13);
PL/SQL procedure successfully completed.
SQL> host /u02/soft/bin/rman @/u02/app/oracle/admin/xffdb/scripts/rmanRestoreDatafiles.sql &&sysPassword;
Recovery Manager: Release 18.0.0.0.0 - Production on Wed Feb 28 09:30:08 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
RMAN> connect target *
2>
3> CATALOG START WITH   '/u02/soft/assistants/dbca/templates//Seed_Database.dfb'  NOPROMPT  ;
4>
5> RUN {
6>
7> set newname for datafile 1 to  '/u02/app/oracle/oradata/XFFDB/system01.dbf' ;
8>
9> set newname for datafile 3 to  '/u02/app/oracle/oradata/XFFDB/sysaux01.dbf' ;
10>
11> set newname for datafile 4 to  '/u02/app/oracle/oradata/XFFDB/undotbs01.dbf' ;
12>
13> set newname for datafile 7 to  '/u02/app/oracle/oradata/XFFDB/users01.dbf' ;
14>
15> restore datafile 1;
16>
17> restore datafile 3;
18>
19> restore datafile 4;
20>
21> restore datafile 7; }
22>
connected to target database: SEEDDATA (DBID=1214140, not open)
……………………
SQL> set echo on
SQL> spool /u02/app/oracle/admin/xffdb/scripts/postPDBCreation.log append
SQL> select TABLESPACE_NAME from cdb_tablespaces a,dba_pdbs b where a.con_id=b.con_id and UPPER(b.pdb_name)=UPPER('pdb');
TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
SQL> connect "SYS"/"&&sysPassword" as SYSDBA
Connected.
SQL> alter session set container=pdb;
Session altered.
SQL> set echo on
SQL> spool /u02/app/oracle/admin/xffdb/scripts/postPDBCreation.log append
SQL> Select count(*) from dba_registry where comp_id = 'DV' and status='VALID';
  COUNT(*)
----------
         1
SQL> alter session set container=CDB$ROOT;
Session altered.
SQL> exit;
Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
[oracle@xifenfei scripts]$

登录oracle 18c

[oracle@xifenfei scripts]$ sqlplus / as sysdba
SQL*Plus: Release 18.0.0.0.0 Production on Wed Feb 28 09:40:08 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2017, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
SQL> set lines 150
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
BANNER_FULL
------------------------------------------------------------------------------------------------------
BANNER_LEGACY                                                                        CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.1.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production                    0
SQL> col name for a52
SQL> col value for a24
SQL> col description for a50
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _exadata_feature_on
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_exadata_feature_on%')
NAME                                                 VALUE                    DESCRIPTION
---------------------------------------------------- ------------------------ -------------------------
_exadata_feature_on                                  TRUE                     Exadata Feature On
SQL> show parameter name;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cdb_cluster_name                     string
cell_offloadgroup_name               string
db_file_name_convert                 string
db_name                              string      xffdb
db_unique_name                       string      xffdb
global_names                         boolean     FALSE
instance_name                        string      xffdb
lock_name_space                      string
log_file_name_convert                string
pdb_file_name_convert                string
processor_group_name                 string
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      xffdb
SQL>

OSD-04016: 异步 I/O 请求排队时出错

有某客户由于硬件故障,导致数据库无法启动,让我们介入处理
数据库启动报错

Mon Feb 26 17:28:24 2018
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 2054 KB redo, 509 data blocks need recovery
Started redo application at
 Thread 1: logseq 41341, block 54
Recovery of Online Redo Log: Thread 1 Group 1 Seq 41341 Reading mem 0
  Mem# 0: E:\ORADATA\ORCL\REDO01.LOG
Completed redo application of 1.77MB
KCF: read, write or open error, block=0x16439 online=1
        file=1 'E:\ORADATA\ORCL\SYSTEM01.DBF'
        error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_dbw0_4928.trc:
ORA-01243: system tablespace file suffered media failure
ORA-01114: IO error writing block to file 1 (block # 91193)
ORA-01110: data file 1: 'E:\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。
DBW0 (ospid: 4928): terminating the instance due to error 1243
Mon Feb 26 17:28:29 2018
Instance terminated by DBW0, pid = 4928

20180227223540


这里错误比较明显,由于io错误,在数据库实例恢复之时,写回block正好在该损坏位置从而使得数据库无法正常实例恢复,进而无法open.

dbv验证文件
通过专业工具对system文件进行了重构system文件,然后dbv检查结果如下

DBVERIFY: Release 11.2.0.1.0 - Production on 星期二 2月 27 17:20:14 2018
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - 开始验证: FILE = D:\OK\SYSTEM01.DBF
页 91156 标记为损坏
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689
DBVERIFY - 验证完成
检查的页总数: 215040
处理的页总数 (数据): 178112
失败的页总数 (数据): 0
处理的页总数 (索引): 19070
失败的页总数 (索引): 0
处理的页总数 (其他): 3118
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 14739
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 1638554501 (0.1638554501)

这里比较明显重构出来的system文件只有block 91156坏块,这里注意和没有处理之前的坏块不一样

通过dump分析坏块所属对象

Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91156 maxblk 91156
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0xe1d9e3e7 (903/1696743)
scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c tail: 0x71f50602
frmt: 0x06 chkval: 0x8195 type: 0xe5=unknown
Hex dump of corrupt header 4 = CORRUPT
Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91155 maxblk 91155
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416413 (1/91155)
scn: 0x0000.613ad8d6 seq: 0x01 flg: 0x06 tail: 0xd8d60601
frmt: 0x02 chkval: 0xcc5f type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x00416413
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.613ad8ce  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.000.0016fda0  0x00c0130e.6f38.01  CB--    0  scn 0x0000.3e6ed294
0x02   0x0009.008.0023e862  0x00c002cf.a217.13  --U-    1  fsc 0x0000.613ad8d6
Start dump data block from file D:\OK\SYSTEM01.DBF minblk 91157 maxblk 91157
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=1383974140=0x527dc4fc, Db Name='ORCL'
	Activation ID=0=0x0
	Control Seq=806694=0xc4f26, File size=215040=0x34800
	File Number=1, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 0 rdba: 0x00416415 (1/91157)
scn: 0x0000.6193dc0c seq: 0x01 flg: 0x06 tail: 0xdc0c0601
frmt: 0x02 chkval: 0x8c21 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x00416415
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.6193dc04  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00c.001ee103  0x00c008a5.8dbd.02  C---    0  scn 0x0000.57303e03
0x02   0x000b.00f.00053100  0x00c008af.1563.09  --U-    1  fsc 0x0000.6193dc0c

这里比较明显,可以确定坏块为index,object_id=0x25=37,通过查询其他库,确定为i_obj2(obj$的index)

使用该文件启动数据库

SQL> alter database rename file 'E:\ORADATA\ORCL\SYSTEM01.DBF' to 'd:\orcl\SYSTEM01.DBF';
数据库已更改。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
数据库已更改。

检查alert日志
发现smon进程由于坏块的存储,出现大量报错,需要处理,不然数据库一段时间后就会crash.

Tue Feb 27 20:31:23 2018
QMNC started with pid=27, OS id=4652
Completed: alter database open
Tue Feb 27 20:31:25 2018
Starting background process CJQ0
Tue Feb 27 20:31:25 2018
CJQ0 started with pid=30, OS id=2172
Tue Feb 27 20:31:25 2018
db_recovery_file_dest_size of 3912 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Hex dump of (file 1, block 91156) in trace file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc
Corrupt block relative dba: 0x00416414 (file 1, block 91156)
Bad header found during multiblock buffer read
Data in bad block:
 type: 229 format: 6 rdba: 0xe1d9e3e7
 last change scn: 0xd682.ffc8c7cb seq: 0x8c flg: 0x8c
 spare1: 0xc0 spare2: 0xf6 spare3: 0x70b3
 consistency value in tail: 0x71f50602
 check value in block header: 0x8195
 computed block checksum: 0x8689
Reading datafile 'D:\ORCL\SYSTEM01.DBF' for corruption at rdba: 0x00416414 (file 1, block 91156)
Reread (file 1, block 91156) found same corrupt data
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_smon_3992.trc  (incident=77085):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77085\orcl_smon_3992_i77085.trc
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_cjq0_2172.trc  (incident=77221):
ORA-01578: ORACLE data block corrupted (file # 1, block # 91156)
ORA-01110: data file 1: 'D:\ORCL\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_77221\orcl_cjq0_2172_i77221.trc

重建i_obj2 index,参考:http://www.xifenfei.com/?p=5566

18c新特性:alter system cancel sql

根据18c官方描述cancel sql功能是在18c中引起,但是实测发现在oracle 12.2中已经有了cancel sql功能,可以实现终止掉某个sql的当前sql正在执行的sql语句,而不是传统的直接kill某个会话.ALTER SYSTEM CANCEL SQL语句有四个参数分别为:
cancel_sql

--会话1
SQL> set lines 150
SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0
PL/SQL Release 12.2.0.1.0 - Production                                                    0
CORE    12.2.0.1.0      Production                                                        0
TNS for Linux: Version 12.2.0.1.0 - Production                                            0
NLSRTL Version 12.2.0.1.0 - Production                                                    0
SQL> select sid, serial# from v$session where sid in
  2  (select  sid from v$mystat where rownum=1);
       SID    SERIAL#
---------- ----------
       278       4019
SQL> create table t_xifenfei tablespace users as select * from dba_source;
Table created.
SQL> insert into t_xifenfei select * from t_xifenfei;
274132 rows created.                    <<===没有提交
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    548264
SQL> insert into t_xifenfei select * from t_xifenfei;
548264 rows created.     <<===没有提交
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
   1096528
SQL> insert into t_xifenfei select * from t_xifenfei;
--会话2
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    274132
SQL> alter system cancel sql '278,4019';
System altered.
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
    274132
--会话1
SQL> insert into t_xifenfei select * from t_xifenfei;
insert into t_xifenfei select * from t_xifenfei
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
SQL> select count(*)from t_xifenfei;
  COUNT(*)
----------
   1096528

这里可以看到会话1的最后一个insert被cancel,但是前面两个没有提交的insert没有被回滚/提交,看到了cancel sql的功能的实现.

Oracle中的主要字母缩写含义

ACD = Active Change Directory
ACFS = ASM Cluster File System
ADDM = Automatic Database Diagnostic Monitor
ADR = Automatic Diagnostic Repository
ADVM = ASM Dynamic Volume Manager
AIO = Asynchronous I/O
AMDU = ASM Metadata Dump Utility
AMM = Automatic Memory Management
ARC = Archive process
ASH = Active Session History
ASM = Automatic Storage Management
ASMB = ASM Background process
ASMCA = ASM Configuration Assistant
ASMCMD = ASM CoMmanD line utility
ASMLIB = ASM LIBrary tool
ASMM = Automatic Shared Memory Management
ASMSNMP = ASM Simple Network Management Protocol
AT = Allocation Table
ATA = Advanced Technology Attachment
AU = Allocation Unit
AWR = Automatic Workload Repository
BH = Block Header
BMC = Baseboard Management Controller
BS = Block Size
CBO = Cost-Based Optimizer
CF = Control File
CFS = Cluster FileSystem
CHM = Cluster Health Monitor
CIO = Concurrent I/O
CKPT = ChecKPoinT process
CLUVFY = CLUster VeriFy utility
COD = Continuing Operation Directory
CPU = Central Processing Unit or Critical Patch Update
CRM = Customer Relationship Management
CRS = Cluster Ready Services
CSS = Cluster Synchronization Services
CSV = Comma-Separated Values
CVM = Cluster Volume Manager
DB = DataBase
DBCA = DataBase Configuration Assistant
DBFS = DataBase File System
DBM = DataBase Machine
DBMS = DataBase Management Systems
DBPERF = DataBase PERFormance
DBV = DataBase Verification tool
DBW = DataBase Writer process
DD = Disk Directory or Data Description tool
DES = Database Excelleration Systems Inc.
DG = DiskGroup
DH = Disk Header
DIO = Direct I/O
DISM = Dynamic Intimate Shared Memory
DM = Device Mapper
DNFS = Direct Network File System
DRAM = Dynamic Random Access Memory
DSS = Decision Support System
DUL = Data UnLoader
DW = Data Warehouse
EIDE = Enhanced Integrated Drive Electronics
ERP = Enterprise Resource Planning
ETA = Estimated Time of Arrival
ETL = Extract Transform Load
FD = File Directory
FG = FailGroup
FRA = Flash Recovery Area or Fast Recovery Area
FS = FileSystem
FST = Free Segments Table
FTS = Full Table Scan
GC = Grid Control
GI = Grid Infrastructure
GUI = Graphical User Interface
HA = High Availability
HARD = Hardware Assisted Resilient Data
HB = Heart Beat
HBA = Host Bus Adapter
IDE = Integrated Drive Electronics
IIS = Internet Information Services
INST = INSTance
IO = Input/Output
IOPS = IO Per Second
IOT = Index Organized Table
IP = Internet Protocol
IPMI = Intelligent Platform Management Interface
JET = Joint Escalation Team
JFS = Journaled FileSystem
KB = KiloByte
KFED = Kernel Files metadata EDitor
KFOD = Kernel Files Osm disk[group] Discovery
LGWR = redo LoG WRiter process
LLT = Veritas Low Latency Transport protocol
LSNR = LiSteNeR
LUN = Logical Unit Number
LVM = Logical Volume Manager
MAA = Maximum Availability Architecture
MB = MegaByte
NAS = Network Attached Storage
NetApp = Network Appliance
NETCA = NETwork Configuration Assistant
NFS = Network FileSystem
NIC = Network Interface Controller
OCFS = Oracle Cluster FileSystem
OCR = Oracle Cluster Registry
ODM = Oracle Disk Manager
ODS = Operational Data Store
OEL = Oracle Enterprise Linux
OEM = Oracle Enterprise Manager
OID = Oracle Internet Directory
OLAP = OnLine Analytical Processing
OLTP = OnLine Transaction Processing
OMS = Oracle Management Service
OPATCH = Oracle PATCHing utility
OS = Operating System
OSCP = Oracle Storage Compatibility Program
OSW = OS Watcher
OSWFW = OS Watcher For Windows
OTN = Oracle Technology Network
OUI = Oracle Universal Installer
PB = PetaByte
PFILE = Parameter FILE
PGA = Program Global Area
PID = Proces ID
PL/SQL = Procedural Language/Structured Query Language
POC = Proof Of Concept
PROCWATCHER = PROCess WATCHER
PST = Partnership Status Table
PSU = Patch Set Update
PX = Parrallel eXecution
RAC = Real Application Cluster
RAID = Redundant Array of Independent Disks
RAM = Random Access Memory
RBAL = ReBALance process
RCA = Root Cause Analysis
RDA = Remote Diagnostic Agent
RDBMS = Relational DataBase Management System
RHEL = RedHat Enterprise Linux
RM = Resource Manager
RMAN = Recovery MANager
RPM = Resource Package Manager
SAN = Storage Area Network
SAS = Serial Attached SCSI
SATA = Serial Advanced Technology Attachment
SCAN = Single Client Access Name
SCSI = Small Computer System Interface
SGA = System Global Area
SLA = Service Level Agreement
SMF = Service Management Facility
SPFILE = Server Parameter FILE
SQL = Structured Query Language
SRDF = EMC Symmetrix Remote Data Facility
SSD = Solid State Disk
SVCTM = average SerViCe TiMe
SVM = Solaris Volume Manager
TB = TeraByte
TCP = Transmission Control Protocol
TDE = Transparent Data Encryption
TKPROF = Transient Kernel PROFile
TNS = Transparent Network Substrate
UDEV = Unix DEVice manager
UDP = User Datagram Protocol
UFG = Umbilicus ForeGround process
UFS = User FileSystem
VBG = Volume BackGround process
VCS = Veritas Cluster Server
VD = Voting Disk
VDBG = Volume Driver BackGround process
VIP = Virtual Internet Portocol
VLDB = Very Large DataBase
VM = Virtual Machine
VMB = Volume Membership BackGround process
VxFS = Veritas File System
VxVM = Veritas Volume Manager
XDB = XML DataBase

DBMS_STATS收集子分区表导致library cache lock等待

有客户反馈一个问题,业务中有一张表无论是查询还是dml操作都非常慢,让我们介入分析
数据库版本
打上了170814比较新的psu

[oracle@xffdb1 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory
Oracle 中间补丁程序安装程序版本 11.2.0.3.15
版权所有 (c) 2018, Oracle Corporation。保留所有权利。
Oracle Home       : /u01/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc
OPatch version    : 11.2.0.3.15
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2018-01-06_16-20-22下午_1.log
--------------------------------------------------------------------------------
Local Machine Information::
Hostname: xffdb1
ARU platform id: 226
ARU platform description:: Linux x86-64
已安装的顶级产品 (1):
Oracle Database 11g                                                  11.2.0.4.0
此 Oracle 主目录中已安装 1 个产品。
中间补丁程序 (2) :
Patch  26609929     : applied on Fri Sep 15 14:28:51 CST 2017
Unique Patch ID:  21482966
Patch description:  "OCW Patch Set Update : 11.2.0.4.170814 (26609929)"
Patch  26609445     : applied on Fri Sep 15 14:28:04 CST 2017
Unique Patch ID:  21482382
Patch description:  "Database Patch Set Update : 11.2.0.4.170814 (26609445)"

awr数据结果
20180203235626
20180203235701


这里显示大量的library cache lock等待
做systemstate分析

                    Resource Holder State
   LOCK: Handle=0x325d938e08    74: 74: is waiting for PIN: Handle=0x325d938e08
    Enq TX-00E0001B-000004D3   488: 488: is waiting for 74:
              Mutex 1e7fbf6c   322: 322: is waiting for 74:
    Enq TX-02850009-00000086   630: 630: is waiting for 74:
    Enq TX-02BC000A-0000009E    ??? Blocker
    Enq TX-0075000E-000010BE    ??? Blocker
    PIN: Handle=0x325d938e08    ??? Blocker
    Enq TX-049A000C-00000002    ??? Blocker
    Enq TX-034C001D-00000038    ??? Blocker
    Enq TX-052C0012-00000002   669: 669: is waiting for 74:
    Enq TX-04620004-00000003    ??? Blocker
    Enq TX-009B0020-00000DA6    ??? Blocker
              Mutex af4db5a8   242: 242: is waiting for 74:
    Enq TX-041E0002-00000005    ??? Blocker
              Mutex 663b253d   727: 727: is waiting for 74:
    Enq TX-04FA0008-00000003   124: 124: is waiting for 69:
    Enq TX-031D0010-0000005B    ??? Blocker
    Enq TX-02BB0004-000000A2    ??? Blocker
    Enq TX-0248001A-000000CD    69: 69: is waiting for Enq TX-02BC000A-0000009E
    Enq TX-03D70008-00000002    ??? Blocker
    Enq TX-02B1001D-00000081    ??? Blocker
    Enq TX-0423001A-00000003    ??? Blocker
    Enq TX-051A0007-00000003   506: 506: is waiting for 74:
              Mutex 2aceb8e9   602: 602: is waiting for 74:
              Mutex c6b2e0f4   196: 196: is waiting for 74:
    Enq TX-00A30020-0000055A    ??? Blocker
    Enq TX-00D70015-00000315    ??? Blocker
    Enq TX-03B30006-00000012    ??? Blocker
    Enq TX-008C0003-00001005    ??? Blocker
    Enq TX-05470014-00000006   219: 219: is waiting for 74:
    Enq TX-054E0018-00000006   673: 673: is waiting for 74:
              Mutex f28c06f8   279: 279: is waiting for 74:
    Enq TX-03D30012-00000002    ??? Blocker
    Enq TX-055C001B-00000005   333: 333: is waiting for Enq TX-01CF000B-00000B2F
              Mutex 2ff03da9   276: 276: is waiting for 74:
    Enq TX-030A000A-000000B6    ??? Blocker
    Enq TX-00530004-000023DF   212: 212: is waiting for 74:
    Enq TX-05550017-00000002   469: 469: is waiting for 74:
              Mutex 724ba5e3   177: 177: is waiting for 74:
    Enq TX-03C10007-00000015    ??? Blocker
    Enq TX-02E30006-00000079    ??? Blocker
                       IPC 6      6 Blocker
    Enq TX-0289000B-0000005F   449: 449: is waiting for 74:
    Enq TX-021E0002-000001DA    ??? Blocker
              Mutex a9fcc7b8   131: 131: is waiting for 74:
    Enq TX-0483001F-00000009    ??? Blocker
    Enq TX-00800010-00000BA0    ??? Blocker
              Mutex c0af249e   666: 666: is waiting for 74:
              Mutex 94389ed4    78: 78: is waiting for 74:
              Mutex 4ac40611   143: 143: is waiting for 74:
              Mutex 24c1c387   168: 168: is waiting for 74:
              Mutex 930636c9   752: 752: is waiting for 74:
    Enq TX-04660001-00000003    ??? Blocker
    Enq TX-01B6000A-00000022    ??? Blocker
    Enq TX-0166000F-000002C7    ??? Blocker
    Enq TX-02FF0008-000000A1    ??? Blocker
    Enq TX-01E10001-00000023    ??? Blocker
    Enq TX-0327000E-0000009E    ??? Blocker
    Enq TX-05120016-00000002   615: 615: is waiting for 74:
    Enq TX-039F001D-00000011    ??? Blocker
    Enq TX-01CF000B-00000B2F    ??? Blocker
    Enq TX-01130020-000001B6    ??? Blocker
    Enq TX-052A0021-00000002   635: 635: is waiting for 74:
              Mutex ebf68fee   579: 579: is waiting for 74:
    Enq TX-02350017-000000D6    ??? Blocker
    Enq TX-00BB0000-00000598    ??? Blocker
    Enq TX-0443000E-00000003    ??? Blocker
              Mutex e7bae014   344: 344: is waiting for 74:
              Mutex fe251793   616: 616: is waiting for 74:
              Mutex 750494ae   180: 180: is waiting for 74:
    Enq TX-01230010-000001C0    ??? Blocker
    Enq TX-0543001C-00000003   119: 119: is waiting for 124:
    Enq TX-04E60014-00000005    ??? Blocker
    Enq TX-00F1000A-000002AB    ??? Blocker
              Mutex aa83fbd7   651: 651: is waiting for 74:
    Enq TX-058A0002-00000003    ??? Blocker
    Enq TX-03B00016-0000000F    ??? Blocker
              Mutex 26e065a4   150: 150: is waiting for 74:
    Enq TX-0219001B-000001BE    ??? Blocker
    Enq TX-00B9001D-00000069    ??? Blocker
    Enq TX-01110020-00000198    ??? Blocker
    Enq TX-04F3001B-00000002    ??? Blocker
    Enq TX-04A60015-00000002    ??? Blocker
    Enq TX-02D30015-0000008A    ??? Blocker
    Enq TX-01540018-000001B7    ??? Blocker
    Enq TX-02240019-0000001C    ??? Blocker
              Mutex e595002d   502: 502: is waiting for 74:
              Mutex 1661a4cb   732: 732: is waiting for 74:
    Enq TX-00320010-00003A12    ??? Blocker
    Enq TX-020F001F-00000027    ??? Blocker
    Enq TX-0387000A-0000004D   515: 515: is waiting for 74:
    Enq TX-01260014-0000012D    ??? Blocker
    Enq TX-02EE001D-0000005F    ??? Blocker
    Enq TX-004A0001-00000882   603: 603: is waiting for Enq TX-02240019-0000001C
              Mutex 4a18a781   258: 258: is waiting for 74:
   LOCK: Handle=0x325aa4c428   643: 643: is waiting for 74:
    Enq TX-037A0002-0000006E   118: 118: is waiting for 74:
    Enq TX-02460020-0000006D   493: 493: is waiting for Enq TX-00F1000A-000002AB
PROCESS 74: J000
  ----------------------------------------
  SO: 0x31a2b2e440, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x31a2b2e440, name=process, file=ksu.h LINE:12721, pg=0
  (process) Oracle pid:74, ser:153, calls cur/top: 0x305d24c378/0x305f3f5eb0
            flags : (0x0) -
            flags2: (0x10),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 80
              last post received-location: kji.h LINE:3691 ID:kjata: wake up enqueue owner
              last process to post me: 0x3182a6fca0 1 6
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:285 ID:ksasnd
              last process posted by me: 0x3182a75038 2 6
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x3162ddbe90
    O/S info: user: oracle, term: UNKNOWN, ospid: 103036
    OSD pid info: Unix process pid: 103036, image: oracle@xffdb1 (J000)
    Short stack dump:
      ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10
      <-skgpwwait()+178<-ksliwat()+2046<-kslwaitctx()+163<-kjusuc()+3400
      <-ksipgetctxi()+1759<-kqlmPin()+2943<-kqlmClusterLock()+237<-kglpnal()+4072
      <-kglpin()+1381<-qostobkglcrt1()+640<-qostobkglcrt()+255<-qospsts()+1639
      <-spefcmpa()+196<-spefmccallstd()+235<-pextproc()+41<-peftrusted()+150
      <-psdexsp()+255<-rpiswu2()+1776<-psdextp()+700<-pefccal()+726<-pefcal()+224
      <-pevm_FCAL()+169<-pfrinstr_FCAL()+75<-pfrrun_no_tool()+63<-pfrrun()+627
      <-plsql_run()+649<-peidxr_run()+263<-peidxexe()+79<-kkxdexe()+338
      <-kkxmpexe()+241<-kgmexwi()+605<-kgmexec()+2193<-evapls()+813
      <-evaopn2()+808<-kkxmexcs()+133<-opiexe()+20880<-kpoal8()+2380
      <-opiodr()+917<-kpoodr()+1401<-upirtrc()+2436<-kpurcsc()+98
      <-kpuexec()+10790<-OCIStmtExecute()+39<-jslvec_execcb1()+672
      <-jslvswu()+56<-jslve_execute0()+2257<-jslve_execute()+332
      <-rpiswu2()+1776<-kkjex1e()+379<-kkjsexe()+711<-kkjrdp()+694
      <-opirip()+958<-opidrv()+603<-sou2o()+103<-opimai_real()+250
      <-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
    SO: 0x31c382f010, type: 4, owner: 0x31a2b2e440, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x31a2b2e440, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 6001 ser: 679 trans: 0x311e36fa28, creator: 0x31a2b2e440
              flags: (0x8010041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID:
              txn branch: (nil)
              edition#: 100              oct: 170, prv: 0, sql: 0x323de178e8, psql: 0x32b5d0fc98, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$USERS
    client details:
      O/S info: user: oracle, term: UNKNOWN, ospid: 103036
      machine: xffdb1 program: oracle@xffdb1 (J000)
      application name: DBMS_SCHEDULER, hash value=2478762354
      action name: ORA$AT_OS_OPT_SY_1594, hash value=1524069073
    Current Wait Stack:
     0: waiting for 'library cache pin'
        handle address=0x325d938e08, pin address=0x325baec3e0, 100*mode+namespace=0x850fe00010003
        wait_id=46488 seq_num=46502 snap_id=1
        wait times: snap=6 min 2 sec, exc=6 min 2 sec, total=6 min 2 sec
        wait times: max=15 min 0 sec, heur=6 min 2 sec
        wait counts: calls=727 os=727
        in_wait=1 iflags=0x15a2
    There are 2136 sessions blocked by this session.

发现主要是74号进程阻塞了其他的
做hanganalyze分析

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (xffdb.xffdbsv1)
                   os id: 82310
              process id: 224, oracle@xffdb1
              session id: 4
        session serial #: 12237
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x21e0002
                      p3: 'sequence'=0x1da
            time in wait: 3 min 48 sec
           timeout after: never
                 wait id: 1260
                blocking: 0 sessions
             current sql: UPDATE t_xifenfei subpartition(p201801_s32)
                            SET INDICATORCODE   = :1,
                                LOWERLIMIT      = :2,
                                UPPERLIMIT      = :3,
                                AVGUNITPRICE    = :4,
                                TRADEITEMNAME   = :5,
             short stack: ………………
            wait history:
              * time between current wait and wait #1: 0.000220 sec
              1.       event: 'gc cr block 2-way'
                 time waited: 0.000341 sec
                     wait id: 1259            p1: ''=0x6
                                              p2: ''=0x3590
                                              p3: ''=0x44b
              * time between wait #1 and #2: 0.000284 sec
              2.       event: 'gc current block 3-way'
                 time waited: 0.000663 sec
                     wait id: 1258            p1: ''=0xa
                                              p2: ''=0x3ad02
                                              p3: ''=0x2000001
              * time between wait #2 and #3: 0.000383 sec
              3.       event: 'gc cr block 2-way'
                 time waited: 0.000288 sec
                     wait id: 1257            p1: ''=0x5
                                              p2: ''=0x2e20
                                              p3: ''=0x957
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 3 (xffdb.xffdbsv3)
                   os id: 39472
              process id: 587, oracle@xffdb3
              session id: 2215
        session serial #: 8213
    }
    which is waiting for 'library cache lock' with wait info:
    {
                      p1: 'handle address'=0x327f5ecea0
                      p2: 'lock address'=0x327a5ab660
                      p3: '100*mode+namespace'=0x850fe00010002
            time in wait: 3 min 43 sec
           timeout after: 11 min 16 sec
                 wait id: 2343
                blocking: 3 sessions
             current sql: SELECT COUNT(*) COUNT
                          FROM t_xifenfei subpartition(p201801_s32)
                         WHERE TRADEITEMID = '679EA6DE428F414D014B3D5EC8DE5E32'
             short stack:  ………………
                 wait history:
              * time between current wait and wait #1: 0.000908 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.001074 sec
                     wait id: 2342            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000033 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000003 sec
                     wait id: 2341            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000069 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.001232 sec
                     wait id: 2340            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (xffdb.xffdbsv1)
                   os id: 103036
              process id: 74, oracle@xffdb1 (J000)
              session id: 6001
        session serial #: 679
    }
    which is waiting for 'library cache pin' with wait info:
    {
                      p1: 'handle address'=0x325d938e08
                      p2: 'pin address'=0x325baec3e0
                      p3: '100*mode+namespace'=0x850fe00010003
            time in wait: 3 min 47 sec
           timeout after: 11 min 12 sec
                 wait id: 46488
                blocking: 2099 sessions
             current sql: call dbms_stats.gather_database_stats_job_proc (  )
             short stack: …………
            wait history:
              * time between current wait and wait #1: 0.000038 sec
              1.       event: 'library cache lock'
                 time waited: 0.012353 sec
                     wait id: 46487           p1: 'handle address'=0x325d938e08
                                              p2: 'lock address'=0x3259aee680
                                              p3: '100*mode+namespace'=0x850fe00010003
              * time between wait #1 and #2: 0.002509 sec
              2.       event: 'enq: IV -  contention'
                 time waited: 0.001079 sec
                     wait id: 46486           p1: 'type|mode'=0x49560005
                                              p2: 'id1'=0x53594e43
                                              p3: 'id2'=0x15
              * time between wait #2 and #3: 0.000085 sec
              3.       event: 'enq: IV -  contention'
                 time waited: 0.001817 sec
                     wait id: 46485           p1: 'type|mode'=0x49560005
                                              p2: 'id1'=0x4c4f434b
                                              p3: 'id2'=0x15
    }
Chain 1 Signature: 'library cache pin'<='library cache lock'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0xa08ff7bf

到这里基本上可以确定是由于自动任务收集统计信息导致系统出现大量的library cache lock和library cache pin,另外可以确定大部分被阻塞在library cache 相关的select和dml语句都集中在t_xifenfei这个子分区表中,通过查询mos,发现相关bug:Bug 19790972 – “library cache lock” waits due to DBMS_STATS gather of stats for a subpartition
20180204002507


osw配置私网监控(oswprvtnet)

OS Watcher Black Box (OSWbb) 是一种 UNIX shell 脚本的集合,主要用于收集和归档操作系统和网络的度量,以便为诊断性能问题提供支持。OSWbb 作为服务器上的一组后台进程来运行,定期收集OS 数据,其调用 Unix 实用程序,如 vmstat、netstat 和 iostat 等。在最近的gi psu中已经把osw包含在tfa中了,但是默认不启用私网的监控,如果需要可以进行人工配置
OSWbb 使用 traceroute 命令获取这些私有网络的状态。每个操作系统对 traceroute 命令使用的参数都略有不同。基本 oswbb 目录中的示例Exampleprivate.net 文件中包含每个操作系统使用的语法。因此,这将导致在 UNIX 平台中的输出显示有所不同。OSWbb 按指定间隔运行 private.net 文件,并在归档目录下的 oswprvtnet 子目录中存储数据。数据存储在每小时归档的文件中。文件中的每个条目都包含一个时间戳,前缀为 ***,嵌入在顶部输出中。
tfa中含osw

[root@xifenfei01 ~]# ps -ef|grep osw
grid      17673      1  0 Jan30 ?        00:00:07 /bin/sh ./OSWatcher.sh 30 48 NONE /u01/app/grid/tfa/repository/suptools/xifenfei01/oswbb/grid/archive
grid      17841  17673  0 Jan30 ?        00:00:01 /bin/sh ./OSWatcherFM.sh 48 /u01/app/grid/tfa/repository/suptools/xifenfei01/oswbb/grid/archive
root      64159  64120  0 00:30 pts/0    00:00:00 grep osw

测试prvtnet.sh

[root@xifenfei01 oswbb]# cd /u01/app/grid/tfa/repository/suptools/xifenfei01/oswbb/grid/oswbb
[root@xifenfei01 oswbb]# more prvtnet.sh
echo "zzz ***"`date`
traceroute -r -F xifenfei01-priv
traceroute -r -F xifenfei02-priv
######################################################################
# DO NOT DELETE THE FOLLOWING LINE!!!!!!!!!!!!!!!!!!!!!
######################################################################
rm locks/lock.file
[root@xifenfei01 oswbb]# ./prvtnet.sh
zzz ***Tue Jan 30 18:02:49 CST 2018
traceroute to xifenfei01-priv (192.168.102.11), 30 hops max, 60 byte packets
 1  xifenfei01-priv (192.168.102.11)  0.010 ms  0.003 ms  0.002 ms
traceroute to xifenfei02-priv (192.168.102.14), 30 hops max, 60 byte packets
 1  xifenfei02-priv (192.168.102.14)  0.081 ms  0.076 ms  0.087 ms
rm: cannot remove `locks/lock.file': No such file or directory

配置private.net正式收集私网信息

[root@xifenfei01 oswbb]# find . -name OSWatcher.sh | xargs grep -i "private"
# Run traceroute for private networks if file private.net exists
if [ -x private.net ]; then
   ./private.net >> $OSWBB_ARCHIVE_DEST/oswprvtnet/${hostn}_prvtnet_$hour 2>&1 &

这里重要的是:
./OSWatcher.sh:# Run traceroute for private networks if file private.net exists
表示,在OSWatcher.sh中调用的文件名称是private.net
将prvtnet.sh复制为供OSWatcher.sh脚本调用的文件名称

[root@xifenfei01 oswbb]# cp prvtnet.sh private.net
[root@xifenfei01 oswbb]# chown grid:oinstall prvtnet.net
[root@xifenfei01 oswbb]# chmod +x prvtnet.net

关注内容

示例 1: 接口处于启动和响应状态:
traceroute to X.X.X.X, (X.X.X.X) 30 hops max, 1492 byte packets
1 X.X.X.X 1.015 ms 0.766 ms 0.755 ms
示例 2: 目标接口不是在直接连接的网络上,所以需要验证地址
         是否正确或其插入的交换机是否位于相同的 VLAN 上(或其他问题):
traceroute to X.X.X.X, (X.X.X.X) 30 hops max, 40 byte packets
traceroute: host X.X.X.X is not on a directly-attached network
示例 3: 网络无法访问:
traceroute to X.X.X.X, (X.X.X.X) 30 hops max, 40 byte packets
Network is unreachable
示例 4: 网络无法访问:
xifenfei01_prvtnet_18.01.30.2000.dat-zzz ***Tue Jan 30 20:27:43 CST 2018
xifenfei01_prvtnet_18.01.30.2000.dat-traceroute to xifenfei01-priv (192.168.102.11), 30 hops max, 60 byte packets
xifenfei01_prvtnet_18.01.30.2000.dat- 1  xifenfei01-priv (192.168.102.11)  0.014 ms  0.004 ms  0.005 ms
xifenfei01_prvtnet_18.01.30.2000.dat-traceroute to xifenfei02-priv (192.168.102.14), 30 hops max, 60 byte packets
xifenfei01_prvtnet_18.01.30.2000.dat- 1  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 2  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 3  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 4  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 5  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 6  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 7  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 8  * * *
xifenfei01_prvtnet_18.01.30.2000.dat- 9  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-10  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-11  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-12  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-13  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-14  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-15  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-16  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-17  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-18  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-19  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-20  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-21  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-22  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-23  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-24  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-25  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-26  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-27  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-28  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-29  * * *
xifenfei01_prvtnet_18.01.30.2000.dat-30  * * *
xifenfei01_prvtnet_18.01.30.2000.dat:zzz ***Warning. Traceroute response is spanning snapshot intervals.

reliable message等待

在一次给客户的rac poc测试中,发现大量reliable message等待,严重影响性能
数据库版本
11.2.0.4+最新psu(180116)

SYS@xffdb1>select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
Elapsed: 00:00:00.00
SYS@xffdb1>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
[oracle@xifenfei01 ~]$ opatch lspatches
26609929;OCW Patch Set Update : 11.2.0.4.170814 (26609929)
26925576;Database Patch Set Update : 11.2.0.4.180116 (26925576)
OPatch succeeded.

监控em发现大量other等待
3


查询发现大量reliable message等

SYS@xffdb1>@event
         INST_ID USERNAME                       EVENT                                                      COUNT(*)
---------------- ------------------------------ -------------------------------------------------- ----------------
               1 SOE                            reliable message                                                 16
               1 SOE                            gc current request                                                4
               1 SOE                            gc cr request                                                     5
               1 SOE                            db file sequential read                                          17
               1 SYS                            db file scattered read                                            1
               2 SOE                            library cache: mutex X                                            2
               2 SOE                            latch: ges resource hash list                                     2
               2 SOE                            gc current request                                                1
               2 SOE                            gc cr request                                                     4
               2 SOE                            db file sequential read                                          12
               2                                db file parallel write                                            1
               2 SOE                            SQL*Net message to client                                         1
12 rows selected.

进一步查询发现大量Result Cache: Channel

SYS@xffdb1>SELECT CHANNEL,
  2    SUM(wait_count) sum_wait_count
  3  FROM GV$CHANNEL_WAITS
  4  GROUP BY CHANNEL
  5  ORDER BY SUM(wait_count) DESC;
CHANNEL                                                            SUM_WAIT_COUNT
---------------------------------------------------------------- ----------------
Result Cache: Channel                                                   379367698
RBR channel                                                                128968
kxfp control signal channel                                                115091
MMON remote action broadcast channel                                         4646
obj broadcast channel                                                         395
LCK0 ksbxic channel                                                            17
parameters to cluster db instances - broadcast channel                          4
service operations - broadcast channel                                          3
kill job broadcast - broadcast channel                                          2

查看result_cache_max_size参数

SYS@xffdb1>show parameter result_cache_max_size;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
result_cache_max_size                big integer 335552K

调整result_cache_max_size参数

SYS@xffdb1>alter system set result_cache_max_size=0;
System altered.
SYS@xffdb1>show parameter result_cache_max_size;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
result_cache_max_size                big integer 0

重启数据库,再次测试,发现相关等待消失,系统也更加稳定

SYS@xffdb1>@event
         INST_ID USERNAME                       EVENT                                                      COUNT(*)
---------------- ------------------------------ -------------------------------------------------- ----------------
               1 SOE                            library cache: mutex X                                            1
               1 SOE                            gc current request                                                3
               1 DBSNMP                         gc cr request                                                     1
               1 SOE                            gc cr request                                                     7
               1 SOE                            db file sequential read                                          18
               1                                db file parallel write                                            1
               2 SOE                            library cache: mutex X                                            2
               2 SOE                            latch free                                                        1
               2 SOE                            gc current request                                                4
               2 SOE                            gc cr request                                                     4
               2 SOE                            db file sequential read                                          12
               2                                db file parallel write                                            3
12 rows selected.

4


参考:Very High Waits for ‘reliable message’ After Upgrade to 11.2.0.4 When Using Result Cache (Doc ID 1951729.1)

重建oraInventory解决ORA-20001

数据库启动报ORA-20001: Latest xml inventory is not loaded into table错误

Completed: ALTER DATABASE OPEN
2018-01-23T23:46:27.924841+08:00
CJQ0 started with pid=54, OS id=6653
2018-01-23T23:46:31.705550+08:00
Unable to obtain current patch information due to error: 20001,
  ORA-20001: Latest xml inventory is not loaded into table
ORA-06512: at "SYS.DBMS_QOPATCH", line 777
ORA-06512: at "SYS.DBMS_QOPATCH", line 864
ORA-06512: at "SYS.DBMS_QOPATCH", line 2222
ORA-06512: at "SYS.DBMS_QOPATCH", line 740
ORA-06512: at "SYS.DBMS_QOPATCH", line 2247
===========================================================
Dumping current patch information
===========================================================
Unable to obtain current patch information due to error: 20001
===========================================================

查询相关sql报错

SYS@xffdb>select * from OPATCH_XML_INV ;
ERROR:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04004: error while reading file /u03/app/oracle/product/12.2.0.1/dbhome/QOpatch/qopiprep.bat
no rows selected
Elapsed: 00:00:00.58
SYS@xffdb>select xmltransform(dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual ;
ERROR:
ORA-20001: Latest xml inventory is not loaded into table
ORA-06512: at "SYS.DBMS_QOPATCH", line 777
ORA-06512: at "SYS.DBMS_QOPATCH", line 864
ORA-06512: at "SYS.DBMS_QOPATCH", line 2222
ORA-06512: at "SYS.DBMS_QOPATCH", line 740
ORA-06512: at "SYS.DBMS_QOPATCH", line 2247
no rows selected
Elapsed: 00:00:00.63

datapatch -prereq报错

[oracle@xifenfei ~]$ $ORACLE_HOME/OPatch/datapatch -prereq
SQL Patching tool version 12.2.0.1.0 Production on Tue Jan 23 18:11:32 2018
Copyright (c) 2012, 2017, Oracle.  All rights reserved.
Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Queryable inventory could not determine the current opatch status.
Execute 'select dbms_sqlpatch.verify_queryable_inventory from dual'
and/or check the invocation log
/u03/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_4909_2018_01_23_18_11_32/sqlpatch_invocation.log
for the complete error.
Prereq check failed, exiting without installing any patches.
Please refer to MOS Note 1609718.1 and/or the invocation log
/u03/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_4909_2018_01_23_18_11_32/sqlpatch_invocation.log
for information on how to resolve the above errors.
SQL Patching tool complete on Tue Jan 23 18:11:45 2018

分析qopiprep.bat文件

cd $ORACLE_HOME
PATH=/bin:/usr/bin
export PATH
# sed tried to convert from one encoding to other in presence of LC_ALL
# or LANG settings. Since opatch returning UTF-8 based encoding we do not
# need such a conversion. So safely skip it
LANG=en_US.UTF-8
export LANG
LC_ALL=''
export LC_ALL
# Option: "-retry 0" avoids retries in case of locked inventory.
# Option: "-invPtrLoc" is required for non-central-inventory
# locations. $OPATCH_PREP_LSINV_OPTS which may set by users
# in the environment to configure special OPatch options
# ("-jdk" is another good candidate that may require configuration!).
# Option: "-all" gives information on all Oracle Homes
# installed in the central inventory.  With that information, the
# patches of non-RDBMS homes could be fetched.
DBSID=$ORACLE_SID
ORABASE=`$ORACLE_HOME/bin/orabasehome`
rm -rf $ORABASE/rdbms/log/xml_file_$DBSID.xml
$ORACLE_HOME/OPatch/opatch lsinventory -xml  $ORABASE/rdbms/log/xml_file_$DBSID.xml -retry 0 -invPtrLoc $ORACLE_HOME/oraInst.loc >> $ORABASE/rdbms/log/stout_$DBSID.txt
cat $ORABASE/rdbms/log/xml_file_$DBSID.xml | sed 's/^ *//' | tr '\n' ' '
echo "UIJSVTBOEIZBEFFQBL"
rm $ORABASE/rdbms/log/xml_file_$DBSID.xml
rm $ORABASE/rdbms/log/stout_$DBSID.txt

这里主要是$ORACLE_HOME/OPatch/opatch lsinventory可能异常,测试该功能
qopatch_log日志

[oracle@xifenfei ~]$ tail -f /u03/app/oracle/product/12.2.0.1/dbhome/QOpatch/qopatch_log.log
 LOG file opened at 01/23/18 18:48:55
KUP-05007:   Warning: Intra source concurrency disabled because the preprocessor option is being used.
Field Definitions for table OPATCH_XML_INV
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Reject rows with all null fields
  Fields in Data Source:
    XML_INVENTORY                   CHAR (100000000)
      Terminated by "UIJSVTBOEIZBEFFQBL"
      Trim whitespace same as SQL Loader
KUP-04004: error while reading file /u03/app/oracle/product/12.2.0.1/dbhome/QOpatch/qopiprep.bat
KUP-04017: OS message: Error 0
KUP-04017: OS message: LsInventorySession failed: RawInventory gets null OracleHomeInfo
cat: /u03/app/oracle/product/12.2.0.1/dbhome/rdbms/log/xml_file_xffdb.xml: No such file or direc
KUP-04118: operation "pipe read", location "skudmir:2"

opatch lsinventory验证

[oracle@xifenfei ~]$ /u03/app/oracle/product/12.2.0.1/dbhome/OPatch/opatch lsinventory
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.
Oracle Home       : /u03/app/oracle/product/12.2.0.1/dbhome
Central Inventory : /u01/app/oraInventory
   from           : /u03/app/oracle/product/12.2.0.1/dbhome/oraInst.loc
OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4
Log file location : /u03/app/oracle/product/12.2.0.1/dbhome/cfgtoollogs/opatch/opatch2018-01-23_23-50-29PM_1.log
List of Homes on this system:
  Home name= OraDB12Home1, Location= "/u01/app/oracle/product/12.2.0/dbhome_1"
LsInventorySession failed: RawInventory gets null OracleHomeInfo
OPatch failed with error code 73

现在到这一步,可以确定判断opatch lsinventory运行异常,导致DBMS_QOPATCH无法正常工作,而引起opatch异常的原因是由于RawInventory gets null OracleHomeInfo
分析inventory.xml 文件

[oracle@xifenfei ContentsXML]$ cat inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2016, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.2.0.1.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraDB12Home1" LOC="/u01/app/oracle/product/12.2.0/dbhome_1" TYPE="O" IDX="1"/>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>

因为该机器上安装过三个版本的oracle,12.2 beta,11.2.0.4,12.2.0.1,现在oracle home只有第一个beta的,因此这个部分肯定异常,导致后面的12.2正式版无法获取到oraclehome
重建oraInventory

[oracle@xifenfei app]$ cd $ORACLE_HOME/oui/bin
[oracle@xifenfei bin]$ ./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/u01/app/oracle/product/12.2.0/dbhome_1" ORACLE_HOME_NAME="OraDB12betaHome1"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3935 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'AttachHome' was successful.
[oracle@xifenfei bin]$ ./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/u02/app/oracle/product/11.2.0.4/dbhome" ORACLE_HOME_NAME="OraDb11g_home1"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3935 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'AttachHome' was successful.
[oracle@xifenfei bin]$ ./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/u03/app/oracle/product/12.2.0.1/dbhome" ORACLE_HOME_NAME="OraDb122g_home1"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3935 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'AttachHome' was successful.
--验证inventory.xml 文件
[oracle@xifenfei ContentsXML]$ cat inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2018, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.2.0.1.4</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraDB12betaHome1" LOC="/u01/app/oracle/product/12.2.0/dbhome_1" TYPE="O" IDX="1"/>
<HOME NAME="OraDb11g_home1" LOC="/u02/app/oracle/product/11.2.0.4/dbhome" TYPE="O" IDX="2"/>
<HOME NAME="OraDb122g_home1" LOC="/u03/app/oracle/product/12.2.0.1/dbhome" TYPE="O" IDX="3"/>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>

验证opatch lsinventory

[oracle@xifenfei bin]$ opatch lsinv
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.
Oracle Home       : /u03/app/oracle/product/12.2.0.1/dbhome
Central Inventory : /u01/app/oraInventory
   from           : /u03/app/oracle/product/12.2.0.1/dbhome/oraInst.loc
OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4
Log file location : /u03/app/oracle/product/12.2.0.1/dbhome/cfgtoollogs/opatch/opatch2018-01-24_00-19-55AM_1.log
Lsinventory Output file location : /u03/app/oracle/product/12.2.0.1/dbhome/cfgtoollogs/opatch/lsinv/lsinventory2018-01-24_00-19-55AM.txt
--------------------------------------------------------------------------------
Local Machine Information::
Hostname: xifenfei
ARU platform id: 226
ARU platform description:: Linux x86-64
Installed Top-level Products (1):
Oracle Database 12c                                                  12.2.0.1.0
There are 1 products installed in this Oracle Home.
There are no Interim patches installed in this Oracle Home.
--------------------------------------------------------------------------------
OPatch succeeded.

验证dbms_qopatch工作正常

[oracle@xifenfei ContentsXML]$ $ORACLE_HOME/OPatch/datapatch -prereq
SQL Patching tool version 12.2.0.1.0 Production on Wed Jan 24 00:21:48 2018
Copyright (c) 2012, 2017, Oracle.  All rights reserved.
Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Determining current state...done
Adding patches to installation queue and performing prereq checks...done
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED
    Nothing to roll back
    Nothing to apply
SQL Patching tool complete on Wed Jan 24 00:21:55 2018
SYS@xffdb>select dbms_sqlpatch.verify_queryable_inventory from dual;
VERIFY_QUERYABLE_INVENTORY
--------------------------------------------------------------------------------------------------
OK
Elapsed: 00:00:01.03
SYS@xffdb>select xmltransform(dbms_qopatch.get_opatch_lsinventory(), dbms_qopatch.GET_OPATCH_XSLT()) from dual ;
XMLTRANSFORM(DBMS_QOPATCH.GET_OPATCH_LSINVENTORY(),DBMS_QOPATCH.GET_OPATCH_XSLT())
----------------------------------------------------------------------------------------------------------
Oracle Querayable Patch Interface 1.0
-----------------------------------------
Elapsed: 00:00:01.09
SYS@xffdb>

通过修复错误的oraInventory解决ORA-20001问题

18c新特性:Scalable Sequences(自适应序列)

作为18c的新特性,其实在oracle 12.2 引入了Scalable Sequences作为一个隐藏特性,为了改善在高并发系统中,特别是使用seq作为index,大量插入记录导致index的争用,oracle自己实现了以前我们需要人工去自定义seq的方式(instance+sid+seq的类似算法方式),该功能将在oracle 18c中正式推出,我这里带领大家先体会下
SCALE/NOSCALE
When SCALE is specified, a numeric offset is affixed to the beginning of the sequence. This offset if of the form iii||sss||, where,
iii denotes a three digit instance offset given by (instance_id % 100) + 100,
sss denotes a three digits session offset given by (session_id % 1000), and
|| is the concatenation operator

EXTEND/NOEXTEND
When EXTEND is specified with the SCALE keyword, the generated sequence values are all of length (x+y), where x is the length of the scalable offset (default 6), and y is the maximum number of digits in the sequence maxvalue/minvalue. Thus, for an ascending sequence with maxvalue 100 and SCALABLE EXTEND specified, the generated sequence values are of the form iii||sss||001, iii||sss||002, …,iii||sss||100
The default setting for the SCALE clause is NOEXTEND. With the NOEXTEND setting, the generated sequence values are at most as wide as the maximum number of digits in the sequence maxvalue/minvalue. This setting is useful for integration with existing applications where sequences are used to populate fixed width columns. On invocation of NEXTVAL on a sequence with SCALABLE NOEXTEND specified, a user error is thrown if the generated value requires more digits of representation than the sequence’s maxvalue/minvalue.

SCALE的算法就是(instance_id % 100)(_kqdsn_instance_digits) + 100||(session_id % 1000)(_kqdsn_cpu_digits)+seq(EXTEND/NOEXTEND确定是否固定宽度)

Scalable Sequences语法

CREATE | ALTER SEQUENCE [ schema. ]sequence
[ { INCREMENT BY | START WITH } integer
| { MAXVALUE integer | NOMAXVALUE }
| { MINVALUE integer | NOMINVALUE }
| { CYCLE | NOCYCLE }
| { CACHE integer | NOCACHE }
| { ORDER | NOORDER }
| { SCALE {EXTEND | NOEXTEND} | NOSCALE}
]

Scalable Sequences测试

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Tue Jan 23 02:33:49 2018
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0
PL/SQL Release 12.2.0.1.0 - Production                                                    0
CORE    12.2.0.1.0      Production                                                        0
TNS for Linux: Version 12.2.0.1.0 - Production                                            0
NLSRTL Version 12.2.0.1.0 - Production                                                    0
SQL> set line 150
SQL> col "Paramete"  for a30
SQL> col "Session Value"  for a20
SQL> col "Instance Value" for a20
SQL> col "Is Default?" for a20
SQL> SELECT a.ksppinm "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value",c.ksppstdf "Default?"
  2  FROM x$ksppi a, x$ksppcv b, x$ksppsv c WHERE a.indx = b.indx  AND a.indx = c.indx  AND lower(a.ksppinm)
  3  in ('_kqdsn_instance_digits', '_kqdsn_cpu_digits');
Parameter                                                Session Value        Instance Value       Default?
-------------------------------------------------------- -------------------- -------------------- ---------
_kqdsn_instance_digits                                   2                    2                    TRUE
_kqdsn_cpu_digits                                        3                    3                    TRUE
SQL> select instance_number from v$instance;
INSTANCE_NUMBER
---------------
              1
SQL> select sys_context('userenv','sid') from dual;
SYS_CONTEXT('USERENV','SID')
----------------------------------------------------------------------------
275
SQL> create sequence seq_xff start with 1 increment by 1 minvalue 1 maxvalue 100 scale extend;
Sequence created.
SQL> select seq_xff.nextval from dual;
   NEXTVAL
----------
 101275001

验证效果

--另外一个会话
[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Tue Jan 23 02:45:14 2018
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> select sys_context('userenv','sid') from dual;
SYS_CONTEXT('USERENV','SID')
---------------------------------------------------------
4
SQL> select instance_number from v$instance;
INSTANCE_NUMBER
---------------
              1
SQL> select seq_xff.nextval from dual;
   NEXTVAL
----------
 101004002

这里可以看出来seq的值是固定长度的.而且随着sid或者inst_id 不同而不同,从而实现减少大量数据集中在一个block而引起的各种争用
测试scale noextend

SQL> create sequence seq_xifenfei start with 1 increment by 1 minvalue 1 maxvalue 100 scale noextend;
Sequence created.
SQL> select seq_xifenfei.nextval from dual;
select seq_xifenfei.nextval from dual
       *
ERROR at line 1:
ORA-64603: NEXTVAL cannot be instantiated for seq_xifenfei. Widen the sequence
by 4 digits or alter sequence with SCALE EXTEND.
SQL> drop sequence seq_xifenfei;
Sequence dropped.
SQL> create sequence seq_xifenfei start with 1 increment by 1 minvalue 1 maxvalue 1000000 scale noextend;
Sequence created.
SQL>  select seq_xifenfei.nextval from dual;
   NEXTVAL
----------
   1010041

这里可以看出来scale noextend的长度是随着seq值的改变而改变,而且max值不能小于seq本身长度.