Oracle安全警示录:加错裸设备导致redo异常

最近一个朋友数据库异常了,咨询我,通过分析日志发现对方人员根本不懂aix中的裸设备和Oracle数据库然后就直接使用OEM创建新表空间,导致了数据库crash而且不能正常启动

Thread 1 advanced to log sequence 4395
  Current log# 1 seq# 4395 mem# 0: /dev/rorcl_redo01
Thu Jun 12 19:28:38 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo04' SIZE 2000M EXTENT MANAGEMENT
LOCAL SEGMENT SPACE MANAGEMENT  AUTO
ORA-1119 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo04'
SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO ...
Thu Jun 12 19:36:23 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo03' SIZE 2000M EXTENT MANAGEMENT
LOCAL SEGMENT SPACE MANAGEMENT  AUTO
Thu Jun 12 19:43:56 2014
ORA-604 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/orcl_redo03'
SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO ...
Thu Jun 12 19:48:11 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo03' SIZE 2000M EXTENT
MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO
Thu Jun 12 19:48:11 2014
ORA-1537 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo03'
 SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO ...
Thu Jun 12 19:48:20 2014
/* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo04' SIZE 2000M EXTENT
MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO
ORA-1537 signalled during: /* OracleOEM */ CREATE SMALLFILE TABLESPACE "XIFENFEI" LOGGING DATAFILE '/dev/rorcl_redo04'
SIZE 2000M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT  AUTO ...
Fri Jun 13 00:50:37 2014
Trace dumping is performing id=[cdmp_20140613005032]
Fri Jun 13 00:50:40 2014
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
 0
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
…………
Fri Jun 13 00:50:40 2014
Beginning instance recovery of 1 threads
Reconfiguration complete
Fri Jun 13 00:50:41 2014
 parallel recovery started with 7 processes
Fri Jun 13 00:50:43 2014
Started redo scan
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_smon_213438.trc:
ORA-00316: log 3 of thread 2, type 0 in header is not log file
ORA-00312: online log 3 thread 2: '/dev/rorcl_redo03'
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_smon_213438.trc:
ORA-00316: log 3 of thread 2, type 0 in header is not log file
ORA-00312: online log 3 thread 2: '/dev/rorcl_redo03'
SMON: terminating instance due to error 316
Fri Jun 13 00:50:43 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_lgwr_335980.trc:
ORA-00316: log  of thread , type  in header is not log file
Instance terminated by SMON, pid = 213438

从这里可以看出来,在使用OEM创建表空间的过程中犯了两个错误
1. 未分清楚aix的块设备和字符设备的命名方式
2. 对于2节点正在使用的current redo作为不适用设备当作未使用设备来创建新表空间
由于创建表空间的使用了错误的文件和错误的设备,导致2节点的当前redo(/dev/rorcl_redo03)被损坏(因为先读redo header,所以数据库中优先反馈出来的是ORA-00316: log of thread , type in header is not log file).从而导致数据库2节点先crash,然后节点1进行实例恢复,但是由于2节点的current redo已经损坏,导致实例恢复无法完成,从而两个节点都crash.因为是rac的一个节点的当前redo损坏,数据库无法正常.
如果有备份该数据库可以使用备份还原进行恢复,如果没有备份只能使用强制拉库的方法抢救数据.希望不要发生一个大的数据丢失悲剧
介绍这个案例希望给大家以警示:对数据库的裸设备操作请谨慎,不清楚切不可乱操作,否则后果严重

记录一次ORA-00600[kdxlin:psno out of range]/ORA-00600[3020]/ORA-00600[4000]/ORA-00600[4193]的数据库恢复

尝试recover database,遭遇ORA-00600[kdxlin:psno out of range]/ORA-00600[3020]/ORA-00354错误

Media Recovery Log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5645 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\REDO01.LOG
Mon Jun 09 15:36:10 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p001_9604.trc:
ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], []
Mon Jun 09 15:36:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p002_9592.trc:
ORA-00600: internal error code, arguments: [3020], [3], [23337], [12606249], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 23337)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 3: 'D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\SYSAUX01.DBF'
ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'
Mon Jun 09 15:36:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p001_9604.trc:
ORA-10562: Error occurred while applying redo to data block (file# 3, block# 20142)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 3: 'D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 47841
ORA-00600: internal error code, arguments: [kdxlin:psno out of range], [], [], [], [], [], [], []
Mon Jun 09 15:36:13 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p002_9592.trc:
ORA-00600: internal error code, arguments: [3020], [3], [23337], [12606249], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 23337)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 3: 'D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\SYSAUX01.DBF'
ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'
Errors with log
Mon Jun 09 15:36:14 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p000_9600.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2357 change 25400286 time 06/06/2014 04:00:41
ORA-00334: archived log: 'D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\REDO02.LOG'
Mon Jun 09 15:36:14 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p000_9600.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [1490], [6401], [], [], [], []
Mon Jun 09 15:36:16 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_p000_9600.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 1490)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: 'D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 203
ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [1490], [6401], [], [], [], []
Media Recovery failed with error 12801
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

因为数据库允许少量丢失数据,且redo文件发生损坏,直接使用隐含参数屏蔽redo前滚,尝试强制拉库,报ORA-00704,ORA-00600[4000]错误

Mon Jun 09 15:57:51 2014
SMON: enabling cache recovery
Mon Jun 09 15:57:51 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\udump\gtgs_ora_8664.trc:
ORA-00600: 内部错误代码, 参数: [4000], [1], [], [], [], [], [], []
Mon Jun 09 15:57:52 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\udump\gtgs_ora_8664.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [4000], [1], [], [], [], [], [], []
Mon Jun 09 15:57:52 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Mon Jun 09 15:57:52 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_pmon_9760.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:52 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_reco_5244.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:52 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_smon_7096.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:53 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_ckpt_7924.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:53 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_lgwr_708.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:53 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_dbw0_7400.trc:
ORA-00704: bootstrap process failure
Mon Jun 09 15:57:53 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_mman_9836.trc:
ORA-00704: bootstrap process failure
Instance terminated by USER, pid = 8664
ORA-1092 signalled during: alter database open resetlogs...

对数据库启动过程做10046,然后使用bbed修改scn绕过该错误,然后继续尝试打开数据库,报ORA-00604/ORA-00607/ORA-00600[4193]错误

Mon Jun 09 16:01:09 2014
SMON: enabling cache recovery
Mon Jun 09 16:01:10 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\udump\gtgs_ora_7548.trc:
ORA-00600: 内部错误代码, 参数: [4193], [57], [51], [], [], [], [], []
Mon Jun 09 16:01:10 2014
Doing block recovery for file 1 block 397
Block recovery range from rba 2.3.0 to scn 0.1073741830
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\REDO02.LOG
Block recovery stopped at EOT rba 2.5.16
Block recovery completed at rba 2.5.16, scn 0.1073741830
Doing block recovery for file 1 block 9
Block recovery range from rba 2.3.0 to scn 0.1073741829
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\PRODUCT\10.1.0\ORADATA\GTGS\REDO02.LOG
Block recovery completed at rba 2.5.16, scn 0.1073741830
Mon Jun 09 16:01:11 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\udump\gtgs_ora_7548.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码, 参数: [4193], [57], [51], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Mon Jun 09 16:01:11 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_reco_9176.trc:
ORA-00604: error occurred at recursive SQL level
Mon Jun 09 16:01:11 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_smon_7932.trc:
ORA-00604: error occurred at recursive SQL level
Mon Jun 09 16:01:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_ckpt_7428.trc:
ORA-00604: error occurred at recursive SQL level
Mon Jun 09 16:01:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_lgwr_6936.trc:
ORA-00604: error occurred at recursive SQL level
Mon Jun 09 16:01:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_dbw0_404.trc:
ORA-00604: error occurred at recursive SQL level
Mon Jun 09 16:01:12 2014
Errors in file d:\oracle\product\10.1.0\admin\gtgs\bdump\gtgs_mman_7968.trc:
ORA-00604: error occurred at recursive SQL level
Instance terminated by USER, pid = 7548
ORA-1092 signalled during: ALTER DATABASE OPEN...

该错误的原因是因为数据库在启动的过程中,会事先利用上次数据库运行过程中system undo segment header指向的block,而该block异常,所以出现该错误,使用bbed/dul之类的工具清除掉undo seg header 指向block指针,然后数据库启动会重新分配一个block,从而实现数据库正常启动.

记录一次ORA-600 kccpb_sanity_check_2和ORA-600 kcbgtcr_13 错误恢复

晚上朋友告诉我数据库不能open,请求技术支持,检查alert日志发现ORA-00600[kccpb_sanity_check_2]错误导致数据库无法正常mount

Fri Jun  6 23:36:08 2014
alter database mount
Fri Jun  6 23:36:08 2014
This instance was first to mount
Fri Jun  6 23:36:12 2014
Errors in file /on3000/oracle/admin/on3000/udump/on30001_ora_295198.trc:
ORA-00600:内部错误代码, 参数:[kccpb_sanity_check_2], [18045], [17928], [0x000000000], [], [], [], []
ORA-600 signalled during: alter database mount...

依次替换三个控制文件依然无法解决该问题。查询MOS得到解释为[435436.1]

ORA-600 [kccpb_sanity_check_2] indicates that the seq# of the last read block is
higher than the seq# of the control file header block. This is indication of
the lost write of the header block during commit of the previous cf transaction.

出现该故障的原因是因为写丢失导致,而解决该故障的方法有

1) restore a backup of a controlfile and recover
OR
2) recreate the controlfile
OR
3) restore the database from last good backup and recover

该数据库为无备份非归档数据库,因此只能重建控制文件来解决ORA-00600[kccpb_sanity_check_2]故障,通过重建控制文件数据库mount成功.但是在open的过程中又出现需要一个不存在的归档日志(数据库一个节点5月5日异常,另外一个节点5月23日异常,到6月6日我接手中间进行了N多操作,未细细分析原因).
Oracle Database Recovery Check Result检查结果
数据库SCN
1
控制文件中关于数据文件SCN
2
数据文件头SCN
3
REDO SCN
4
这里明显表示thread#缺少归档,导致恢复过程出现如下提示
5


最后没有办法只能使用_allow_resetlogs_corruption参数跳过redo,然后去open数据库,很不幸出现更加悲催的ORA-00704和ORA-00600[kcbgtcr_13]错误,导致数据库open失败

Sat Jun  7 00:28:58 2014
SMON: enabling cache recovery
Sat Jun  7 00:28:58 2014
Errors in file /on3000/oracle/admin/on3000/udump/on30001_ora_344084.trc:
ORA-00600: 内部错误代码, 参数: [kcbgtcr_13], [], [], [], [], [], [], []
Sat Jun  7 00:28:59 2014
Errors in file /on3000/oracle/admin/on3000/udump/on30001_ora_344084.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [kcbgtcr_13], [], [], [], [], [], [], []
Sat Jun  7 00:28:59 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 344084
ORA-1092 signalled during: alter database open...

对启动过程做10046发现

*** 2014-06-07 00:28:58.528
ksedmp: internal or fatal error
ORA-00600: 内部错误代码, 参数: [kcbgtcr_13], [], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000004 ? 10538629C ?
ksedmp+0290          bl       ksedst               104A2CDB0 ?
ksfdmp+0018          bl       03F2735C
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              000000000 ? 10564B4CC ?
                                                   000000004 ? 70000006D3FD6F0 ?
                                                   000000000 ?
kcbassertbd+0074     bl       kgeasnmierr          110195490 ? 110486310 ?
                                                   10564BD54 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 11043DC90 ?
kcbgtcr+2a68         bl       kcbassertbd          FFFFFFFFFFE7D00 ?
                                                   FFFFFFFFFE7D28 ?
kturbk1+0258         bl       kcbgtcr              000000000 ? 104D23384 ?
                                                   104D2330C ? 000000000 ?
ktrgcm+1294          bl       kturbk1              F0000000F ? FFFFFFFFFFE84B8 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFE84D0 ? 000000000 ?
ktrgtc+0660          bl       ktrgcm               1104B7600 ?
kdsgrp+0094          bl       ktrgtc               0000006C8 ? 000000000 ?
                                                   FFFFFFFFFFE8D80 ?
                                                   28242043335E5162 ?
                                                   103A06D48 ? 70000006F6C87B8 ?
                                                   1051C3528 ?
kdsfbrcb+0298        bl       kdsgrp               1044726E4 ?
                                                   433000000000003B ?
                                                   FFFFFFFFFFE8E30 ?
qertbFetchByRowID+0  bl       03F27E18
69c
qerstFetch+00ec      bl       01F9482C
opifch2+141c         bl       03F25E6C
opifch+003c          bl       opifch2              FFFFFFFFFFEC3D8 ? 000000000 ?
                                                   FFFFFFFFFFEA360 ?
opiodr+0ae0          bl       _ptrgl
rpidrus+01bc         bl       opiodr               5FFFEC840 ? 200000000 ?
                                                   FFFFFFFFFFEC850 ? 500000000 ?
skgmstack+00c8       bl       _ptrgl
rpidru+0088          bl       skgmstack            11049A820 ? 110195490 ?
                                                   000000002 ? 000000000 ?
                                                   FFFFFFFFFFEC3E8 ?
rpiswu2+034c         bl       _ptrgl
rpidrv+095c          bl       rpiswu2              70000006F6C7250 ? 1104851E8 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   1101B8C48 ? 000000000 ?
rpifch+0050          bl       rpidrv               5FFFEC840 ? 500000000 ?
                                                   FFFFFFFFFFEC850 ? 06CD75F70 ?
kqdpts+0134          bl       rpifch               11022A3E0 ?
kqrlfc+0258          bl       kqdpts               000000000 ?
kqlbplc+00b4         bl       03F28AF0
kqlblfc+0204         bl       kqlbplc              10011A9A8 ?
adbdrv+1978          bl       01F944B4
opiexe+2c08          bl       adbdrv
opiosq0+19f0         bl       opiexe               FFFFFFFFFFF9550 ? 100000000 ?
                                                   FFFFFFFFFFF8F20 ?
kpooprx+0168         bl       opiosq0              3693644A8 ? 700000010003520 ?
                                                   700000069364428 ?
                                                   A4000110195490 ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB774 ?
                                                   FFFFFFFFFFFB500 ?
                                                   1B0000001B ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 11039FA18 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F971E8
opiino+0990          bl       opitsk               1E00000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F96034
sou2o+0090           bl       opidrv               3C02D9A29C ? 4A006E298 ?
                                                   FFFFFFFFFFFF6B0 ?
opimai_real+01bc     bl       01F939B4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0070         bl       main                 000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------
--undo cr构造记录
Dump of buffer cache at level 1 for tsn=4, rdba=16777293
BH (700000035fb77b8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000357b4000
  set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035f969c8,70000006d3fd868] lru: [700000035fb7728,70000006d6acff0]
  ckptq: [NULL] fileq: [NULL] objq: [700000035fb7798,7000000693932c0]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5fb],[sfl: 0x0]
  flags:
BH (700000035f969c8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000353d6000
  set: 9 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035ff9398,700000035fb77b8] lru: [700000035f96938,70000006d6aca98]
  ckptq: [NULL] fileq: [NULL] objq: [700000035f969a8,700000069390250]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5fa],[sfl: 0x0]
  flags:
BH (700000035ff9398) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 700000035f70000
  set: 12 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035fd86b8,700000035f969c8] lru: [700000035ff9528,70000006d6adaa0]
  ckptq: [NULL] fileq: [NULL] objq: [7000000693973c8,7000000693973c8]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5f9],[sfl: 0x0]
  flags:
BH (700000035fd86b8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 700000035b94000
  set: 11 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035fb76a8,700000035ff9398] lru: [700000035fd8848,70000006d6ad548]
  ckptq: [NULL] fileq: [NULL] objq: [700000069396398,700000069396398]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5f8],[sfl: 0x0]
  flags:
BH (700000035fb76a8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000357b2000
  set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035f968b8,700000035fd86b8] lru: [700000035fb7948,700000035fb7838]
  ckptq: [NULL] fileq: [NULL] objq: [7000000693932c0,700000035fb78a8]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5f7],[sfl: 0x0]
  flags:
BH (700000035f968b8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000353d4000
  set: 9 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [70000006d3fd868,700000035fb76a8] lru: [700000035f96b58,700000035f96a48]
  ckptq: [NULL] fileq: [NULL] objq: [700000069390250,700000035f96ab8]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5f6],[sfl: 0x0]
  flags:
WAIT #5: nam='db file sequential read' ela= 135 file#=4 block#=77 blocks=1 obj#=-1 tim=1107709395109
on-disk scn: 0x0.19af5d47
BH (700000035fb77b8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000357b4000
  set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035f969c8,70000006d3fd868] lru: [700000035fb7728,70000006d6acff0]
  ckptq: [NULL] fileq: [NULL] objq: [700000035fb7798,7000000693932c0]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5fb],[sfl: 0x0]
  flags:
Dump of buffer cache at level 10 for tsn=4, rdba=16777293
BH (700000035fb77b8) file#: 4 rdba: 0x0100004d (4/77) class: 46 ba: 7000000357b4000
  set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 4 afn: 4
  hash: [700000035f969c8,70000006d3fd868] lru: [700000035fb7728,70000006d6acff0]
  ckptq: [NULL] fileq: [NULL] objq: [700000035fb7798,7000000693932c0]
  st: CR md: NULL tch: 0
  cr: [scn: 0x0.1],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.19afb5fb],[sfl: 0x0]
  flags:
  buffer tsn: 4 rdba: 0x0100004d (4/77)
  scn: 0x0000.19af5d47 seq: 0x01 flg: 0x04 tail: 0x5d470201
  frmt: 0x02 chkval: 0x6d2e type: 0x02=KTU UNDO BLOCK
--obj$ block dump记录
Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0x00.19afb597  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000f.012.0002c79c  0x0100004d.6113.1c  --U-    1  fsc 0x0000.19afb598

这里可以知道,数据库在读取obj$的时候使用到了undo cr块的构造,由于某种原因导致构造cr块失败,从而出现ORA-00600[kcbgtcr_13]错误,而因为obj$又在bootstarp$里面,因此又出现ORA-704.所以解决该问题的方法就是让数据库在查询obj$表的时候不再进行cr块构造,比如使用bbed提交事务等方法解决.这里使用bbed提交事务(bbed模拟提交事务一之修改itl),数据库启动成功

SQL> startup
ORACLE 例程已经启动。
Total System Global Area 1610612736 bytes
Fixed Size                  2084400 bytes
Variable Size             973078992 bytes
Database Buffers          620756992 bytes
Redo Buffers               14692352 bytes
数据库装载完毕。
数据库已经打开。

ORA-27086: skgfglk: unable to lock file – already in use

使用nas存储存放控制文件,数据文件的数据库服务器,因为突然断电后,数据库系统无法正常启动,报ORA-27086: skgfglk: unable to lock file – already in use错误,通过分析是因为netapp的nfs锁导致该故障.本文为同事的处理过程记录
数据库启动报错

[oraprod@erpdb dbs]$ sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.6.0 - Production on Thu May 29 22:03:00 2014
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  581506668 bytes
Fixed Size                   452204 bytes
Variable Size             402653184 bytes
Database Buffers          167772160 bytes
Redo Buffers               10629120 bytes
ORA-00205: error in identifying controlfile, check alert log for more info

alert日志

Thu May 29 23:37:56 2014
ORA-00202: controlfile: '/erpdata/PROD/proddata/cntrl01.dbf'
ORA-27086: skgfglk: unable to lock file - already in use
Linux Error: 11: Resource temporarily unavailable
Additional information: 8
Thu May 29 23:37:56 2014
ORA-205 signalled during: ALTER DATABASE   MOUNT...

查看控制文件

[oraprod@erpdb ~]$ ls -ltr /erpdata/PROD/proddata/cntrl01.dbf
-rw-r-----  1 oraprod dba 16293888 May 30  2014 /erpdata/PROD/proddata/cntrl01.dbf
[oraprod@erpdb ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G  4.3G   87G   5% /
/dev/sda1             190M   12M  169M   7% /boot
none                  8.0G     0  8.0G   0% /dev/shm
/dev/sdb1             577G  169G  379G  31% /erpdata
/dev/sdb2             241G   16G  213G   7% /oracle
/dev/sda3             9.7G   55M  9.1G   1% /tmp
192.168.1.13:/vol/vol1/backup
                      1.4T  199G  1.3T  14% /erpdbbak
192.168.1.11:/vol/vol1/erpdb
                      150G   84G   67G  56% /erpdata/PROD

这里可以看出来,控制文件是存在的,但是控制文件是存放在nfs中,重试umount/mount /erpdata/PROD文件系统,问题依旧.检查nfs系统所在存储(netapp存储)锁情况

nas设备锁情况

netapp2*> priv set advanced
netapp2*> lock status -h
======== NLM host erpdb.xifenfei.com
10688 0x01b70b28:0x00b83a2c 0:0 1 GRANTED (0x000000005fc63a58)
10688 0x01b70b28:0x00b83a2b 0:0 1 GRANTED (0x000000005fc63398)
10686 0x01b70b28:0x0135d0b3 0:0 1 GRANTED (0x0000000065f73a58)
10686 0x01b70b28:0x0135d0b2 0:0 1 GRANTED (0x00000000161a3b78)
10686 0x01b70b28:0x0135d0b1 0:0 1 GRANTED (0x00000001286246f8)
…………
10686 0x01b70b28:0x01770860 0:0 1 GRANTED (0x000000007e8fb938)
10690 0x01b70b28:0x007adc2a 0:0 1 GRANTED (0x0000000133e7e818)
10690 0x01b70b28:0x007adc29 0:0 1 GRANTED (0x00000000161a3c98)
10690 0x01b70b28:0x007adc28 0:0 1 GRANTED (0x000000017a2606f8)

使用相关命令解锁

storage*> sm_mon -l erpdb.xifenfei.com
storage*> lock status -h

启动数据库恢复正常

查询mos文档,发现相关文章主要有:
NFS Locking Problems Encountered During Database Startup (Doc ID 236794.1)
ORA-1157 ORA-1110 ORA-27086 Starting up Database (Doc ID 145194.1)

数据库恢复遭遇ORA-00600[3705]

某个客户在一台机器上装3个oracle数据库,机器蓝屏后,使用pe拷贝出来所有数据文件,redo文件,控制文件等,在尝试恢复过程中,三个库都出现同样的ORA-600[3705]错误,在以前的数据库恢复中对于redo异常,使用过N次类似方法出来都未出问题,但是在ORACLE 9.2.0.1版本中确实出现诡异现象,这里记录处理过程和大家分享
尝试恢复数据库

C:\Documents and Settings\Administrator>set oracle_sid=telnet
C:\Documents and Settings\Administrator>sqlplus/nolog
SQL*Plus: Release 9.2.0.1.0 - Production on 星期日 5月 25 13:04:29 2014
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
SQL> conn / as sysdba
已连接。
SQL> select open_mode from v$database;
OPEN_MODE
----------
MOUNTED
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-01113: 文件 2 需要介质恢复
ORA-01110: 数据文件 2: 'E:\ORACLE\ORADATA\TELNET\UNDOTBS01.DBF'
SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00322: 日志 2 (线程 1) 不是当前副本
ORA-00312: 联机日志 2 线程 1: 'E:\ORACLE\ORADATA\TELNET\REDO02.LOG'

redo异常,尝试使用_allow_resetlogs_corruption参数启动数据库

SQL> startup mount pfile='c:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  126950220 bytes
Fixed Size                   453452 bytes
Variable Size             109051904 bytes
Database Buffers           16777216 bytes
Redo Buffers                 667648 bytes
数据库装载完毕。
SQL> recover database until cancel;
ORA-00279: 更改 66763056 (在 05/12/2014 09:50:10 生成) 对于线程 1 是必需的
ORA-00289: 建议: E:\ORACLE\ORA92\RDBMS\ARC00312.001
ORA-00280: 更改 66763056 对于线程 1 是按序列 # 312 进行的
指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件1需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\TELNET\SYSTEM01.DBF'
ORA-01112: 未启动介质恢复
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR 位于第 1 行:
ORA-03113: 通信通道的文件结束

查看alert日志

Sun May 25 15:35:02 2014
ALTER DATABASE RECOVER    CANCEL
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Sun May 25 15:35:02 2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Sun May 25 15:35:09 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 66763056
Resetting resetlogs activation ID 0 (0x0)
Sun May 25 15:35:20 2014
Assigning activation ID 2117757301 (0x7e3a6975)
Sun May 25 15:35:20 2014
Errors in file e:\oracle\admin\telnet\bdump\telnet_lgwr_12140.trc:
ORA-00600: internal error code, arguments: [3705], [1], [1], [1], [1], [], [], []
Sun May 25 15:35:56 2014
Errors in file e:\oracle\admin\telnet\bdump\telnet_lgwr_12140.trc:
ORA-00600: internal error code, arguments: [3705], [1], [1], [1], [1], [], [], []
LGWR: terminating instance due to error 600
Instance terminated by LGWR, pid = 12140

创建控制文件继续恢复

SQL> recover database until cancel;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [2130], [0], [1], [2], [], [], [], []
SQL> alter database backup controlfile to trace as 'c:\ctl.txt';
数据库已更改。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-03113: 通信通道的文件结束
SQL> startup nomount
ORACLE 例程已经启动。
Total System Global Area  126950220 bytes
Fixed Size                   453452 bytes
Variable Size             109051904 bytes
Database Buffers           16777216 bytes
Redo Buffers                 667648 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TELNET" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 50
  3      MAXLOGMEMBERS 5
  4      MAXDATAFILES 100
  5      MAXINSTANCES 1
  6      MAXLOGHISTORY 226
  7  LOGFILE
  8    GROUP 1 'E:\ORACLE\ORADATA\TELNET\REDO01.LOG'  SIZE 100M,
  9    GROUP 2 'E:\ORACLE\ORADATA\TELNET\REDO02.LOG'  SIZE 100M,
 10    GROUP 3 'E:\ORACLE\ORADATA\TELNET\REDO03.LOG'  SIZE 100M
 11  DATAFILE
 12    'E:\ORACLE\ORADATA\TELNET\SYSTEM01.DBF',
 13    'E:\ORACLE\ORADATA\TELNET\UNDOTBS01.DBF',
 14    'E:\ORACLE\ORADATA\TELNET\CWMLITE01.DBF',
 15    'E:\ORACLE\ORADATA\TELNET\DRSYS01.DBF',
 16    'E:\ORACLE\ORADATA\TELNET\EXAMPLE01.DBF',
 17    'E:\ORACLE\ORADATA\TELNET\INDX01.DBF',
 18    'E:\ORACLE\ORADATA\TELNET\ODM01.DBF',
 19    'E:\ORACLE\ORADATA\TELNET\TOOLS01.DBF',
 20    'E:\ORACLE\ORADATA\TELNET\USERS01.DBF',
 21    'E:\ORACLE\ORADATA\TELNET\XDB01.DBF'
 22  CHARACTER SET ZHS16GBK
 23  ;
控制文件已创建
SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00264: 不要求恢复
SQL> alter database open;
数据库已更改。

记录一次系统回滚段坏块恢复

在上一篇中深入分析一次ORA-00314错误的数据库继续恢复,出现file 1 block 403和404等坏块,使得后面的恢复进一步写入了复杂境地.更加麻烦的是,这个库里面有一张核心表有40个字段,包括long,nvarchar2等一些字段,但是使用aul,dul,odu挖取都出现异常(表的行数正确,但是有些列的数据不能正常被挖出来),原因是该表中有特殊字段数据,被逼无赖只能继续拉库
数据库启动报错

SQL> startup mount pfile='c:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  452984832 bytes
Fixed Size                  1291120 bytes
Variable Size             201329808 bytes
Database Buffers          243269632 bytes
Redo Buffers                7094272 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open upgrade
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接

查看alert日志

SMON: enabling cache recovery
Fri May 16 22:49:53 2014
Hex dump of (file 1, block 404) in trace file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_2788.trc
Corrupt block relative dba: 0x00400194 (file 1, block 404)
Fractured block found during buffer read
Data in bad block:
 type: 0 format: 0 rdba: 0x00000000
 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000000
 check value in block header: 0x0
 block checksum disabled
Reread of rdba: 0x00400194 (file 1, block 404) found same corrupted data
Fri May 16 22:49:55 2014
Errors in file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_2788.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 404)
ORA-01110: data file 1: 'C:\ORADATA\INTERLIB\SYSTEM01.DBF'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604

跟踪数据库启动过程
这里可以清晰的看到是因为block 404有坏块,数据库递归sql访问到该坏块报错,从而数据库无法继续open,对数据库启动过程做10046跟踪

PARSING IN CURSOR #2 len=148 dep=1 uid=0 oct=6 lid=0 tim=80533759 hv=3540833987 ad='2a6ce1ec'
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #2:c=62500,e=211766,p=14,cr=85,cu=0,mis=1,r=0,dep=1,og=4,tim=80533755
BINDS #2:
kkscoacd
 Bind#0
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=2a6ceac2  bln=32  avl=09  flg=09
  value="_SYSSMU1$"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9b7c  bln=24  avl=02  flg=05
  value=2
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9b58  bln=24  avl=02  flg=05
  value=9
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9b34  bln=24  avl=02  flg=05
  value=2
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9b10  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9aec  bln=24  avl=04  flg=05
  value=13332
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9ac8  bln=24  avl=04  flg=05
  value=24672
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9aa4  bln=24  avl=05  flg=05
  value=47727002
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9a80  bln=24  avl=01  flg=05
  value=0
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9a5c  bln=24  avl=01  flg=05
  value=0
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9a38  bln=24  avl=02  flg=05
  value=1
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9a14  bln=24  avl=02  flg=05
  value=1
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=099b9ba0  bln=22  avl=02  flg=05
  value=1
WAIT #2: nam='db file sequential read' ela= 10794 file#=1 block#=404 blocks=1 obj#=-1 tim=80546515
Hex dump of (file 1, block 404)
Dump of memory from 0x1FB9C000 to 0x1FB9E000
1FB9C000 00000000 00000000 00000000 00000000  [................]
        Repeat 286 times
1FB9D1F0 00000000 00000000 00000000 00000001  [................]
1FB9D200 0000A200 0001D831 00000000 05010000  [....1...........]
1FB9D210 00007F30 00000000 00000000 00000000  [0...............]
1FB9D220 00000000 00000000 00000000 00000000  [................]
  Repeat 221 times
Corrupt block relative dba: 0x00400194 (file 1, block 404)
Fractured block found during buffer read
Data in bad block:
 type: 0 format: 0 rdba: 0x00000000
 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000000
 check value in block header: 0x0
 block checksum disabled
Reread of rdba: 0x00400194 (file 1, block 404) found same corrupted data
EXEC #2:c=0,e=45015,p=1,cr=1,cu=3,mis=1,r=0,dep=1,og=4,tim=80578919
ERROR #2:err=1578 tim=811723
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 404)
ORA-01110: data file 1: 'C:\ORADATA\INTERLIB\SYSTEM01.DBF'
EXEC #1:c=328125,e=3468857,p=52,cr=619,cu=10,mis=0,r=0,dep=0,og=1,tim=82331842
ERROR #1:err=1092 tim=811898

dbv检测坏块

C:\oracle\product\10.2.0\db_1\BIN>dbv  FILE = C:\ORADATA\INTERLIB\SYSTEM01.DBF
DBVERIFY: Release 10.2.0.3.0 - Production on 星期一 5月 19 15:22:41 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
DBVERIFY - 开始验证: FILE = C:\ORADATA\INTERLIB\SYSTEM01.DBF
页 403 标记为损坏
Corrupt block relative dba: 0x00400193 (file 1, block 403)
Bad header found during dbv:
Data in bad block:
 type: 70 format: 1 rdba: 0x00030030
 last change scn: 0x0000.3a6c2e79 seq: 0x0 flg: 0x00
 spare1: 0x4c spare2: 0x45 spare3: 0x2
 consistency value in tail: 0x00070000
 check value in block header: 0x1
 block checksum disabled
页 404 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x00400194 (file 1, block 404)
Fractured block found during dbv:
Data in bad block:
 type: 0 format: 0 rdba: 0x00000000
 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000000
 check value in block header: 0x0
 block checksum disabled
页 498 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x004001f2 (file 1, block 498)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x004001f2
 last change scn: 0x0000.02d7eb9d seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x37290601
 check value in block header: 0x6c5e
 computed block checksum: 0xc2b5
页 61078 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0040ee96 (file 1, block 61078)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040ee96
 last change scn: 0x0000.02d5cf11 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xa6e30601
 check value in block header: 0x51d4
 computed block checksum: 0xf572
页 61147 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0040eedb (file 1, block 61147)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040eedb
 last change scn: 0x0000.02d7f7e6 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x6ed80601
 check value in block header: 0x4ae8
 computed block checksum: 0x893f
页 61502 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0040f03e (file 1, block 61502)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040f03e
 last change scn: 0x0000.02d810dd seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xaf4c0601
 check value in block header: 0xd99b
 computed block checksum: 0xbf91
页 61989 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0040f225 (file 1, block 61989)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040f225
 last change scn: 0x0000.02d80f65 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xdff70601
 check value in block header: 0x4e2a
 computed block checksum: 0xd092
DBVERIFY - 验证完成
检查的页总数: 62720
处理的页总数 (数据): 37740
失败的页总数 (数据): 0
处理的页总数 (索引): 7021
失败的页总数 (索引): 0
处理的页总数 (其它): 1784
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 16169
标记为损坏的总页数: 7
流入的页总数: 5
最高块 SCN            : 316431787 (0.316431787)
C:\oracle\product\10.2.0\db_1\BIN>

根据我们的经验,对数据库启动影响很大的,主要是403,404,498三个坏块,在10201的库中查询得出结论(403,404是rollback,498是seq$).那现在我们可以理解在数据库启动过程中需要undo$表中相关列信息,但是由于rollback对应的block有坏块,使得数据库无法操作update操作,从而无法正常启动.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
输入 file_id 的值:  1
原值    3:  WHERE FILE_ID = &FILE_ID
新值    3:  WHERE FILE_ID = 1
输入 block_id 的值:  404
原值    4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
新值    4:    AND 404 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

常用方法
设置event跳过坏块,使用隐含参数屏蔽回滚段

  _corrupted_rollback_segments= _SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$, _SYSSMU13$, _SYSSMU14$, _SYSSMU15$, _SYSSMU16$, _SYSSMU17$, _SYSSMU18$, _SYSSMU19$, _SYSSMU20$
  undo_management          = MANUAL
  event                    = 10231 trace name context forever, level 10, 10232 trace name context forever, level 10, 10233 trace name context forever, level 10
SQL> startup mount pfile='c:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  452984832 bytes
Fixed Size                  1291120 bytes
Variable Size             201329808 bytes
Database Buffers          243269632 bytes
Redo Buffers                7094272 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open upgrade
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接

这里可以看出来,数据库在update undo$的时候因为rollback异常无法通过屏蔽回滚段和跳过坏块的方法来解决。因为是system undo系统块,想法就是从别的相同版本库中拷贝一个相同位置块过来试试看。

拷贝数据块

C:\oracle\product\10.2.0\db_1\BIN>bbed listfile=c:\bbed.txt
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Mon May 19 10:47:14 2014
(c) Copyright 2000 Oracle Corporation.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set mode edit
        MODE            Edit
BBED> copy file 2 block 405 to file 1 block 405
 File: C:\ORADATA\INTERLIB\SYSTEM01.DBF (1)
 Block: 405              Offsets:    0 to  511           Dba:0x00400195
------------------------------------------------------------------------
 02a20000 94014000 d4c50800 00000104 60890000 00000500 36000000 66004545
 0000e81f 401fc81e 201ea01d 481da01c f81b501b e81a801a d8197019 08196018
 f8179017 e8168016 1816c815 68152815 d8145814 f8137813 18139812 3812f811
 a8116811 18119810 3810b80f 580fd80e 780e380e e80da80d 580d180d c80c480c
 f00b480b e00a380a d0092809 c0087008 1008d007 80074007 f006b006 60062006
 d0052805 a8046004 2004d003 d802b401 6001e000 00000000 0a001800 0c001400
 1300ff1d 78100000 78100000 02000000 00000000 0a165f45 00002e00 0201e71e
 94014000 4c004300 03020100 2b05c000 5e15c000 0c001800 0c002000 29000100
 82230000 82230000 02000000 00000000 0a163347 00003200 02014500 94014000
 58004600 03022100 f30ec000 f50ec000 00000000 00000000 03000000 0d000e00
 0e000e00 06c52008 4b5a2705 c51e3337 5f06c520 084b5a27 06c51f4c 28393906
 c520084b 5a2706c5 273e0355 0d000000 ff165f43 0a001800 0c001400 0d004000
 82230000 82230000 02000000 00000000 0a163346 00003200 02014500 94014000
 58004500 03020100 f30ec000 f80ec000 00000000 00000000 06c52008 4b5a2705
 c4201c60 5f000000 0c001800 0c002000 2a000100 82230000 82230000 02000000
 00000000 0a163345 00003200 02014500 94014000 58004400 03022100 f30ec000
 <32 bytes per line>
BBED> sum apply
Check value for File 1, Block 405:
current = 0x8960, required = 0x8960

重新启动数据库

SQL> startup mount pfile='c:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  452984832 bytes
Fixed Size                  1291120 bytes
Variable Size             201329808 bytes
Database Buffers          243269632 bytes
Redo Buffers                7094272 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open upgrade
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接

分析alert日志

Mon May 19 10:54:05 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 13 Reading mem 0
  Mem# 0: C:\ORADATA\INTERLIB\REDO03.LOG
Block recovery stopped at EOT rba 13.5.16
Block recovery completed at rba 13.5.16, scn 0.316366922
Doing block recovery for file 1 block 9
Block recovery from logseq 13, block 3 to scn 316366921
Mon May 19 10:54:05 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 13 Reading mem 0
  Mem# 0: C:\ORADATA\INTERLIB\REDO03.LOG
Block recovery completed at rba 13.5.16, scn 0.316366922
Mon May 19 10:54:07 2014
Errors in file c:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_1208.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码, 参数: [4193], [102], [58], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604

这里是因为我们拷贝了一个其他库的undo段过来,然后数据库启动的时候首先使用到该undo块和rollback segment header 不匹配,所以通过通过修改undo header 来修复,使用bbed修改段头信息,因为在以前的文章中描述过,在此不再重复,具体参考:使用bbed解决ORA-00607/ORA-00600[4194]故障启动数据库

SQL> startup mount pfile='c:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area  452984832 bytes
Fixed Size                  1291120 bytes
Variable Size             201329808 bytes
Database Buffers          243269632 bytes
Redo Buffers                7094272 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open upgrade;
数据库已更改。

但是因为seq$有坏块,数据库启动后,如果使用非系统认证登录数据库会报如下错误

C:\oracle\product\10.2.0\db_1\BIN>sqlplus interlib/oracle
SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 5月 19 16:29:29 2014
Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.
ERROR:
ORA-00600: 内部错误代码, 参数: [6807], [AUDSES$], [144], [], [], [], [], []
请输入用户名:
C:\oracle\product\10.2.0\db_1\BIN>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 5月 19 16:30:08 2014
Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.
连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> select object_type from dba_objects where object_name='AUDSES$';
OBJECT_TYPE
-------------------
SEQUENCE

因为seq$有坏块导致该问题,因为该数据库需要重建,使用exp导出来数据,然后重建完成相关工作

深入分析一次ORA-00314错误

运行在win平台上的oracle 10G数据库,因为主机蓝屏,使用pe拷贝出来相关数据库文件,redo文件,控制文件,在恢复数据库过中遇到ORA-00314错误,而无法继续下去

恢复报错

SQL> recover database
ORA-00283: 恢复会话因错误而取消
ORA-00314: 日志 3 (用于线程 1) 要求的序号 1366 与 1363 不匹配
ORA-00312: 联机日志 3 线程 1:
'DC:\ORADATA\INTERLIB\REDO03.LOG'
00314, 00000, "log %s of thread %s, expected sequence# %s doesn't match %s"
// *Cause:  The online log is corrupted or is an old version.
// *Action: Find and install correct version of log or reset logs.

这里提示可以大概看出来,数据库进行恢复的时候,需要sequence 为1366的日志,但是与现在的1363不匹配。通过上面的解释我们很可能知道是redo文件异常或者弄错了redo文件

分析Oracle Database Recovery Check结果
控制文件scn相关信息,从这里可以看到数据库的控制文件scn为47714860,checkpoint scn为47711411
r1
控制文件中关于数据库文件是scn为47711411,而且stop_scn为null
r2
数据文件头scn是47711411,而且fuzzy为yes
r3
redo的相关信息,这里我们可以看出当前redo的sequence为1366,first_scn为47711411,redo的写入顺序是redo01->redo02->redo03->redo01
r4
通过这里整体分析,我们可以知道,数据库为非正常关闭,恢复应该从redo03(sequence 1366)开始进行恢复,但是为什么出现ORA-00314呢?通过dump redo header继续分析

dump redo header 分析
这个里面我们可以看到redo01的sequence 为16进制554等于10进制的1364,scn的范围为:0x000002d6e4ab-0x000002d7455b
redo1


这个里面我们可以看到redo02的sequence 为16进制555等于10进制的1365,scn的范围为:0x000002d7455b-0x000002d804b3
redo2


这个属于异常redo,注意第一个seq: 0x00000556等于十进制的1366(也就是表示该redo的sequence为1366),”Seq# 0000001363,SCN 0x000002d5e1c6-0x000002d6e4ab”表示在redo的文件头有部分信息记录的为sequence为1633,另外这里表示该文件最大scn为0x000002d6e4ab,和redo01的最小scn(0x000002d6e4ab-0x000002d7455b)/redo01的”Prev scn: 0x0000.02d5e1c6″与现在看到的redo03中的最小scn匹配.
redo3


通过这里可以明白,在主机蓝屏的时候由于某种异常,导致redo03中的部分信息修改为了sequence为1366,但是部分信息依然保留它上次的sequence为1363的信息,导致数据库在重新恢复的时候无法正常成功.

处理方法
该故障是由于current redo异常导致,根据经验(具体参考:ORACLE REDO各种异常恢复),一般使用隐含参数屏蔽前滚,然后强制拉库,绝大部分情况能够拉库成功,如果人品不好可能需要使用其他隐含参数甚至bbed等方式处理

执行计划改变导致数据库负载过高

核心生产库突然负载飙升,我的压力来了,通过分析是一条sql的执行计划改变引起该故障,最终通过sql profile固定执行计划解决该问题
数据库主机负载
这里明显表现系统load 偏高,而且还在上升中;top的进程中,占用cpu都计划100%

top - 16:25:39 up 123 days,  1:42,  4 users,  load average: 46.19, 45.08, 43.93
Tasks: 1469 total,  28 running, 1439 sleeping,   0 stopped,   2 zombie
Cpu(s): 45.9%us,  1.1%sy,  0.0%ni, 47.1%id,  5.2%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  264253752k total, 262605260k used,  1648492k free,   413408k buffers
Swap: 33554424k total,   458684k used, 33095740k free, 67110504k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2622 oracle    18   0  150g  34m  28m R 100.0  0.0  11:58.21 oracleq9db1 (LOCAL=NO)
15881 oracle    21   0  150g  35m  28m R 100.0  0.0  72:14.20 oracleq9db1 (LOCAL=NO)
17214 oracle    19   0  150g  38m  32m R 100.0  0.0  20:47.44 oracleq9db1 (LOCAL=NO)
17705 oracle    21   0  150g  34m  28m R 100.0  0.0  27:33.00 oracleq9db1 (LOCAL=NO)
 6110 oracle    19   0  150g  34m  28m R 99.8  0.0  12:33.34 oracleq9db1 (LOCAL=NO)
 6876 oracle    19   0  150g  34m  27m R 99.5  0.0  12:43.90 oracleq9db1 (LOCAL=NO)
17205 oracle    23   0  150g  34m  27m R 99.5  0.0  21:37.18 oracleq9db1 (LOCAL=NO)
24629 oracle    20   0  150g  35m  29m R 99.5  0.0  28:10.62 oracleq9db1 (LOCAL=NO)
26959 oracle    19   0  150g  34m  27m R 99.5  0.0  47:17.87 oracleq9db1 (LOCAL=NO)
 7655 oracle    18   0  150g  30m  25m R 98.5  0.0   2:28.45 oracleq9db1 (LOCAL=NO)
16377 oracle    18   0  150g  34m  28m R 98.5  0.0  36:07.11 oracleq9db1 (LOCAL=NO)
24637 oracle    18   0  150g  37m  30m R 98.2  0.0  26:39.15 oracleq9db1 (LOCAL=NO)
 6106 oracle    21   0  150g  40m  33m R 97.2  0.0  11:37.75 oracleq9db1 (LOCAL=NO)
28785 oracle    18   0  150g  34m  28m R 96.9  0.0  24:03.29 oracleq9db1 (LOCAL=NO)
24278 oracle    17   0  150g  31m  26m S 96.5  0.0   3:15.51 oracleq9db1 (LOCAL=NO)
24283 oracle    17   0  150g  33m  28m S 96.5  0.0   6:25.26 oracleq9db1 (LOCAL=NO)
 7098 oracle    18   0  150g  32m  27m R 94.6  0.0   2:20.22 oracleq9db1 (LOCAL=NO)
 6874 oracle    17   0  150g  34m  28m R 87.0  0.0  12:02.92 oracleq9db1 (LOCAL=NO)
18206 oracle    16   0  150g  34m  27m R 86.1  0.0  16:11.28 oracleq9db1 (LOCAL=NO)
 7096 oracle    17   0  150g  29m  24m R 85.4  0.0   3:01.72 oracleq9db1 (LOCAL=NO)

数据库等待事件

       SID EVENT
---------- ----------------------------------------------------------------
       183 gc cr request
       185 latch: cache buffers chains
       239 db file sequential read
       292 gc cr request
       406 gc cr request
       410 db file sequential read
       463 gc current request
       572 gc buffer busy acquire
       575 gc buffer busy acquire
       577 latch: cache buffers chains
       629 db file sequential read
       747 gc cr request
       919 latch: cache buffers chains
       974 gc cr request
      1033 log file sync
      1141 db file parallel write
      1153 gc cr request
      1199 db file sequential read
      1378 db file sequential read
      1495 gc cr request
      1540 db file parallel write
      1547 gc buffer busy acquire
      1662 gc cr request
      1715 gc buffer busy acquire
      1770 SQL*Net message to client
      1830 latch: cache buffers chains
      1884 gc cr request
      2113 db file sequential read
      2173 db file sequential read
      2229 rdbms ipc reply
      2292 db file sequential read
      2341 db file sequential read
      2348 gc cr request
      2460 gc cr request
      2632 gc cr request
      2684 gc cr request
      2687 db file sequential read
      2749 db file sequential read
      2913 gc cr request
      2967 db file sequential read
      3038 gc cr request
      3087 SQL*Net message to client
      3089 gc cr request
      3194 db file sequential read
      3195 db file sequential read
      3309 latch: cache buffers chains
      3371 gc cr request
      3485 gc cr request
      3535 gc cr request

可以这里有很多gc cr request等待和cache buffers chains等待,第一反应就是很可能系统某条sql执行计划不正确导致逻辑读剧增.

分析awr报告
故障时候的逻辑读 top sql可以看出来cdwjdd67x27mh的sql逻辑读异常大
awr1
分析正常时间点awr报告中cdwjdd67x27mh逻辑读情况
awr2
通过对比可以发现两个awr中,cdwjdd67x27mh 语句执行次数差不多,但是单次逻辑读从800多突变为34000多,增加了40多倍
cdwjdd67x27mh 语句分析
执行计划

17:36:08 sys@Q9DB>select * from table(dbms_xplan.display_awr('cdwjdd67x27mh'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID cdwjdd67x27mh
--------------------
select * from ( select          *                               from                            GE_TAOBAO_BILL           o
                 WHERE 1=1                       and
o.CREATED_TIME >= :1            and o.CREATED_TIME < :2
                                and o.REC_SITE_ID = :3
             and o.STATUS_ID = :4
             and o.SERVICE_TYPE = :5 ) where rownum
<= 36000
Plan hash value: 647855111
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |       |       |     5 (100)|          |        |
|   1 |  COUNT STOPKEY                       |                           |       |       |            |          |        |
|   2 |   FILTER                             |                           |       |       |            |          |        |
|   3 |    PARTITION RANGE ITERATOR          |                           |     1 |   455 |     5   (0)| 00:00:01 |   KEY |   KEY |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GE_TAOBAO_BILL            |     1 |   455 |     5   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      INDEX RANGE SCAN                | IDX_TAOBAO_BILL_CR_ISDP_S |     1 |       |     4   (0)| 00:00:01 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID cdwjdd67x27mh
--------------------
select * from ( select          *                               from                            GE_TAOBAO_BILL           o
                 WHERE 1=1                       and
o.CREATED_TIME >= :1            and o.CREATED_TIME < :2
                                and o.REC_SITE_ID = :3
             and o.STATUS_ID = :4
             and o.SERVICE_TYPE = :5 ) where rownum
<= 36000
Plan hash value: 2979024279
---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                          |       |       |    13 (100)|          |       ||
|   1 |  COUNT STOPKEY                       |                          |       |       |            |          |       ||
|   2 |   FILTER                             |                          |       |       |            |          |       ||
|   3 |    PARTITION RANGE ITERATOR          |                          |     1 |   455 |    13   (0)| 00:00:01 |   KEY |   KEY |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GE_TAOBAO_BILL           |     1 |   455 |    13   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      INDEX RANGE SCAN                | IDX_TAOBAO_BILL_CR_REC_S |     4 |       |     9   (0)| 00:00:01 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------------------

这里可以发现cdwjdd67x27mh在数据库中有两个执行计划

通过awr数据,看执行计划的变化情况

10:50:29 sys@Q9DB>select a.INSTANCE_NUMBER,a.snap_id,a.sql_id,a.plan_hash_value,b.begin_interval_time
10:50:30   2  from dba_hist_sqlstat a, dba_hist_snapshot b
10:50:30   3  where sql_id ='cdwjdd67x27mh'
10:50:30   4  and a.snap_id = b.snap_id
10:50:30   5  order by instance_number,begin_interval_time  desc;
INSTANCE_NUMBER    SNAP_ID SQL_ID        PLAN_HASH_VALUE BEGIN_INTERVAL_TIME
--------------- ---------- ------------- --------------- ----------------------------------------
              1      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.753 PM
              1      20719 cdwjdd67x27mh       647855111 12-MAY-14 05.00.12.753 PM
              1      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.702 PM
              1      20719 cdwjdd67x27mh       647855111 12-MAY-14 05.00.12.702 PM
              1      20718 cdwjdd67x27mh       647855111 12-MAY-14 04.00.24.197 PM
              1      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.197 PM
              1      20718 cdwjdd67x27mh       647855111 12-MAY-14 04.00.24.172 PM
              1      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.172 PM
              1      20717 cdwjdd67x27mh       647855111 12-MAY-14 03.11.22.251 PM
              1      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.251 PM
              1      20717 cdwjdd67x27mh       647855111 12-MAY-14 03.11.22.188 PM
              1      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.188 PM
              ………………
              1      20696 cdwjdd67x27mh      2979024279 11-MAY-14 07.00.07.142 PM
              1      20696 cdwjdd67x27mh      2979024279 11-MAY-14 07.00.07.105 PM
              1      20695 cdwjdd67x27mh      2979024279 11-MAY-14 06.00.12.771 PM
              1      20695 cdwjdd67x27mh      2979024279 11-MAY-14 06.00.12.707 PM
              1      20694 cdwjdd67x27mh      2979024279 11-MAY-14 05.00.48.249 PM
              1      20694 cdwjdd67x27mh      2979024279 11-MAY-14 05.00.48.170 PM
              1      20693 cdwjdd67x27mh      2979024279 11-MAY-14 04.00.37.841 PM
              …………
              2      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.753 PM
              2      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.702 PM
              2      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.197 PM
              2      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.172 PM
              2      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.251 PM
              2      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.188 PM

这里可以清晰看到,执行计划在1节点中发生震荡(两个执行计划都有选择),仔细看两个执行计划,会发现就是查询的时候所使用的index不同而已,继续分析两个index

IDX_TAOBAO_BILL_CR_ISDP_S      CREATED_TIME                            1
IDX_TAOBAO_BILL_CR_ISDP_S      IS_DISPART                              2
IDX_TAOBAO_BILL_CR_ISDP_S      STATUS_ID                               3
IDX_TAOBAO_BILL_CR_REC_S       REC_SITE_ID                             1
IDX_TAOBAO_BILL_CR_REC_S       CREATED_TIME                            2
IDX_TAOBAO_BILL_CR_REC_S       STATUS_ID                               3

分析数据分布情况(其他列省略)

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LAST_ANAL
------------------------------ ------------------------------ ------------ ---------
GE_TAOBAO_BILL                 CREATED_TIME                      287080448 05-MAY-14
GE_TAOBAO_BILL                 REC_SITE_ID                           13176 05-MAY-14
GE_TAOBAO_BILL                 IS_DISPART                                2 05-MAY-14
GE_TAOBAO_BILL                 STATUS_ID                                 7 05-MAY-14

这里可以看到数据库在正常的时候使用IDX_TAOBAO_BILL_CR_REC_S没有问题,但是在某些情况下选择使用IDX_TAOBAO_BILL_CR_ISDP_S这个就有问题,导致逻辑读过高,其实对于该sql语句,是因为IDX_TAOBAO_BILL_CR_REC_S 索引不合理导致,如果创建CREATED_TIME,REC_SITE_ID,STATUS_ID,就不会因为传输的值范围不同而使用IDX_TAOBAO_BILL_CR_ISDP_S的情况。针对该情况,因为表非常大,短时间内无法修改index,只能考虑使用 sql profile固定执行计划

sql profile固定计划

SQL>@coe_xfr_sql_profile.sql cdwjdd67x27mh
Parameter 1:
SQL_ID (required)
PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
     2979024279        .011
      647855111       5.164
Parameter 2:
PLAN_HASH_VALUE (required)
Enter value for 2: 2979024279
Values passed:
~~~~~~~~~~~~~
SQL_ID         : "cdwjdd67x27mh"
PLAN_HASH_VALUE: "2979024279"
Execute coe_xfr_sql_profile_cdwjdd67x27mh_2979024279.sql
on TARGET system in order to create a custom SQL Profile
with plan 2979024279 linked to adjusted sql_text.
COE_XFR_SQL_PROFILE completed.
SQL>@coe_xfr_sql_profile_cdwjdd67x27mh_2979024279.sql
SQL>DECLARE
  2  sql_txt CLOB;
  3  h       SYS.SQLPROF_ATTR;
  4  BEGIN
  5  sql_txt := q'[
  6  select * from ( select
  7                  *
  8
  9                  from
 10
 11                  GE_TAOBAO_BILL
 12
 13                  o
 14
 15                   WHERE 1=1
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27                                  and o.CREATED_TIME >= :1
 28
 29
 30                                  and o.CREATED_TIME < :2
 31
 32
 33                                  and o.REC_SITE_ID = :3
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49                                  and o.STATUS_ID = :4
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64                                  and o.SERVICE_TYPE = :5 ) where rownum <= 36000
 65  ]';
 66  h := SYS.SQLPROF_ATTR(
 67  q'[BEGIN_OUTLINE_DATA]',
 68  q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
 69  q'[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]',
 70  q'[DB_VERSION('11.2.0.3')]',
 71  q'[FIRST_ROWS]',
 72  q'[OUTLINE_LEAF(@"SEL$F5BB74E1")]',
 73  q'[MERGE(@"SEL$2")]',
 74  q'[OUTLINE(@"SEL$1")]',
 75  q'[OUTLINE(@"SEL$2")]',
 76  q'[INDEX_RS_ASC(@"SEL$F5BB74E1" "O"@"SEL$2" ("GE_TAOBAO_BILL"."REC_SITE_ID" "GE_TAOBAO_BILL"."CREATED_TIME" "GE_TAOBAO_BILL"."STATUS_ID"))]',
 77  q'[END_OUTLINE_DATA]');
 78  :signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
 79  DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
 80  sql_text    => sql_txt,
 81  profile     => h,
 82  name        => 'coe_cdwjdd67x27mh_2979024279',
 83  description => 'coe cdwjdd67x27mh 2979024279 '||:signature||'',
 84  category    => 'DEFAULT',
 85  validate    => TRUE,
 86  replace     => TRUE,
 87  force_match => FALSE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
 88  END;
 89  /
PL/SQL procedure successfully completed.
SQL>WHENEVER SQLERROR CONTINUE
SQL>SET ECHO OFF;
            SIGNATURE
---------------------
 18414135509058398362
... manual custom SQL Profile has been created
COE_XFR_SQL_PROFILE_cdwjdd67x27mh_2979024279 completed

固定执行计划后,系统负载恢复正常
load恢复正常,单个进程占用cpu 也正常

top - 18:25:29 up 123 days,  4:25,  3 users,  load average: 17.59, 16.72, 16.10
Tasks: 1559 total,   6 running, 1551 sleeping,   0 stopped,   2 zombie
Cpu(s):  6.2%us,  1.2%sy,  0.0%ni, 84.3%id,  7.4%wa,  0.1%hi,  0.9%si,  0.0%st
Mem:  264253752k total, 262395300k used,  1858452k free,   305928k buffers
Swap: 33554424k total,   467420k used, 33087004k free, 66811412k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3010 oracle    15   0  150g  26m  21m S 23.6  0.0   0:08.60 oracleq9db2 (LOCAL=NO)
26936 oracle    15   0  150g  33m  28m S 22.9  0.0   0:12.53 oracleq9db2 (LOCAL=NO)
 9691 oracle    15   0  150g  34m  28m D 21.0  0.0   0:48.73 oracleq9db2 (LOCAL=NO)
 9694 oracle    15   0  150g  34m  28m S 17.4  0.0   0:45.74 oracleq9db2 (LOCAL=NO)
14366 oracle    15   0  150g  33m  28m R 17.4  0.0   0:25.11 oracleq9db2 (LOCAL=NO)
 2471 oracle    -2   0  150g 180m  37m S 16.7  0.1  10795:55 ora_lms3_q9db2
 2463 oracle    -2   0  150g 180m  37m S 15.7  0.1  10648:38 ora_lms1_q9db2
 2459 oracle    -2   0  150g 180m  37m S 14.8  0.1  10726:42 ora_lms0_q9db2
 2467 oracle    -2   0  150g 180m  36m S 14.8  0.1  10980:33 ora_lms2_q9db2
13324 oracle    15   0  150g  31m  26m S 14.1  0.0   0:22.75 oracleq9db2 (LOCAL=NO)
16740 oracle    15   0  150g  34m  28m R 13.4  0.0   0:43.85 oracleq9db2 (LOCAL=NO)
 1908 oracle    15   0  150g  27m  22m S 11.8  0.0   0:03.95 oracleq9db2 (LOCAL=NO)
 9689 oracle    15   0  150g  33m  27m S 11.8  0.0   0:46.01 oracleq9db2 (LOCAL=NO)
19410 oracle    15   0  150g  31m  26m S 11.8  0.0   0:43.18 oracleq9db2 (LOCAL=NO)
14102 oracle    15   0  150g  31m  25m S 11.5  0.0   2:17.23 oracleq9db2 (LOCAL=NO)
 1914 oracle    15   0  150g  29m  24m S 11.1  0.0   0:04.77 oracleq9db2 (LOCAL=NO)
31106 oracle    15   0  150g  34m  28m S  9.8  0.0   1:24.85 oracleq9db2 (LOCAL=NO)
31139 oracle    15   0  150g  30m  24m S  9.8  0.0   1:21.75 oracleq9db2 (LOCAL=NO)
 2498 oracle    15   0  150g  42m  35m S  7.9  0.0   3838:43 ora_lgwr_q9db2
28108 oracle    15   0  150g  36m  29m S  7.9  0.0   0:18.19 oracleq9db2 (LOCAL=NO)
 2392 oracle    15   0  150g  35m  17m S  7.5  0.0   5304:57 ora_lmd0_q9db2

exp dmp文件损坏恢复

在有些时候,exp的dmp文件因为某种原因损坏(比如磁盘异常,exp过程损坏等),导致imp导入无法继续,下面的处理方法(直接读取dmp文件)来对dmp文件进行抢救性恢复,最大程度减少数据丢失损失
创建exp dmp文件并使用dd破坏

SQL> create table t_xifenfei as select * from dba_objects;
Table created.
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     90915
[oracle@localhost ~]$ exp chf/xifenfei@pdb1 file=/tmp/t_xifenfei.dmp tables=t_xifenfei
Export: Release 12.1.0.2.0  on Sun Apr 27 21:39:26 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses WE8MSWIN1252 character set (possible charset conversion)
About to export specified tables via Conventional Path ...
. . exporting table                     T_XIFENFEI      90915 rows exported
EXP-00091: Exporting questionable statistics.
Export terminated successfully with warnings.
[oracle@localhost ~]$ od -x /tmp/t_xifenfei.dmp |head -10
0000000 0003 4501 5058 524f 3a54 3156 2e32 3130
0000020 302e 0a30 4344 4648 520a 4154 4c42 5345
0000040 380a 3931 0a32 0a30 3237 300a 000a 0001
0000060 07b2 00d0 0001 0000 0000 0000 0000 0013
0000100 2020 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 7553 206e 7041 2072
0000160 3732 3220 3a31 3933 323a 2036 3032 3431
0000200 742f 706d 742f 785f 6669 6e65 6566 2e69
0000220 6d64 0070 0000 0000 0000 0000 0000 0000
--strings命令看dmp文件
[oracle@localhost ~]$ strings /tmp/t_xifenfei.dmp |head -50
EXPORT:V12.01.00
DCHF
RTABLES
8192
                                         Tue Apr 29 0:39:49 2014/tmp/t_xifenfei.dmp
#G#G
#G#G
+08:00
BYTE
UNUSED
INTERPRETED
DISABLE:ALL
METRICST
TABLE "T_XIFENFEI"
CREATE TABLE "T_XIFENFEI" ("OWNER" VARCHAR2(128), "OBJECT_NAME" VARCHAR2(128), "SUBOBJECT_NAME" VARCHAR2(128), "OBJECT_ID" NUMBER, "DATA_OBJECT_ID" NUMBER, "OBJECT_TYPE" VARCHAR2(23), "CREATED" DATE, "LAST_DDL_TIME" DATE, "TIMESTAMP" VARCHAR2(19), "STATUS" VARCHAR2(7), "TEMPORARY" VARCHAR2(1), "GENERATED" VARCHAR2(1), "SECONDARY" VARCHAR2(1), "NAMESPACE" NUMBER, "EDITION_NAME" VARCHAR2(128), "SHARING" VARCHAR2(13), "EDITIONABLE" VARCHAR2(1), "ORACLE_MAINTAINED" VARCHAR2(1))  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 13631488 NEXT 1048576 MINEXTENTS 1 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS
INSERT INTO "T_XIFENFEI" ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY", "NAMESPACE", "EDITION_NAME", "SHARING", "EDITIONABLE", "ORACLE_MAINTAINED") VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18)
PUBLIC
V$DATAGUARD_CONFIG
SYNONYM
2014-04-22:17:51:05
VALID
METADATA LINK
V_$DATAGUARD_STATS
VIEW
2014-04-22:17:51:05
--破坏exp dmp文件
[oracle@localhost ~]$ dd if=/dev/zero of=/tmp/t_xifenfei.dmp bs=1024 count=1 conv=notrunc
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 6.0291e-05 seconds, 17.0 MB/s
[oracle@localhost ~]$ od -x /tmp/t_xifenfei.dmp |head -10
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000 0064 0000 6000 2401 050f 0c0b 0c03 050c
0020020 0504 060d 0709 0508 0505 0505 050f 0505
0020040 0505 050a 0505 0505 0504 0706 0808 4723
0020060 4723 1108 0823 4111 47b0 8300 b200 d007
0020100 0003 0000 0000 0000 0000 0000 0000 0000
0020120 0000 0000 0000 0000 0000 0000 0000 0000
0020140 0000 0000 0000 0064 0000 6000 2401 050f
0020160 0c0b 0c03 050c 0504 060d 0709 0508 0505
--损坏后的dmp文件使用strings命令看
[oracle@localhost ~]$ strings /tmp/t_xifenfei.dmp |head -50
#G#G
#G#G
+08:00
BYTE
UNUSED
INTERPRETED
DISABLE:ALL
METRICST
TABLE "T_XIFENFEI"
CREATE TABLE "T_XIFENFEI" ("OWNER" VARCHAR2(128), "OBJECT_NAME" VARCHAR2(128), "SUBOBJECT_NAME" VARCHAR2(128), "OBJECT_ID" NUMBER, "DATA_OBJECT_ID" NUMBER, "OBJECT_TYPE" VARCHAR2(23), "CREATED" DATE, "LAST_DDL_TIME" DATE, "TIMESTAMP" VARCHAR2(19), "STATUS" VARCHAR2(7), "TEMPORARY" VARCHAR2(1), "GENERATED" VARCHAR2(1), "SECONDARY" VARCHAR2(1), "NAMESPACE" NUMBER, "EDITION_NAME" VARCHAR2(128), "SHARING" VARCHAR2(13), "EDITIONABLE" VARCHAR2(1), "ORACLE_MAINTAINED" VARCHAR2(1))  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 13631488 NEXT 1048576 MINEXTENTS 1 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS
INSERT INTO "T_XIFENFEI" ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY", "NAMESPACE", "EDITION_NAME", "SHARING", "EDITIONABLE", "ORACLE_MAINTAINED") VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18)
PUBLIC
V$DATAGUARD_CONFIG
SYNONYM
2014-04-22:17:51:05
VALID
METADATA LINK
V_$DATAGUARD_STATS
--imp 导入dmp文件失败
[oracle@localhost ~]$ imp chf/xifenfei@pdb1 file=/tmp/t_xifenfei.dmp full=y
Import: Release 12.1.0.2.0 -      on Sun Apr 27 22:02:40 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
IMP-00037: Character set marker unknown
IMP-00000: Import terminated unsuccessfully

这里通过分析可以知道,exp dmp文件虽然损坏了一点,但是通过strings命令看,相关记录依然存在,因此可以通过工具去读exp dmp文件,然后分析得出相关数据

恢复损坏exp dmp文件数据

CPFL> SEARCH TABLE T_XIFENFEI FROM EXPFILE  /tmp/t_xifenfei.dmp
8461: TABLE "T_XIFENFEI"
8480: CREATE TABLE "T_XIFENFEI" ("OWNER" VARCHAR2(128), "OBJECT_NAME" VARCHAR2(128), "SUBOBJECT_NAME" VARCHAR2(128), "OBJECT_ID" NUMBER, "DATA_OBJECT_ID" NUMBER, "OBJECT_TYPE" VARCHAR2(23), "CREATED" DATE, "LAST_DDL_TIME" DATE, "TIMESTAMP" VARCHAR2(19), "STATUS" VARCHAR2(7), "TEMPORARY" VARCHAR2(1), "GENERATED" VARCHAR2(1), "SECONDARY" VARCHAR2(1), "NAMESPACE" NUMBER, "EDITION_NAME" VARCHAR2(128), "SHARING" VARCHAR2(13), "EDITIONABLE" VARCHAR2(1), "ORACLE_MAINTAINED" VARCHAR2(1))  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS
9145: INSERT INTO "T_XIFENFEI" ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY", "NAMESPACE", "EDITION_NAME", "SHARING", "EDITIONABLE", "ORACLE_MAINTAINED") VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18)
Conventional export
9644: start of table data
12331252: TABLE "T_XIFENFEI"
12331349: ENDTABLE
CPFL> UNLOAD  TABLE "T_XIFENFEI" ("OWNER" VARCHAR2(128), "OBJECT_NAME" VARCHAR2(128), "SUBOBJECT_NAME" VARCHAR2(128), "OBJECT_ID" NUMBER, "DATA_OBJECT_ID" NUMBER, "OBJECT_TYPE" VARCHAR2(23), "CREATED" DATE, "LAST_DDL_TIME" DATE, "TIMESTAMP" VARCHAR2(19), "STATUS" VARCHAR2(7), "TEMPORARY" VARCHAR2(1), "GENERATED" VARCHAR2(1), "SECONDARY" VARCHAR2(1), "NAMESPACE" NUMBER, "EDITION_NAME" VARCHAR2(128), "SHARING" VARCHAR2(13), "EDITIONABLE" VARCHAR2(1), "ORACLE_MAINTAINED" VARCHAR2(1))  expfile  /tmp/t_xifenfei.dmp from 8480 until 12331349
--因为exp dmp文件损坏记录
CPFL: Error: column 1 length 21059 exceeds max bind size 128
0000000000 45415445 20544142 4c452022 545f5849 EATE  TAB LE " T_XI
0000000016 46454e46 45492220 28224f57 4e455222 FENF EI"  ("OW NER"
0000000032 20564152 43484152                    VAR CHAR
8480: column 1 type VARCHAR2 size 21059 failed
8480: row 1 failed
row conversion failure, retrying from offset 8481
CPFL: Error: Zero (illegal) length column number 2
…………
CPFL: Error: Zero (illegal) length column number 1
9644: succesful conversion      1164 bytes skipped due to conversion problems
131877: row 1000 ok
253310: row 2000 ok
…………
12200617: row 90000 ok
Unloaded 90915 rows, end of table marker at 12322835
[oracle@localhost CPFL]$ ls -ltr T_XIFENFEI.*
-rw-r--r-- 1 oracle oinstall 17230747 Apr 27 22:12 T_XIFENFEI.dat
-rw-r--r-- 1 oracle oinstall     1489 Apr 27 22:17 T_XIFENFEI.ctl

导入数据并对比

SQL> create table t_xifenfei_exp as select * from t_xifenfei where 1=0;
Table created.
[oracle@localhost CPFL]$ more T_XIFENFEI.ctl
load data
CHARACTERSET UTF8
infile 'T_XIFENFEI.dat'
insert
into table "T_XIFENFEI_EXP"  ---修改为T_XIFENFEI_EXP表
fields terminated by whitespace
(
  "OWNER"                            CHAR(128) enclosed by X'7C'
 ,"OBJECT_NAME"                      CHAR(128) enclosed by X'7C'
 ,"SUBOBJECT_NAME"                   CHAR(29) enclosed by X'7C'
 ,"OBJECT_ID"                        CHAR(5) enclosed by X'7C'
 ,"DATA_OBJECT_ID"                   CHAR(5) enclosed by X'7C'
 ,"OBJECT_TYPE"                      CHAR(20) enclosed by X'7C'
 ,"CREATED"                          DATE "DD-MON-YYYY AD HH24:MI:SS" enclosed by X'7C'
 ,"LAST_DDL_TIME"                    DATE "DD-MON-YYYY AD HH24:MI:SS" enclosed by X'7C'
 ,"TIMESTAMP"                        CHAR(19) enclosed by X'7C'
 ,"STATUS"                           CHAR(5) enclosed by X'7C'
 ,"TEMPORARY"                        CHAR(1) enclosed by X'7C'
 ,"GENERATED"                        CHAR(1) enclosed by X'7C'
 ,"SECONDARY"                        CHAR(1) enclosed by X'7C'
 ,"NAMESPACE"                        CHAR(2) enclosed by X'7C'
 ,"EDITION_NAME"                     CHAR(1) enclosed by X'7C'
 ,"SHARING"                          CHAR(13) enclosed by X'7C'
 ,"EDITIONABLE"                      CHAR(1) enclosed by X'7C'
 ,"ORACLE_MAINTAINED"                CHAR(1) enclosed by X'7C'
 ,"UNEXP_STATUS"                     FILLER CHAR(3) enclosed by X'7C'
)
[oracle@localhost CPFL]$ sqlldr chf/xifenfei@pdb1 control=T_XIFENFEI.ctl
SQL*Loader: Release 12.1.0.1.0       on Sun Apr 27 22:17:54 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
Path used:      Conventional
Commit point reached - logical record count 64
Commit point reached - logical record count 128
Commit point reached - logical record count 192
…………
Commit point reached - logical record count 90887
Commit point reached - logical record count 90915
Table "T_XIFENFEI_EXP":
  90915 Rows successfully loaded.
Check the log file:
  T_XIFENFEI.log
for more information about the load.
[oracle@localhost CPFL]$ sqlplus chf/xifenfei@pdb1
SQL*Plus: Release 12.1.0.2.0 Beta on Sun Apr 27 22:18:08 2014
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Last Successful login time: Sun Apr 27 2014 22:17:54 +08:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select count(*) from t_xifenfei_exp;
  COUNT(*)
----------
     90915
SQL> select * from t_xifenfei
  2  minus
  3  select * from t_xifenfei_exp;
no rows selected

通过这里可以看出来,在exp dmp文件有部分损坏的情况下,还是可以通过直接读取dmp文件的方式恢复全部或者部分exp dmp文件中内容(具体恢复量取决于dmp文件损坏程度)

如果你在使用这些思路进行恢复遇到突发情况不能自行解决,请联系我们(ORACLE数据库恢复技术支持),将为您提供专业数据库技术支持:
Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com

ORACLE 12C In-Memory功能性能测试

启用In-Memory功能
数据库版本12.1.0.2及其以上版本,inmemory_size参数设置为合适值

SQL> SELECT * FROM V$VERSION;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit                         0
PL/SQL Release 12.1.0.2.0 -                                                               0
CORE    12.1.0.2.0                                                                        0
TNS for Linux: Version 12.1.0.2.0 -                                                       0
NLSRTL Version 12.1.0.2.0 -                                                               0
SQL> SHOW PARAMETER inmemory;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      DEFAULT
inmemory_query                       string      ENABLE
inmemory_size                        big integer 200M

创建表
这里可以知道,创建表大小为13631488,但是未使用In-Memory功能

SQL> create table t_xifenfei_in_memory as select * from dba_objects;
Table created.
SQL> SELECT BYTES FROM USER_SEGMENTS WHERE SEGMENT_NAME='T_XIFENFEI_IN_MEMORY';
     BYTES
----------
  13631488
SQL> select TABLE_NAME,INMEMORY_PRIORITY,INMEMORY_DISTRIBUTE,INMEMORY_COMPRESSION from user_tables;
TABLE_NAME                     INMEMORY INMEMORY_DISTRI INMEMORY_COMPRESS
------------------------------ -------- --------------- -----------------
T_XIFENFEI_IN_MEMORY
SQL>  SELECT * FROM V$INMEMORY_AREA;
POOL                       ALLOC_BYTES USED_BYTES POPULATE_STATUS                CON_ID
-------------------------- ----------- ---------- -------------------------- ----------
1MB POOL                     166723584          0 DONE                                3
64KB POOL                     33554432          0 DONE                                3

未使用In-Memory功能测试

SQL> SET AUTOT TRACE
SQL> SELECT * FROM T_XIFENFEI_IN_MEMORY;
90902 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3598036702
------------------------------------------------------------------------------------------
| Id  | Operation         | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                      | 90902 |     9M|   427   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI_IN_MEMORY | 90902 |     9M|   427   (1)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       7505  consistent gets
       1527  physical reads
          0  redo size
   12125231  bytes sent via SQL*Net to client
      67212  bytes received via SQL*Net from client
       6062  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      90902  rows processed

这里可以看到未使用In-Memory功能,数据库查询执行计划使用TABLE ACCESS FULL,consistent gets为7505

使用In-Memory功能测试

SQL>  alter table  T_XIFENFEI_IN_MEMORY inmemory;
Table altered.
SQL> select TABLE_NAME,INMEMORY_PRIORITY,INMEMORY_DISTRIBUTE,INMEMORY_COMPRESSION from user_tables;
TABLE_NAME                     INMEMORY INMEMORY_DISTRI INMEMORY_COMPRESS
------------------------------ -------- --------------- -----------------
T_XIFENFEI_IN_MEMORY           NONE     AUTO DISTRIBUTE FOR QUERY
--因为只是把该表设置了INMEMORY,但是未查询过,所以查询V$INMEMORY_AREA中未使用相关内存
SQL> SELECT * FROM V$INMEMORY_AREA;
POOL                       ALLOC_BYTES USED_BYTES POPULATE_STATUS                CON_ID
-------------------------- ----------- ---------- -------------------------- ----------
1MB POOL                     166723584          0 DONE                                3
64KB POOL                     33554432          0 DONE                                3
--进行一次全表扫描
SQL> SELECT COUNT(*) FROM T_XIFENFEI_IN_MEMORY;
  COUNT(*)
----------
     90902
--再次查看,已经使用了分配的In-Memory中内存
SQL> SELECT * FROM V$INMEMORY_AREA;
POOL                       ALLOC_BYTES USED_BYTES POPULATE_STATUS                CON_ID
-------------------------- ----------- ---------- -------------------------- ----------
1MB POOL                     166723584    4194304 DONE                                3
64KB POOL                     33554432     131072 DONE                                3
SQL> SET AUTOT TRACE
SQL> SELECT * FROM T_XIFENFEI_IN_MEMORY;
90902 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3598036702
---------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                      | 90902 |     9M|    20  (45)| 00:00:01 |
|   1 |  TABLE ACCESS INMEMORY FULL| T_XIFENFEI_IN_MEMORY | 90902 |     9M|    20  (45)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
    4946298  bytes sent via SQL*Net to client
      67212  bytes received via SQL*Net from client
       6062  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      90902  rows processed

这里我们可以发现,使用了In-Memory功能之后,数据库consistent gets为4,相比未使用In-Memory之前的7505,性能最少提高近2000倍.