asm disk误设置pvid导致asm diskgroup无法mount恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:asm disk误设置pvid导致asm diskgroup无法mount恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我说他们把以前存储到AIX直连的存储切换为含光纤交换机的存储网络后,RAC无法启动,让我给予支持.通过分析是由于换链路之后开始磁盘顺序不对,维护人员对其asm disk 设置了pvid,导致asm 磁盘组无法正常mount,从而使得含votedisk的dg的asm disk无法正常访问,从而RAC的cssd进程无法启动,同样数据文件的磁盘组也无法mount,通过kfed修复成功,实现数据0丢失.
平台版本信息(2节点RAC)

$ sqlplus -v
SQL*Plus: Release 11.2.0.4.0 Production
$ uname -a
AIX db2 1 7 00F9733E4C00

GI日志报错信息

2014-12-20 16:44:08.769:
[ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-12-20 16:44:11.775:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:26.791:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
、Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:41.812:
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds;
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以看出来是由于RAC启动过程中无法获得votedisk使得其无法正常启动,通过分析日志找出来votedisk相关磁盘

2014-12-15 17:36:15.424:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445:
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以知道rhdisk4,5,6为votedisk对应磁盘,使用kfed查看磁盘头信息

$kfed read /dev/rhdisk4
kfbh.endian:                        201 ; 0x000: 0xc9
kfbh.hard:                          194 ; 0x001: 0xc2
kfbh.type:                          212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                        193 ; 0x003: 0xc1
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
1102BEE00 C9C2D4C1 00000000 00000000 00000000  [................]
1102BEE10 00000000 00000000 00000000 00000000  [................]
        Repeat 6 times
1102BEE80 00F9733D 67553E0A 00000000 00000000  [..s=gU>.........]
1102BEE90 00000000 00000000 00000000 00000000  [................]
  Repeat 246 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]
$kfed read /dev/rhdisk4 blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3883664132 ; 0x00c: 0xe77c0304
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                          23 ; 0x006: 0x0017
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000
kfdfse[0].fse:                      119 ; 0x018: FREE=0x7 FRAG=0x7
kfdfse[1].fse:                       16 ; 0x019: FREE=0x0 FRAG=0x1
…………
$kfed read /dev/rhdisk4 blkn=510
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     254 ; 0x004: blk=254
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3460116983 ; 0x00c: 0xce3d31f7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                CRS_0000 ; 0x028: length=8
kfdhdb.grpname:                     CRS ; 0x048: length=3
kfdhdb.fgname:                 CRS_0000 ; 0x068: length=8
…………

由上述分析可以基本上确定是asm disk header 被破坏,进一步分析破坏原因

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/dev#]ls -l rhdisk*
crw-------    2 root     system       24,  1 Oct 18 11:45 rhdisk0
crw-------    1 root     system       24,  3 Oct 18 13:27 rhdisk1
crw-------    1 root     system       24,  5 Dec 20 20:02 rhdisk10
crw-------    1 root     system       24,  2 Oct 18 13:32 rhdisk2
crw-------    1 root     system       24,  0 Oct 18 13:32 rhdisk3
crw-rw----    1 grid     asmadmin     24,  8 Dec 20 20:02 rhdisk4
crw-rw----    1 grid     asmadmin     24,  9 Dec 20 20:02 rhdisk5
crw-rw----    1 grid     asmadmin     24, 10 Dec 20 20:02 rhdisk6
crw-rw----    1 grid     asmadmin     24,  4 Dec 20 20:02 rhdisk7
crw-rw----    1 grid     asmadmin     24,  6 Dec 20 20:02 rhdisk8
crw-rw----    1 grid     asmadmin     24,  7 Dec 20 20:02 rhdisk9

从这里基本上可以看出来,是由于磁盘头被重写了pvid,导致asm disk header 被破坏.进一步分析asm log,确定哪些磁盘被用作asm disk

SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY  DISK '/dev/rhdisk4',
'/dev/rhdisk5',
'/dev/rhdisk6' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (1,0) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk6)
NOTE: initializing header on grp 1 disk CRS_0000
NOTE: initializing header on grp 1 disk CRS_0001
NOTE: initializing header on grp 1 disk CRS_0002
SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk9' SIZE 614400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (2,0) to disk (/dev/rhdisk9)
NOTE: initializing header on grp 2 disk DATA_0000
SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk8' SIZE 204800M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (3,0) to disk (/dev/rhdisk8)
NOTE: initializing header on grp 3 disk FBA_0000
SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY  DISK
'/dev/rhdisk7' SIZE 102400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
NOTE: Assigning number (4,0) to disk (/dev/rhdisk7)
NOTE: initializing header on grp 4 disk ARCH_0000

这里可以确定asm disk为rhdisk[4-9],通过kfed分析全部和rhdisk4一样的问题,也符合lspv查询出来的结果,使用kfed repair修复asm disk header后

SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> alter diskgroup fba mount;
Diskgroup altered.
SQL> alter diskgroup arch mount;
Diskgroup altered.
SQL> alter diskgroup crs mount;
Diskgroup altered.
SQL> select group_number,disk_number,path from v$asm_disk;
GROUP_NUMBER DISK_NUMBER PATH
------------ ----------- --------------------------------------------------
           2           0 /dev/rhdisk4
           2           1 /dev/rhdisk5
           2           2 /dev/rhdisk6
           1           0 /dev/rhdisk7
           4           0 /dev/rhdisk8
           3           0 /dev/rhdisk9
6 rows selected.
SQL> select group_number,name from v$asm_diskgroup;
GROUP_NUMBER NAME
------------ ------------------------------------------------------------
           1 ARCH
           2 CRS
           3 DATA
           4 FBA

这里证明通过kfed对磁盘头的修复,asm磁盘组已经全部mount成功,GI状态也恢复正常

[db2/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1

这里忽略了一个问题,在修复磁盘头之前没有清除pvid,导致磁盘头修复后,pvid依然存储在odm中

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          00f9733d675541e5                    None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None

通过分析发现fba磁盘组中无任何记录,使用该磁盘组进行直接清除pvid测试

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup fba dismount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ exit
You have mail in /usr/spool/mail/root
[db2/#]chdev -l hdisk8 -a pv=clear
hdisk8 changed
[db2/#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active
hdisk1          00f9733e21b953e6                    rootvg          active
hdisk2          00f9733e21b97a83                    appvg           active
hdisk3          00f9733e21b98434                    appvg           active
hdisk4          00f9733d67553e0a                    None
hdisk5          00f9733d67553f31                    None
hdisk6          00f9733d67554011                    None
hdisk7          00f9733d67554165                    None
hdisk8          none                                None
hdisk9          00f9733d675542e4                    None
hdisk10         none                                None
[db2/#]su - grid
$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup fba mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

通过测试直接清除pvid asm 磁盘头依然工作正常,关闭GI,使用chdev清除hdisk[4-9]所有pvid,启动GI一切正常

[db1/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.CRS.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.DATA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.FBA.dg
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.asm
               ONLINE  ONLINE       db1                      Started
               ONLINE  ONLINE       db2                      Started
ora.gsd
               OFFLINE OFFLINE      db1
               OFFLINE OFFLINE      db2
ora.net1.network
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.ons
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
ora.registry.acfs
               ONLINE  ONLINE       db1
               ONLINE  ONLINE       db2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1
ora.cvu
      1        ONLINE  ONLINE       db1
ora.db1.vip
      1        ONLINE  ONLINE       db1
ora.db2.vip
      1        ONLINE  ONLINE       db2
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open
      2        ONLINE  ONLINE       db2                      Open
ora.oc4j
      1        ONLINE  ONLINE       db1
ora.scan1.vip
      1        ONLINE  ONLINE       db1
[db1/#]lspv
hdisk0          00f9733df7c7a9db                    rootvg          active
hdisk1          00f9733d21dad8fe                    rootvg          active
hdisk2          00f9733d21dbd08b                    appvg           active
hdisk3          00f9733d21dbd2ab                    appvg           active
hdisk4          none                                None
hdisk5          none                                None
hdisk6          none                                None
hdisk7          none                                None
hdisk8          none                                None
hdisk9          none                                None
hdisk10         none                                None

至此设置pvid导致asm disk header损坏的asm 恢复正常,实现数据0丢失。
温馨提示:aix asm disk磁盘中不能设置pvid,否则将会导致asm disk header 损坏,无法正常mount
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

SYSTEM表空间坏块恢复—C_TS#对象坏块恢复(file 1 block 60)

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:SYSTEM表空间坏块恢复—C_TS#对象坏块恢复(file 1 block 60)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一朋友给我电话,说他们客户公司数据库故障,被另外一家公司恢复了一天不能正常恢复,请求我协助解决.接手一看数据库已经被破坏的不像样子了,根据alert日志信息大概分析了故障原因和上家公司处理情况。后面接手后通过bbed修复block数据库恢复过程,在本次恢复中出现大量ORA-600错误,主要包括ORA-00600 400,ORA-00600 2662,ORA-00600 2663,ORA-00600 krhpfh_03-1209,ORA-00600 3600,ORA-00600 ktsitbs_info1,ORA-00600 4137,ORA-00600 4511,ORA-00600 4198,ORA-00600 6807等
故障原因redo文件丢失

Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_1404.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 9 thread 1: '/data2/oradata/redo0902.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_1404.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 9 thread 1: '/data2/oradata/redo0902.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
Thu Nov 20 11:28:39 2014
LGWR: terminating instance due to error 313
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_1394.trc:
ORA-00313: open failed for members of log group  of thread
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_ckpt_1406.trc:
ORA-00313: open failed for members of log group  of thread
Instance terminated by LGWR, pid = 1404

尝试clear redo文件方式恢复

Thu Nov 20 13:04:16 2014
alter database clear logfile group 9
Thu Nov 20 13:04:16 2014
ORA-1624 signalled during: alter database clear logfile group 9...
Thu Nov 20 13:04:45 2014
alter database clear logfile group 9
Thu Nov 20 13:04:46 2014
ORA-1624 signalled during: alter database clear logfile group 9...
Thu Nov 20 13:04:59 2014
alter database clear unarchived logfile group 9
Thu Nov 20 13:04:59 2014
ORA-1624 signalled during: alter database clear unarchived logfile group 9...
Thu Nov 20 13:05:00 2014
alter database clear unarchived logfile group 9
Thu Nov 20 13:05:00 2014
ORA-1624 signalled during: alter database clear unarchived logfile group 9...

不完全恢复resetlogs尝试打开数据库

ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Thu Nov 20 13:49:01 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Thu Nov 20 13:49:02 2014
Media Recovery Log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
Errors with log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Thu Nov 20 13:49:02 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Thu Nov 20 13:49:02 2014
Media Recovery Log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
Errors with log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Thu Nov 20 13:49:02 2014
ALTER DATABASE RECOVER CANCEL
Thu Nov 20 13:49:03 2014
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
Thu Nov 20 13:49:33 2014
alter database open resetlogs
Thu Nov 20 13:49:34 2014
ORA-1113 signalled during: alter database open resetlogs...

使用隐含参数

_allow_resetlogs_corruption= TRUE

进行不完全恢复,尝试open数据库报ORA-600 4000错误

Thu Nov 20 14:35:02 2014
ALTER DATABASE   MOUNT
Thu Nov 20 14:35:07 2014
Setting recovery target incarnation to 2
Thu Nov 20 14:35:07 2014
Successful mount of redo thread 1, with mount id 4039504598
Thu Nov 20 14:35:07 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Thu Nov 20 14:40:33 2014
ALTER DATABASE RECOVER  database until cancel
Thu Nov 20 14:40:33 2014
Media Recovery Start
Thu Nov 20 14:40:33 2014
Media Recovery failed with error 1610
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 14:41:23 2014
ALTER DATABASE RECOVER  database using backup controlfile until cancel
Thu Nov 20 14:43:08 2014
alter database open resetlogs
Thu Nov 20 14:43:08 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 31293973571
Resetting resetlogs activation ID 3855216310 (0xe5c9eeb6)
Online log /data2/oradata/redo0802.log: Thread 1 Group 8 was previously cleared
Online log /data2/oradata/redo0902.log: Thread 1 Group 9 was previously cleared
Thu Nov 20 14:43:14 2014
Setting recovery target incarnation to 3
Thu Nov 20 14:43:14 2014
Assigning activation ID 4039504598 (0xf0c5f2d6)
Thread 1 opened at log sequence 1
  Current log# 9 seq# 1 mem# 0: /data2/oradata/redo0902.log
Successful open of redo thread 1
Thu Nov 20 14:43:14 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 14:43:14 2014
SMON: enabling cache recovery
Thu Nov 20 14:43:14 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_1844.trc:
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 14:43:16 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_1844.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 14:43:16 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1844
ORA-1092 signalled during: alter database open resetlogs...

尝试隐含屏蔽回滚段

_corrupted_rollback_segments= _SYSSMU1$, _SYSSMU2$,…………

错误依旧ORA-600 4000

Thu Nov 20 15:09:21 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 15:09:21 2014
SMON: enabling cache recovery
Thu Nov 20 15:09:21 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_624.trc:
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 15:09:23 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_624.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 15:09:23 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 624
ORA-1092 signalled during: alter database open

多次重启,resetlogs后,数据库出现ORA-600 2662错误

Successful open of redo thread 1
Thu Nov 20 17:13:24 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 17:13:24 2014
SMON: enabling cache recovery
Thu Nov 20 17:13:24 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_7967.trc:
ORA-00600: internal error code, arguments: [2662], [7], [1229382552], [7], [1229560642], [8388633], [], []
Thu Nov 20 17:13:25 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_7967.trc:
ORA-00600: internal error code, arguments: [2662], [7], [1229382552], [7], [1229560642], [8388633], [], []
Thu Nov 20 17:13:25 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 7967
ORA-1092 signalled during: ALTER DATABASE OPEN...
Thu Nov 20 17:18:23 2014
USER: terminating instance due to error 1092
Instance terminated by USER, pid = 7967

offline undo相关文件,尝试打开数据库

Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Thu Nov 20 17:52:31 2014
ALTER DATABASE RECOVER  database until cancel
Thu Nov 20 17:52:31 2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 17:53:42 2014
ALTER DATABASE RECOVER CANCEL
Thu Nov 20 17:53:44 2014
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Thu Nov 20 17:56:34 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline
Thu Nov 20 17:56:35 2014
Completed: alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline
Thu Nov 20 17:57:01 2014
alter database datafile '/data2/oradata/undotbs02.dbf' offline
Thu Nov 20 17:57:02 2014
Completed: alter database datafile '/data2/oradata/undotbs02.dbf' offline
Thu Nov 20 17:57:26 2014
alter database datafile '/data2/oradata/undotbs03.dbf' offline
Thu Nov 20 17:57:27 2014
Completed: alter database datafile '/data2/oradata/undotbs03.dbf' offline
Thu Nov 20 17:57:43 2014
alter database open resetlogs
Thu Nov 20 17:57:43 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Thu Nov 20 17:58:58 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 17:58:58 2014
Completed: alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 17:59:15 2014
alter database open resetlogs
Thu Nov 20 17:59:15 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Thu Nov 20 17:59:35 2014
alter database datafile '/data2/oradata/undotbs02.dbf' offline drop
Thu Nov 20 17:59:35 2014
Completed: alter database datafile '/data2/oradata/undotbs02.dbf' offline drop
Thu Nov 20 17:59:50 2014
alter database datafile '/data2/oradata/undotbs03.dbf' offline drop
Thu Nov 20 17:59:50 2014
Completed: alter database datafile '/data2/oradata/undotbs03.dbf' offline drop
Thu Nov 20 18:00:07 2014
alter database open resetlogs
Thu Nov 20 18:00:07 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 31294173628
Resetting resetlogs activation ID 4039492628 (0xf0c5c414)
Online log /data2/oradata/redo0802.log: Thread 1 Group 8 was previously cleared
Thu Nov 20 18:00:14 2014
Setting recovery target incarnation to 8
Thu Nov 20 18:00:14 2014
Assigning activation ID 4039504142 (0xf0c5f10e)
Thread 1 opened at log sequence 1
  Current log# 9 seq# 1 mem# 0: /data2/oradata/redo0902.log
Successful open of redo thread 1
Thu Nov 20 18:00:15 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 18:00:15 2014
SMON: enabling cache recovery
Thu Nov 20 18:00:15 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
File #2 is offline, but is part of an online tablespace.
data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
File #100 is offline, but is part of an online tablespace.
data file 100: '/data2/oradata/undotbs02.dbf'
Thu Nov 20 18:00:28 2014
File #185 is offline, but is part of an online tablespace.
data file 185: '/data2/oradata/undotbs03.dbf'
Dictionary check complete
Thu Nov 20 18:00:35 2014
SMON: enabling tx recovery
Thu Nov 20 18:00:36 2014
Database Characterset is ZHS16CGB231280
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_28472.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 185 cannot be read at this time
ORA-01110: data file 185: '/data2/oradata/undotbs03.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_28450.trc:
ORA-00604: error occurred at recursive SQL level
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_28446.trc:
ORA-00604: error occurred at recursive SQL level
Instance terminated by USER, pid = 28472
ORA-1092 signalled during: alter database open resetlogs...

不知道做了什么操作出现file 1 block 60坏块,很可能bbed修改错误导致

Thu Nov 20 19:18:15 2014
SMON: enabling cache recovery
Thu Nov 20 19:18:16 2014
Hex dump of (file 1, block 60) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_13232.trc
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
Successfully onlined Undo Tablespace 1.
Thu Nov 20 19:18:16 2014
SMON: enabling tx recovery
Thu Nov 20 19:18:17 2014
Database Characterset is ZHS16CGB231280
Thu Nov 20 19:18:17 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_13232.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 185 cannot be read at this time
ORA-01110: data file 185: '/data2/oradata/undotbs03.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 13232
ORA-1092 signalled during: alter database open...

尝试不完全恢复,并resetlogs操作

ALTER DATABASE RECOVER  database until cancel
Thu Nov 20 19:33:41 2014
Media Recovery Start
Datafile 2 is on orphaned branch
          File status = 4
        Abs fuzzy SCN = 0
 Hot backup fuzzy SCN = 0
Thu Nov 20 19:33:41 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_20878.trc:
ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [864151207], [864153315], [1229402557], [7], [0], [0]
ORA-01110: data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
Thu Nov 20 19:33:42 2014
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 19:34:06 2014
alter database open resetlogs
Thu Nov 20 19:34:06 2014
ORA-1139 signalled during: alter database open resetlogs...
Thu Nov 20 19:34:17 2014
alter database open
Thu Nov 20 19:34:17 2014
ORA-1190 signalled during: alter database open...
Thu Nov 20 19:35:57 2014
ALTER DATABASE RECOVER  database until cancel
Thu Nov 20 19:35:57 2014
Media Recovery Start
Datafile 2 is on orphaned branch
          File status = 4
        Abs fuzzy SCN = 0
 Hot backup fuzzy SCN = 0
Thu Nov 20 19:35:58 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_20878.trc:
ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [864151207], [864153315], [1229402557], [7], [0], [0]
ORA-01110: data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
Thu Nov 20 19:35:59 2014
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 19:37:19 2014
alter database open resetlogs
Thu Nov 20 19:37:19 2014
ORA-1139 signalled during: alter database open resetlogs...

继续打开报 ORA-600 3600错误

Thu Nov 20 19:43:14 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 19:43:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_20856.trc:
ORA-00600: internal error code, arguments: [3600], [2], [14], [], [], [], [], []
Thu Nov 20 19:43:15 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_20856.trc:
ORA-00600: internal error code, arguments: [3600], [2], [14], [], [], [], [], []
Thu Nov 20 19:43:15 2014
DBW0: terminating instance due to error 471
Instance terminated by DBW0, pid = 20856

<strong>中间多次重启和resetlogs,还出现ORA-600 2663错误</strong>

Fri Nov 21 12:35:12 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 21 12:35:12 2014
SMON: enabling cache recovery
Fri Nov 21 12:35:13 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_15596.trc:
ORA-00600: internal error code, arguments: [2663], [7], [1229543007], [7], [1229560642], [], [], []
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_15596.trc:
ORA-00600: internal error code, arguments: [2663], [7], [1229543007], [7], [1229560642], [], [], []
Fri Nov 21 12:35:14 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mman_15572.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw1_15576.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Instance terminated by USER, pid = 15596
ORA-1092 signalled during: ALTER DATABASE OPEN..

继续尝试打开数据库出现ORA-600 ktsitbs_info1错误

SMON: enabling cache recovery
Fri Nov 21 13:54:25 2014
Hex dump of (file 1, block 60) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
Fri Nov 21 13:54:25 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc:
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Fri Nov 21 13:54:27 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc:
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 21111
ORA-1092 signalled during: alter database open...

以上是客户数据库故障原因和问题大概的处理过程,下面是我接手后的处理过程


dbv 检查system01.dbf文件,得到结果

HNDX-DB% dbv file=/opt/oracle/oradata/xifenfei/system01.dbf
DBVERIFY: Release 10.2.0.1.0 - Production on Fri Nov 21 16:22:37 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/xifenfei/system01.dbf
Page 60 is marked corrupt
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during dbv:
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Corrupt block relative dba: 0x004001f2 (file 1, block 498)
Bad check value found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x004001f2
 last change scn: 0x0007.49499ca1 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x9ca10601
 check value in block header: 0xe458
 computed block checksum: 0x9720
DBVERIFY - Verification complete
Total Pages Examined         : 786432
Total Pages Processed (Data) : 201131
Total Pages Failing   (Data) : 2
Total Pages Processed (Index): 221394
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 60265
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 303641
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Highest block SCN            : 1229823477 (7.1229823477)

这里知道数据库有两个坏块,而且根据对于bootstrap$的经验,可以大概确定60坏块很可能是C_TS#,第一反应type异常,498可能是seq$

对数据库启动过程做10046,得到trace文件

PARSING IN CURSOR #1 len=275 dep=2 uid=0 oct=3 lid=0 tim=27978051403575 hv=3408408745 ad='7df93cd0'
select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext,
dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags,
plugged, NVL(spare1,0), NVL(spare2,0) from ts$ where ts#=:1
END OF STMT
PARSE #1:c=0,e=92,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051403569
BINDS #1:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=ffffffff7dbac9a8  bln=22  avl=02  flg=05
  value=2
EXEC #1:c=0,e=310,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051404296
WAIT #1: nam='db file sequential read' ela= 42 file#=1 block#=60 blocks=1 obj#=-1 tim=27978051404449
Hex dump of (file 1, block 60)
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
FETCH #1:c=10000,e=4072,p=1,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051408438
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=16 op='TABLE ACCESS CLUSTER TS$ (cr=2 pr=1 pw=0 time=4075 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=7 op='INDEX UNIQUE SCAN I_TS# (cr=1 pr=0 pw=0 time=13 us)'
*** 2014-11-22 14:44:43.235
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Current SQL statement for this session:
select max(maxconcurrency) from sys.wrh$_undostat  where instance_number = :1 and dbid = :2
and snap_id in   (select snap_id from dba_hist_snapshot where end_interval_time >
(select max(end_interval_time)-7 from dba_hist_snapshot))

这里显示了数据库启动报ORA-00600[ktsitbs_info1],[2],明显的表示了b中的2是表示表空间号,由于ts$坏块,无法读取ts$中表空间信息,从而出现数据字典不一致,从而出现该错误。所以恢复该库的关键是修复file 1 block 60.

bbed尝试修复file 1 block 60

HNDX-DB% bbed password=blockedit mode=edit
BBED: Release 2.0.0.0.0 - Limited Production on Sat Nov 22 15:16:26 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set filename '/opt/oracle/oradata/xifenfei/system01.dbf'
        FILENAME        /opt/oracle/oradata/xifenfei/system01.dbf
BBED> set block 8192
        BLOCK#          8192
BBED> set block 60
        BLOCK#          60
BBED> set count 64
        COUNT           64
BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60                                    Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (128)
BBED> set block 61
        BLOCK#          61
BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 61                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 72 bytes                     @20
 struct kdbh, 14 bytes                      @92
 struct kdbt[3], 12 bytes                   @106
 sb2 kdbr[2]                                @118
 ub1 freespace[7959]                        @122
 ub1 rowdata[107]                           @8081
 ub4 tailchk                                @8188
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x0040003d
   ub4 bas_kcbh                             @8        0x0000235b
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x04 (KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x7a85
   ub2 spare3_kcbh                          @18       0x0000
BBED> set block 60
        BLOCK#          60
BBED> d
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 80000000 0040003c ebe04bc9 00050204 6faa0000 01000000 00000006 29b3a204
 00040ca0 00020200 00000000 000a0000 00000002 0080009b 00000100 80000000
 <32 bytes per line>
BBED> d block 61
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 61               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 06a20000 0040003d 0000235b 00000104 7a850000 01000000 00000006 00001837
 00001738 00020200 00000000 0007002e 00000002 00800075 00012300 80000000
 <32 bytes per line>
BBED> set block 60
        BLOCK#          60
BBED> m /x 06a2
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 06a20000 0040003c ebe04bc9 00050204 6faa0000 01000000 00000006 29b3a204
 00040ca0 00020200 00000000 000a0000 00000002 0080009b 00000100 80000000
 <32 bytes per line>
BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 72 bytes                     @20
 struct kdbh, 14 bytes                      @92
 struct kdbt[3], 12 bytes                   @106
 sb2 kdbr[2]                                @118
 ub1 freespace[7598]                        @122
 ub1 rowdata[468]                           @7720
 ub4 tailchk                                @8188
BBED> sum apply
Check value for File 0, Block 60:
current = 0xe908, required = 0xe908
BBED> verify
DBVERIFY - Verification starting
FILE = /opt/oracle/oradata/xifenfei/system01.dbf
BLOCK = 60
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED>

尝试启动数据库

Sat Nov 22 15:51:33 2014
alter database open
Sat Nov 22 15:51:34 2014
Thread 1 opened at log sequence 7
  Current log# 8 seq# 7 mem# 0: /data2/oradata/redo0802.log
Successful open of redo thread 1
Sat Nov 22 15:51:34 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Nov 22 15:51:34 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Nov 22 15:51:34 2014
Database Characterset is ZHS16CGB231280
Hex dump of (file 1, block 498) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_2818.trc
Corrupt block relative dba: 0x004001f2 (file 1, block 498)
Bad check value found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x004001f2
 last change scn: 0x0007.49499ca1 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x9ca10601
 check value in block header: 0xe458
 computed block checksum: 0x9720
Reread of rdba: 0x004001f2 (file 1, block 498) found same corrupted data
Sat Nov 22 15:51:35 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_2803.trc:
ORA-00600: internal error code, arguments: [4000], [12], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=18, OS id=3000
Sat Nov 22 15:51:36 2014
Completed: alter database open
Sat Nov 22 15:51:36 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_3010.trc:
ORA-00600: internal error code, arguments: [6807], [AUDSES$], [144], [], [], [], [], []
Sat Nov 22 15:51:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mmon_2809.trc:
ORA-00600: internal error code, arguments: [6807], [WRI$_ALERT_SEQUENCE], [8783], [], [], [], [], []
Sat Nov 22 15:51:37 2014
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Sat Nov 22 15:51:38 2014
ORA-600 encountered when generating server alert SMG-3000
Sat Nov 22 15:51:38 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mmon_2809.trc:
ORA-00600: internal error code, arguments: [ktcpoptx_0], [0x772705E60], [], [], [], [], [], []

只要出现ORA-600 4000和ORA-600 6807错误,其中ORA-600 6807错误比较明显是由于seq$坏块,导致AUDSES$ seq异常导致。ORA-600 4000应该是回滚段异常,继续分析回滚段

SQL> select name,ts#,status$ from undo$;
NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
SYSTEM                                  0          2
_SYSSMU1$                               1          2
_SYSSMU2$                               1          2
_SYSSMU3$                               1          2
…………
_SYSSMU168$                             1          2
_SYSSMU169$                             1          2

这里很异常,system回滚段在数据库open之后,按照常理不可能处于STATUS$=2(OFFLINE)状态。而且其他回滚段全部为OFFLINE状态也属于异常情况.而且尝试drop undo报ORA-01561,另外在dba_rollback_segs中无SYSTEM(查询结果忘记保存)

SQL> drop tablespace undotbs1 including contents;
drop tablespace undotbs1 including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified

通过这一系列很怀疑是由于bbed 修改了undo$等相关基表信息导致现在system中的undo信息混乱.信息反馈给客户后,客户想起来昨天给他们恢复的公司在bbed操作前备份了system01.dbf.突然感觉救星来了.实在怕不懂bbed的人折腾bbed

dbv检测备份文件

DBVERIFY - Verification starting : FILE = /data3/backup/system01.dbf_bak
Page 60 is marked corrupt
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during dbv:
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Block Checking: DBA = 4194802, Block Type = KTB-managed data block
data header at 0x1002ef05c
kdbchk: row locked by non-existent transaction
        table=0   slot=4
        lockid=1   ktbbhitc=2
Page 498 failed with check code 6101
DBVERIFY - Verification complete
Total Pages Examined         : 786432
Total Pages Processed (Data) : 201131
Total Pages Failing   (Data) : 1
Total Pages Processed (Index): 221394
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 60265
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 303641
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 1229823477 (7.1229823477)

好家伙只有一个物理坏块和一个逻辑坏块,而对于物理坏块block 60已经知道如何修复,逻辑坏块可以尝试设置隐含参数跳过去,bbed修改相关block(同上步骤)

再次启动数据库

dd if=/opt/oracle/oradata/xifenfei/system01.dbf bs=8192 count=2 of=/tmp/system01.2
dd if=/tmp/system01.2 of=/data3/backup/system01.dbf_bak bs=8192 count=2 conv=notrunc
Sat Nov 22 17:52:50 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sat Nov 22 17:53:38 2014
alter database rename file '/opt/oracle/oradata/xifenfei/system01.dbf' to '/data3/backup/system01.dbf_bak'
Sat Nov 22 17:53:39 2014
Completed: alter database rename file '/opt/oracle/oradata/xifenfei/system01.dbf' to '/data3/backup/system01.dbf_bak'
Sat Nov 22 17:55:43 2014
alter database open
Sat Nov 22 17:55:48 2014
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=15858
Sat Nov 22 17:56:10 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=15879
Sat Nov 22 17:56:19 2014
Thread 1 opened at log sequence 7
  Current log# 8 seq# 7 mem# 0: /data2/oradata/redo0802.log
Successful open of redo thread 1
Sat Nov 22 17:56:19 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Nov 22 17:56:19 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Nov 22 17:56:20 2014
ARC1: STARTING ARCH PROCESSES
Sat Nov 22 17:56:20 2014
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Sat Nov 22 17:56:22 2014
Database Characterset is ZHS16CGB231280
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Nov 22 17:56:33 2014
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=23, OS id=15928
QMNC started with pid=25, OS id=15996
Sat Nov 22 17:57:11 2014
Completed: alter database open
Sat Nov 22 17:57:18 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_16010.trc:
ORA-00600: internal error code, arguments: [4511], [], [], [], [], [], [], []
Sat Nov 22 17:57:26 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_16012.trc:
ORA-00600: internal error code, arguments: [4511], [], [], [], [], [], [], []
Sat Nov 22 17:58:17 2014
Starting background process EMN0
Sat Nov 22 18:00:03 2014
Shutting down instance: further logons disabled
EMN0 started with pid=71, OS id=16421
Sat Nov 22 18:00:12 2014
SMON: Restarting fast_start parallel rollback
Sat Nov 22 18:00:23 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_15951.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
Sat Nov 22 18:00:24 2014
Stopping background process CJQ0
Sat Nov 22 18:00:24 2014
Stopping background process QMNC
Sat Nov 22 18:00:27 2014
Doing block recovery for file 2 block 41
Block recovery from logseq 7, block 180883 to scn 214748389244
Sat Nov 22 18:00:27 2014
Recovery of Online Redo Log: Thread 1 Group 8 Seq 7 Reading mem 0
  Mem# 0 errs 0: /data2/oradata/redo0802.log
Block recovery stopped at EOT rba 7.180988.16
Block recovery completed at rba 7.180988.16, scn 50.24441
Sat Nov 22 18:00:32 2014
Stopping background process MMNL
Sat Nov 22 18:00:38 2014
Stopping background process MMON
Sat Nov 22 18:00:41 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_15395.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sat Nov 22 18:00:42 2014
ORACLE Instance xifenfei (pid = 9) - Error 600 encountered while recovering transaction (3, 4).
Sat Nov 22 18:00:42 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_15395.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []

这里都是很常规的错误,查询undo$也已经正常,重建新undo表空间删除老undo,然后alert日志中无其他报错,数据库恢复至此完成,建议客户导出导入重建数据库

Oracle 12C的第一次异常恢复—文件头坏块

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 12C的第一次异常恢复—文件头坏块

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到第一个使用Oracle 12C作为生产库的恢复救援.有两个业务数据文件报文件头损坏,其他数据文件全部是9月份的一次备份,在当前的条件下,希望我们能够帮他们恢复出来业务文件中的数据
数据库版本信息

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

数据库故障
具体脚本请参考:数据库恢复检查脚本(Oracle Database Recovery Check)
控制文件信息
1
控制文件中关于数据文件信息
2
数据文件头信息
3
alert日志报错

Reading datafile '/app/oracle/oradata/freetouch/sales.dbf' for corruption at rdba: 0x00000001 (file 4, block 1)
Reread (file 4, block 1) found same corrupt data (no logical check)
Hex dump of (file 5, block 1) in trace file /app/oracle/diag/rdbms/valuenet/valuenet/trace/valuenet_ora_12384.trc
Corrupt block relative dba: 0x00000001 (file 5, block 1)
Fractured block found during kcvxfh v8
Data in bad block:
 type: 0 format: 2 rdba: 0x00000001
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xa701
 computed block checksum: 0x0
Reading datafile '/app/oracle/oradata/freetouch/drp_200200' for corruption at rdba: 0x00000001 (file 5, block 1)
Reread (file 5, block 1) found same corrupt data (no logical check)
Hex dump of (file 4, block 1) in trace file /app/oracle/diag/rdbms/valuenet/valuenet/trace/valuenet_ora_12384.trc
Corrupt block relative dba: 0x00000001 (file 4, block 1)
Fractured block found during kcvxfh v8
Data in bad block:
 type: 0 format: 2 rdba: 0x00000001
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xa701
 computed block checksum: 0x0

odu无法识别异常文件

[oracle@db odu]$ ./odu
Oracle Data Unloader trial version 4.1.3
Copyright (c) 2008,2009,2010,2011 XiongJun. All rights reserved.
loading default config.......
byte_order little
block_size  8192
db_timezone -7
Invalid db timezone:-7
client_timezone 8
Invalid client timezone:8
asmfile_extract_path /home/oracle/hongye/odu/data
data_path  /home/oracle/hongye/odu/data
lob_path  /home/oracle/hongye/odu/data
charset_name ZHS16GBK
ncharset_name AL16UTF16
output_form  dmp
error at line 10.
lob_storage infile
clob_byte_order big
trace_level 1
delimiter |
unload_deleted yes
file_header_offset 0
is_tru64 no
record_row_addr no
convert_clob_charset yes
load config file 'config.txt' successful
loading default asm disk file ......
can not open file 'asmdisk.txt', error message:No such file or directory.
loading default control file ......
unknown file format '/app/oracle/oradata/freetouch/sales.dbf'
unknown file format '/app/oracle/oradata/freetouch/drp_200200'
 ts#   fn  rfn bsize   blocks bf offset filename
---- ---- ---- ----- -------- -- ------ --------------------------------------------
   1    1    1  8192   194560 N       0 /app/oracle/oradata/freetouch/system01.dbf
   6    2   10  8192    45840 N       0 /app/oracle/oradata/freetouch/example01.dbf
   1    3    3  8192   907520 N       0 /app/oracle/oradata/freetouch/sysaux01.dbf
   4 1024   10  8192        0 N       0 /app/oracle/oradata/freetouch/sales.dbf
   5 1024    9  8192        0 N       0 /app/oracle/oradata/freetouch/drp_200200
   4    6    6  8192   128320 N       0 /app/oracle/oradata/freetouch/users01.dbf
   7    7    7  8192   780288 N       0 /app/oracle/oradata/freetouch/undotbs03.dbf
  11    8    8  8192    25600 N       0 /app/oracle/oradata/freetouch/indx01.dbf
load control file 'control.txt' successful
loading dictionary data......done
loading scanned data......done

dul无法识别异常文件

[oracle@db dul]$ ./dul
Data UnLoader: 10.2.0.5.32 - Internal Only - on Sun Nov  2 23:34:42 2014
with 64-bit io functions
Copyright (c) 1994 2014 Bernard van Duijnen All rights reserved.
 Strictly Oracle Internal Use Only
DUL: Warning: ulimit process stack size is only 33554432
Found db_id = 270587870
Found db_name = VALUENET
DUL: Warning: Cannot verify file number for /app/oracle/oradata/freetouch/sales.dbf
DUL: Warning: First four bytes(76 162 0 0) of block 2 are not the start of a proper data block header
DUL: Warning: Block corruption or configuration error
DUL: Warning: Check db_block_size and/or osd_file_leader_size and/or file offset
DUL: Error: File Number can only be zero for Single Tablespace Datafiles
DUL: Warning: Cannot verify file number for /app/oracle/oradata/freetouch/drp_200200
DUL: Warning: First four bytes(76 162 0 0) of block 2 are not the start of a proper data block header
DUL: Warning: Block corruption or configuration error
DUL: Warning: Check db_block_size and/or osd_file_leader_size and/or file offset
DUL: Error: File Number can only be zero for Single Tablespace Datafiles
DUL> show datafiles;
ts# rf# start   blocks offs open  err file name
  0   1     0   194561    0    1    0 /app/oracle/oradata/freetouch/system01.dbf
  1   3     0   907521    0    1    0 /app/oracle/oradata/freetouch/sysaux01.dbf
  4   6     0   128321    0    1    0 /app/oracle/oradata/freetouch/users01.dbf
  7   7     0   780289    0    1    0 /app/oracle/oradata/freetouch/undotbs03.dbf
 11   8     0    25601    0    1    0 /app/oracle/oradata/freetouch/indx01.dbf
  6  10     0    45841    0    1    0 /app/oracle/oradata/freetouch/example01.dbf

该异常文件使用dul/odu均无法正常识别.证明文件头确实已经损坏

dbv 检测

[oracle@db trace]$ dbv file=/app/oracle/oradata/freetouch/drp_200200
DBVERIFY: Release 12.1.0.1.0 - Production on Sun Nov 2 14:08:34 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /app/oracle/oradata/freetouch/drp_200200
DBVERIFY - Verification complete
Total Pages Examined         : 194560
Total Pages Processed (Data) : 114596
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 26198
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 37787
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 15979
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 129603862 (0.129603862)
[oracle@db ~]$ dbv file=/app/oracle/oradata/freetouch/sales.dbf
DBVERIFY: Release 12.1.0.1.0 - Production on Sun Nov 2 23:12:05 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /app/oracle/oradata/freetouch/sales.dbf
DBVERIFY - Verification complete
Total Pages Examined         : 655360
Total Pages Processed (Data) : 294938
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 233404
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 38
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 23252
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 103728
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 134665298 (0.134665298)

dbv检测结果无坏块,但是v$datafile_header和alert日志中报坏块,初步判断是由于该文件是bigfile,dbv未检测到文件头坏块,实际该该数据文件头损坏,其他block正常.所幸的是该库有9月份的rman备份(中间归档丢失),因此使用rman还原出来9月份的数据文件,然后使用dd拷贝两个 block(block 0和block 1)到异常文件.

[root@db freetouch]# dd if=/app1/oracle/oradata/freetouch/sales.dbf bs=8192 count=2 of=/tmp/odu/sales.2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.0176368 s, 929 kB/s
[root@db freetouch]# dd if=/tmp/odu/sales.2 of=/app/oracle/oradata/freetouch/sales.dbf bs=8192 count=2 conv=notrunc
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 6.4281e-05 s, 255 MB/s
[root@db freetouch]# dd if=/app1/oracle/oradata/freetouch/drp_200200 bs=8192 count=2 of=/tmp/odu/drp_200200.2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.0185934 s, 881 kB/s
[root@db freetouch]# dd if=/tmp/odu/drp_200200.2 of=/app/oracle/oradata/freetouch/drp_200200 bs=8192 count=2 conv=notrunc
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 6.4419e-05 s, 254 MB/s

尝试恢复数据库

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 4 belongs to an orphan incarnation
ORA-01110: data file 4: '/app/oracle/oradata/freetouch/sales.dbf'

使用bbed修改相关文件头,然后继续恢复
具体见:bbed解决ORA-01190类似方法处理

SQL> recover database using backup controlfile;
ORA-00279: change 129603904 generated at 11/02/2014 19:19:54 needed for thread
1
ORA-00289: suggestion :
/app/oracle/recovery_area/VALUENET/archivelog/2014_11_02/o1_mf_1_1_%u_.arc
ORA-00280: change 129603904 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel;
ORA-00308: cannot open archived log 'cancel;'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [129603911], [0],
[129603913], [29360256], [], [], [], [], [], []
Process ID: 19881
Session ID: 1 Serial number: 3

出现ORA-600[2662]错误,因为scn相差比较小,重启数据库机器,出现ORA-600[4194]错误

SQL> startup pfile='/tmp/pfile.txt' mount
ORACLE instance started.
Total System Global Area 2505338880 bytes
Fixed Size                  2291472 bytes
Variable Size             973080816 bytes
Database Buffers         1526726656 bytes
Redo Buffers                3239936 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/app/oracle/oradata/freetouch/system01.dbf'
SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database or pluggable database must be opened in read/write
mode.

重建控制文件后继续恢复

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [4194], [46], [19], [], [], [], [],
[], [], [], [], []
Process ID: 20351
Session ID: 1 Serial number: 3

设置undo_management=MANUAL然后继续恢复

[oracle@db tmp]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sun Nov 2 19:29:45 2014
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/pfile.txt'
ORACLE instance started.
Total System Global Area 2505338880 bytes
Fixed Size                  2291472 bytes
Variable Size             973080816 bytes
Database Buffers         1526726656 bytes
Redo Buffers                3239936 bytes
Database mounted.
Database opened.

这次的恢复也证明Oracle 12C确实有着越来越多的用户在使用.

ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到网友技术支持请求,win 2003 ntfs格式文件系统,Oracle 8.1.7版本,主机重启后,数据库无法正常启动,offline datafile 15,数据库open成功,但是datafile 无法正常online,报错为:ORA-27069: skgfdisp: attempt to do I/O beyond the range of the file,请求协助处理

SQL> recover datafile 'D:\ORACLE\ORADATA\ORCL\ZSF_DATA.DBF';
ORA-00283: 恢复会话因错误而取消
ORA-01115: 从文件 15 读取块时出现 IO 错误 (块 # 1030071)
ORA-01110: 数据文件 15: 'D:\ORACLE\ORADATA\ORCL\ZSF_DATA.DBF'
ORA-27069: skgfdisp: 尝试在文件范围外执行 I/O
OSD-04026: 无效的参数经过. (OS 1030071)

使用bbed,成功online datafile 15

Tue Oct 28 16:30:35 2014
ALTER DATABASE RECOVER  datafile 15
Tue Oct 28 16:30:35 2014
Media Recovery Datafile: 15
Media Recovery Start
Media Recovery Log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 245110 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO03.LOG
Media Recovery failed with error 1115
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 15  ...
Tue Oct 28 16:32:53 2014
Shutting down instance (abort)
License high water mark = 6
Instance terminated by USER, pid = 1548
Starting up ORACLE RDBMS Version: 8.1.7.0.0.
System parameters with non-default values:
  processes                = 600
  shared_pool_size         = 52428800
  large_pool_size          = 20971520
  java_pool_size           = 20971520
  control_files            = D:\oracle\oradata\ORCL\control01.ctl, D:\oracle\oradata\ORCL\control02.ctl
  db_block_buffers         = 19200
  db_block_size            = 8192
  compatible               = 8.1.0
  log_buffer               = 32768
  log_checkpoint_interval  = 10000
  log_checkpoint_timeout   = 1800
  db_files                 = 1024
  db_file_multiblock_read_count= 8
  max_enabled_roles        = 30
  remote_login_passwordfile= EXCLUSIVE
  global_names             = TRUE
  distributed_transactions = 500
  instance_name            = ORCL
  service_names            = ORCL
  mts_dispatchers          = (PROTOCOL=TCP)(PRE=oracle.aurora.server.SGiopServer)
  open_links               = 4
  sort_area_size           = 65536
  sort_area_retained_size  = 65536
  db_name                  = ORCL
  open_cursors             = 500
  ifile                    = D:\oracle\admin\ORCL\pfile\init.ora
  os_authent_prefix        =
  job_queue_processes      = 4
  job_queue_interval       = 10
  parallel_max_servers     = 5
  background_dump_dest     = D:\oracle\admin\ORCL\bdump
  user_dump_dest           = D:\oracle\admin\ORCL\udump
  max_dump_file_size       = 10240
  oracle_trace_collection_name=
PMON started with pid=2
DBW0 started with pid=3
LGWR started with pid=4
CKPT started with pid=5
SMON started with pid=6
RECO started with pid=7
SNP0 started with pid=8
SNP1 started with pid=9
SNP2 started with pid=10
SNP3 started with pid=11
Tue Oct 28 16:33:01 2014
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Tue Oct 28 16:33:02 2014
ALTER DATABASE   MOUNT
Tue Oct 28 16:33:06 2014
Successful mount of redo thread 1, with mount id 1389958722.
Tue Oct 28 16:33:06 2014
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT
Tue Oct 28 16:33:49 2014
ALTER DATABASE RECOVER  database until cancel
Tue Oct 28 16:33:49 2014
Media Recovery Start
Media Recovery Log
kcrrga: Warning.  Log sequence in archive filename wrapped
to fix length as indicated by %S in LOG_ARCHIVE_FORMAT.
Old log archive with same name might be overwritten.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Tue Oct 28 16:34:03 2014
ALTER DATABASE RECOVER    LOGFILE 'D:\ORACLE\ORADATA\ORCL\REDO02.LOG'
Tue Oct 28 16:34:03 2014
Media Recovery Log D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Incomplete recovery applied all redo ever generated.
Recovery completed through change %s139866389
Media Recovery Complete
Completed: ALTER DATABASE RECOVER    LOGFILE 'D:\ORACLE\ORADA
Tue Oct 28 16:34:29 2014
alter database datafile 15 online
Tue Oct 28 16:34:29 2014
Completed: alter database datafile 15 online
Tue Oct 28 16:34:36 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 139866389
Tue Oct 28 16:34:38 2014
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Successful open of redo thread 1.
Tue Oct 28 16:34:38 2014
SMON: enabling cache recovery
Tue Oct 28 16:34:38 2014
Dictionary check beginning
Dictionary check complete
Tue Oct 28 16:34:39 2014
SMON: enabling tx recovery
Tue Oct 28 16:34:44 2014
Completed: alter database open resetlogs

数据库datafile 15 online成功后,客户操作业务继续发生ORA-600[ktsxs_add2]错误

Tue Oct 28 17:07:42 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02340.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:07:53 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02340.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:08:03 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02340.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:08:16 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02340.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:08:23 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02308.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:08:31 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02340.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []
Tue Oct 28 17:08:38 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA02308.TRC:
ORA-00600: 内部错误代码,自变量: [ktsxs_add2], [14], [15], [42534], [5732], [5733], [], []

通过分析相关日志发现是insert插入表报错,很好理解,该库的datafile 15已经超过了系统的限制,现在继续插入数据,因此报错,查询可能异常对象

SQL> col segment_name for a20
SQL> SELECT distinct OWNER, SEGMENT_NAME, SEGMENT_TYPE, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = 15
  4     AND 1030071 <= BLOCK_ID;
OWNER                          SEGMENT_NAME         SEGMENT_TYPE
------------------------------ -------------------- ------------------
PARTITION_NAME
------------------------------
ZSF                            DETAIL               TABLE
ZSF                            DETAIL1              INDEX
ZSF                            DETAIL2              INDEX
OWNER                          SEGMENT_NAME         SEGMENT_TYPE
------------------------------ -------------------- ------------------
PARTITION_NAME
------------------------------
ZSF                            DETAIL3              INDEX
ZSF                            DETAIL4              INDEX
ZSF                            FK_RECI_ORD          INDEX
OWNER                          SEGMENT_NAME         SEGMENT_TYPE
------------------------------ -------------------- ------------------
PARTITION_NAME
------------------------------
ZSF                            PREPAY1              INDEX
ZSF                            RECEDETAIL1          INDEX

创建新表空间

Create tablespace zsf_new datafile  'D:\ORACLE\ORADATA\ORCL\ZSF_DATA_new01.dbf' size 4096m;
alter tablespace zsf_new add datafile 'D:\ORACLE\ORADATA\ORCL\ZSF_DATA_new02.dbf'
size 128m autoextend on next 128M maxsize 4096m;

迁移异常对象到新表空间

alter table ZSF.DETAIL move tablespace ZSF_new;
alter index ZSF.DETAIL1 rebuild tablespace ZSF_new;
alter index ZSF.DETAIL2 rebuild tablespace ZSF_new;
alter index ZSF.DETAIL3 rebuild tablespace ZSF_new;
alter index ZSF.DETAIL4 rebuild tablespace ZSF_new;
alter index ZSF.FK_RECI_ORD rebuild tablespace ZSF_new;
alter index ZSF.PREPAY1 rebuild tablespace ZSF_new;
alter index ZSF.RECEDETAIL1 rebuild tablespace ZSF_new;

然后对于datafile 15所在表空间增加新文件,因为已经迁移了异常对象,然后resize datafile 15小于8G,关闭自扩展,至此该数据库恢复完成

bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在Oracle使用的过程中,经常遭遇一些核心index出现异常,导致数据库无法正常使用,特别是在数据库open的情况下,因为出现这些bootstrap$中的部分index是无法通过设置event38003进行重建,从而导致数据库无法正常使用,最常见异常index 有:I_ICOL1, I_TS1, I_CDEF1, I_CDEF2, I_CDEF3, I_CDEF4, I_PROXY_DATA$, I_IND1, I_TS#, I_UNDO1, I_UNDO2, I_COBJ#, I_USER1, I_USER2, I_CON1, I_CON2, I_FILE1, I_FILE2, I_FILE#_BLOCK#, I_USER#, I_OBJ#, I_PROXY_ROLE_DATA$_1, I_PROXY_ROLE_DATA$_2, I_CCOL1, I_CCOL2, I_TAB1, I_COL1, I_COL2, I_COL3, I_OBJ1, I_OBJ2, I_OBJ3, I_OBJ4, I_OBJ5
重建的index异常报错有:ORA-00701,ORA-01410,ORA-08102,ORA-08103,ORA-01578,ORA-01499,ORA-00600等各类错误
因为处理细节复杂,无法一一描述出来.这里假设Oracle数据库一个非常核心的表OBJ$中的I_OBJ1异常,现在准备重建它,大概步骤如下:
1. 假设异常index信息(I_OBJ1)

SQL> alter index sys.i_obj1 rebuild;
alter index sys.i_obj1 rebuild
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered
SQL> SELECT SQL_TEXT FROM BOOTSTRAP$ WHERE SQL_TEXT LIKE '%obj%';
SQL_TEXT
--------------------------------------------------------------------------------
CREATE UNIQUE INDEX I_OBJ1 ON OBJ$(OBJ#,OWNER#,TYPE#) PCTFREE 10 INITRANS 2 MAXT
RANS 255 STORAGE (  INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PC
TINCREASE 0 OBJNO 36 EXTENTS (FILE 1 BLOCK 336))

2. 创建获取异常index所属表

SQL>  select table_name from dba_indexes where owner='SYS' AND INDEX_NAME='I_OBJ1';
TABLE_NAME
--------------------------------------------------------------------------------
OBJ$

3. 获取obj$ ddl 语句

SQL> set long 10000
SQL> set pages 1000
SQL>  SELECT DBMS_METADATA.GET_DDL('TABLE','OBJ$','SYS') FROM DUAL;
DBMS_METADATA.GET_DDL('TABLE','OBJ$','SYS')
--------------------------------------------------------------------------------
  CREATE TABLE "SYS"."OBJ$"
   (    "OBJ#" NUMBER NOT NULL ENABLE,
        "DATAOBJ#" NUMBER,
        "OWNER#" NUMBER NOT NULL ENABLE,
        "NAME" VARCHAR2(128) NOT NULL ENABLE,
        "NAMESPACE" NUMBER NOT NULL ENABLE,
        "SUBNAME" VARCHAR2(128),
        "TYPE#" NUMBER NOT NULL ENABLE,
        "CTIME" DATE NOT NULL ENABLE,
        "MTIME" DATE NOT NULL ENABLE,
        "STIME" DATE NOT NULL ENABLE,
        "STATUS" NUMBER NOT NULL ENABLE,
        "REMOTEOWNER" VARCHAR2(128),
        "LINKNAME" VARCHAR2(128),
        "FLAGS" NUMBER,
        "OID$" RAW(16),
        "SPARE1" NUMBER,
        "SPARE2" NUMBER,
        "SPARE3" NUMBER,
        "SPARE4" VARCHAR2(1000),
        "SPARE5" VARCHAR2(1000),
        "SPARE6" DATE,
        "SIGNATURE" RAW(16),
        "SPARE7" NUMBER,
        "SPARE8" NUMBER,
        "SPARE9" NUMBER
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 16384 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"

4. 获取obj$相关index 语句

SQL>  SELECT DBMS_METADATA.GET_DDL('INDEX',INDEX_NAME,'SYS') FROM dba_indexes where owner='SYS' AND TABLE_NAME='OBJ$';
DBMS_METADATA.GET_DDL('INDEX',INDEX_NAME,'SYS')
--------------------------------------------------------------------------------
  CREATE UNIQUE INDEX "SYS"."I_OBJ5" ON "SYS"."OBJ$" ("SPARE3", "NAME", "NAMESPA
CE", "TYPE#", "OWNER#", "REMOTEOWNER", "LINKNAME", "SUBNAME", "OBJ#")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"
  CREATE INDEX "SYS"."I_OBJ4" ON "SYS"."OBJ$" ("DATAOBJ#", "TYPE#", "OWNER#")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"
  CREATE INDEX "SYS"."I_OBJ3" ON "SYS"."OBJ$" ("OID$")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"
  CREATE UNIQUE INDEX "SYS"."I_OBJ2" ON "SYS"."OBJ$" ("OWNER#", "NAME", "NAMESPA
CE", "REMOTEOWNER", "LINKNAME", "SUBNAME", "TYPE#", "SPARE3", "OBJ#")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 16384 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"
  CREATE UNIQUE INDEX "SYS"."I_OBJ1" ON "SYS"."OBJ$" ("OBJ#", "OWNER#", "TYPE#")
  PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"

5. 创建obj$img表和相关index

SQL>   CREATE TABLE "SYS"."OBJ$IMG"
  2     (    "OBJ#" NUMBER NOT NULL ENABLE,
  3          "DATAOBJ#" NUMBER,
  4          "OWNER#" NUMBER NOT NULL ENABLE,
  5          "NAME" VARCHAR2(128) NOT NULL ENABLE,
  6          "NAMESPACE" NUMBER NOT NULL ENABLE,
  7          "SUBNAME" VARCHAR2(128),
  8          "TYPE#" NUMBER NOT NULL ENABLE,
  9          "CTIME" DATE NOT NULL ENABLE,
 10          "MTIME" DATE NOT NULL ENABLE,
 11          "STIME" DATE NOT NULL ENABLE,
 12          "STATUS" NUMBER NOT NULL ENABLE,
 13          "REMOTEOWNER" VARCHAR2(128),
 14          "LINKNAME" VARCHAR2(128),
 15          "FLAGS" NUMBER,
 16          "OID$" RAW(16),
 17          "SPARE1" NUMBER,
 18          "SPARE2" NUMBER,
 19          "SPARE3" NUMBER,
 20          "SPARE4" VARCHAR2(1000),
 21          "SPARE5" VARCHAR2(1000),
 22          "SPARE6" DATE,
 23          "SIGNATURE" RAW(16),
 24          "SPARE7" NUMBER,
 25          "SPARE8" NUMBER,
 26          "SPARE9" NUMBER
 27     ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 28   NOCOMPRESS LOGGING
 29    STORAGE(INITIAL 16384 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645
 30    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 31    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 32    TABLESPACE "SYSTEM"
 33  ;
Table created.
SQL>
SQL>   CREATE UNIQUE INDEX "SYS"."I_OBJ_5IMG" ON "SYS"."OBJ$IMG" ("SPARE3", "NAME", "NAMESPACE", "TYPE#", "OWNER#", "REMOTEOWNER", "LINKNAME", "SUBNAME", "OBJ#")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "SYSTEM";
Index created.
SQL>   CREATE INDEX "SYS"."I_OBJ4IMG" ON "SYS"."OBJ$IMG" ("DATAOBJ#", "TYPE#", "OWNER#")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "SYSTEM";
Index created.
SQL>   CREATE INDEX "SYS"."I_OBJ3IMG" ON "SYS"."OBJ$IMG" ("OID$")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "SYSTEM";
Index created.
SQL>   CREATE UNIQUE INDEX "SYS"."I_OBJ2IMG" ON "SYS"."OBJ$IMG" ("OWNER#", "NAME", "NAMESPACE", "REMOTEOWNER", "LINKNAME", "SUBNAME", "TYPE#", "SPARE3", "OBJ#")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 16384 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "SYSTEM";
Index created.
SQL>   CREATE UNIQUE INDEX "SYS"."I_OBJ1IMG" ON "SYS"."OBJ$IMG" ("OBJ#", "OWNER#", "TYPE#")
  2    PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  3    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  4    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  5    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  6    TABLESPACE "SYSTEM"
  7  ;
Index created.

6. 获取bootstrap$语句

SQL> SELECT DBMS_METADATA.GET_DDL('TABLE','BOOTSTRAP$','SYS') FROM DUAL;
DBMS_METADATA.GET_DDL('TABLE','BOOTSTRAP$','SYS')
--------------------------------------------------------------------------------
  CREATE TABLE "SYS"."BOOTSTRAP$"
   (    "LINE#" NUMBER NOT NULL ENABLE,
        "OBJ#" NUMBER NOT NULL ENABLE,
        "SQL_TEXT" VARCHAR2(4000) NOT NULL ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 57344 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SYSTEM"

7. 创建bootstrap$img表

SQL>   CREATE TABLE "SYS"."BOOTSTRAP$IMG"
  2     (    "LINE#" NUMBER NOT NULL ENABLE,
  3          "OBJ#" NUMBER NOT NULL ENABLE,
  4          "SQL_TEXT" VARCHAR2(4000) NOT NULL ENABLE
  5     ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  6   NOCOMPRESS LOGGING
  7    STORAGE(INITIAL 57344 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  8    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  9    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 10    TABLESPACE "SYSTEM"
 11  /
Table created.

8.插入数据

insert into obj$img select * from obj$;
insert into bootstrap$img select * from bootstrap$;
commit;

9. 删除bootstrap$img对象名

delete from bootstrap$img where obj# in
 (select obj# from obj$
  where name in ('OBJ$',  'I_OBJ1',  'I_OBJ2', 'I_OBJ3', 'I_OBJ4', 'I_OBJ5',
                 'BOOTSTRAP$'));
commit;

10. 插入新创建对象

insert into bootstrap$img select * from bootstrap$tmpstr;
commit;

11. 关闭数据库

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

12. bbed修改相关block值

[oracle@oel6 ~]$ bbed
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Thu Aug 7 04:54:50 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set filename '/u01/app/oracle/oradata/XIFENFEI/system01.dbf'
        FILENAME        /u01/app/oracle/oradata/XIFENFEI/system01.dbf
BBED> set blocksize 8192
        BLOCKSIZE       8192
BBED> set block 1
        BLOCK#          1
BBED> set mode edit
        MODE            Edit
BBED> set count 32
        COUNT           32
BBED> m /x e81d
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/app/oracle/oradata/XIFENFEI/system01.dbf (0)
 Block: 1                Offsets:   196 to  227           Dba:0x00000000
------------------------------------------------------------------------
 e81d4000 12000000 00000000 41bad632 15bad632 01000000 00000000 00000000
 <32 bytes per line>
BBED> m /x 5200 offset +2
 File: /u01/app/oracle/oradata/XIFENFEI/system01.dbf (0)
 Block: 1                Offsets:   198 to  229           Dba:0x00000000
------------------------------------------------------------------------
 52001200 00000000 000041ba d63215ba d6320100 00000000 00000000 00000000
 <32 bytes per line>
BBED> d offset 96
 File: /u01/app/oracle/oradata/XIFENFEI/system01.dbf (0)
 Block: 1                Offsets:   196 to  227           Dba:0x00000000
------------------------------------------------------------------------
 e81d5200 12000000 00000000 41bad632 15bad632 01000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 1:
current = 0x5fbf, required = 0x5fbf
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/XIFENFEI/system01.dbf
BLOCK = 1
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED
BBED> exit

13. 启动数据库

SQL> startup
ORACLE instance started.
Total System Global Area  663945216 bytes
Fixed Size                  2291808 bytes
Variable Size             369100704 bytes
Database Buffers          289406976 bytes
Redo Buffers                3145728 bytes
Database mounted.
Database opened.

14. 验证I_OBJ1 index重建情况

SQL> SELECT OBJECT_ID,DATA_OBJECT_ID,CREATED FROM DBA_OBJECTS WHERE OBJECT_NAME='I_OBJ1';
 OBJECT_ID DATA_OBJECT_ID CREATED
---------- -------------- ---------
     77120          77120 06-AUG-14
SQL> !date
Thu Aug  6 05:29:25 CST 2014
SQL> SELECT HEADER_FILE,HEADER_BLOCK FROM DBA_SEGMENTS WHERE SEGMENT_NAME='I_OBJ1';
HEADER_FILE HEADER_BLOCK
----------- ------------
          1        77296
SQL> SELECT SQL_TEXT FROM BOOTSTRAP$ WHERE SQL_TEXT LIKE '%obj%';
SQL_TEXT
-----------------------------------------------------------------------------------------------------------------------------------------------
create unique index i_obj1 on obj$(obj#, owner#, type#) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE (  INITIAL 65536 NEXT 1048576 MINEXTENTS 1 M
AXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 77120 EXTENTS (FILE 1 BLOCK 77296))
create unique index i_obj2 on obj$(owner#, name, namespace,remoteowner, linkname, subname, type#, spare3, obj#) PCTFREE 10 INITRANS 2 MAXTRANS
255 STORAGE (  INITIAL 16384 NEXT 106496 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 77119 EXTENTS (FILE 1 BLOCK 77288))
create index i_obj3 on obj$(oid$) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE (  INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 P
CTINCREASE 0 OBJNO 77118 EXTENTS (FILE 1 BLOCK 77280))
create index i_obj4 on obj$(dataobj#, type#, owner#) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE (  INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXE
XTENTS 2147483645 PCTINCREASE 0 OBJNO 77117 EXTENTS (FILE 1 BLOCK 77272))
create unique index i_obj5 on obj$(spare3, name, namespace, type#, owner#, remoteowner, linkname, subname, obj#) PCTFREE 10 INITRANS 2 MAXTRANS
 255 STORAGE (  INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 77116 EXTENTS (FILE 1 BLOCK 77264))

不同版本对应的基表和index结构不一样(本篇blog使用的是12c做试验),但是本方法支持数据库从Oracle 7到Oracle 12c对于bootstrap$中核心index处理.
因为重建bootstrap$中的核心index是一个复杂而且风险非常大的事情,在你无法确定风险或者无法正常完成此类操作之时,如有需要请联系我
Phone:17813235971    Q Q:107644445    E-Mail:dba@xifenfei.com

某集团ebs数据库redo undo丢失导致悲剧

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:某集团ebs数据库redo undo丢失导致悲剧

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上,而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到,因此出现该错误,解决方法是通过修改数据scn,让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法:
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

ORACLE 8.1.7 数据库ORA-600 4194故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORACLE 8.1.7 数据库ORA-600 4194故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个817数据库报ORA-600 4194 无法正常启动

Fri Jul 25 10:49:47 2014
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT
Fri Jul 25 10:49:58 2014
ALTER DATABASE RECOVER  database
Fri Jul 25 10:49:58 2014
Media Recovery Start
Media Recovery Log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  database
Fri Jul 25 10:50:09 2014
alter database open
Beginning crash recovery of 1 threads
Fri Jul 25 10:50:09 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Fri Jul 25 10:50:09 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Fri Jul 25 10:50:09 2014
Thread 1 advanced to log sequence 3321
Thread 1 opened at log sequence 3321
  Current log# 3 seq# 3321 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Fri Jul 25 10:50:09 2014
SMON: enabling cache recovery
Fri Jul 25 10:50:10 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA03216.TRC:
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Fri Jul 25 10:50:10 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3321 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Fri Jul 25 10:50:10 2014
SMON: disabling cache recovery
Fri Jul 25 10:50:10 2014
ORA-600 signalled during: alter database open

ORA-600 4194这个错误在数据库异常恢复中非常常见,因为库不是很重要,因此就是直接屏蔽掉故障回滚段,然后强制拉库,该库的恢复过程中,也直接使用隐含参数屏蔽回滚段
_corrupted_rollback_segments= RBS0, RBS1, RBS2, RBS3, RBS4, RBS5, RBS6, RBS_HDSYS,数据库依然无法open,进一步分析trace文件

Fri Jul 25 11:26:07 2014
ORACLE V8.1.7.0.0 - Production vsnsta=0
vsnsql=e vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Windows thread id: 3648, image: ORACLE.EXE
*** SESSION ID:(11.1) 2014-07-25 11:26:07.843
*** 2014-07-25 11:26:07.843
ksedmp: internal or fatal error
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12 where us#=:1
----- Call Stack Trace -----

这里很明显看出来,数据库是在open过程中,update undo$表遭遇到ORA-600 4194,因为该过程需要使用系统回滚段,但是由于其所对应的undo和redo信息不一致,所以无法正常启动数据库.继续读trace文件

  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      space2: 0      #extents: 5      #blocks: 49
                  last map  0x00000000  #maps: 0      offset: 4128
      Highwater::  0x00400006  ext#: 0      blk#: 3      ext size: 9
  #blocks in seg. hdr's freelists: 0
  #blocks below: 0
  mapblk  0x00000000  offset: 0
                   Unlocked
     Map Header:: next  0x00000000  #extents: 5    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400003  length: 9
   0x0040000c  length: 10
   0x0040008f  length: 10
   0x00400099  length: 10
   0x004000a3  length: 10
  TRN CTL:: seq: 0x003c chd: 0x004e ctl: 0x0050 inc: 0x00000000 nfb: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00400006.003c.25 scn: 0x0000.009a4009
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.003c.24 ext: 0x0  spc: 0x196
    uba: 0x00000000.001f.14 ext: 0x1  spc: 0x16f6
    uba: 0x00000000.0018.02 ext: 0x4  spc: 0x1f1a
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0
  TRN TBL::

通过这里可以看出来,数据库在启动的时候,使用system undo的block为为0x00400006,使用bbed清除掉该uba记录,让数据库启动的时候重新分配system undo block给数据库执行update undo$使用,数据库open成功

BBED> m /x 0x00000000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4188 to 4192           Dba:0x00000000
------------------------------------------------------------------------
 00000000 3c002400 00009601 00000000 1f001400 0100f616 00000000 18000200
BBED> m /x 0x0000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4028 to 4032           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00000000 3c005000 02800100 68000000 feffff7f 06004000 3c002400
Sat Jul 26 12:09:21 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3326 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Sat Jul 26 12:09:21 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Sat Jul 26 12:09:22 2014
Thread 1 advanced to log sequence 3327
Thread 1 opened at log sequence 3327
  Current log# 3 seq# 3327 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Sat Jul 26 12:09:22 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Jul 26 12:09:39 2014
Completed: alter database open

分享一次ORA-01113 ORA-01110故障处理过程

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:分享一次ORA-01113 ORA-01110故障处理过程

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库报ORA-01113 ORA-01110无法正常open

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

alert日志

Wed Jul 16 17:17:56 2014
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1380870980
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jul 16 17:19:01 2014
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367575):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\metadata\HM_FINDING.ams],
[1245], [], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367575\orcl_m000_3540_i367575.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367576):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367576\orcl_m000_3540_i367576.trc
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc:
ORA-51106: check failed to complete due to an error.  See error below
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

这里出现ORA-00700[dbgrmblcp_corrupt_page]与ORA-00600[dbgrmblgp_get_page_1]是Bug 10321285 : ORA-600 [DBGRMBLCP_CORRUPT_PAGE]

尝试recover database 遭遇ORA-00283 ORA-16433

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

alert日志信息

Wed Jul 16 17:36:33 2014
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Wed Jul 16 17:36:33 2014
Media Recovery failed with error 16433
Slave exiting with ORA-283 exception
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_pr00_3868.trc:
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Jul 16 17:36:45 2014
alter database open
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3236.trc:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...

重建控制文件后恢复,遭遇ORA-600 2662错误

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.
Total System Global Area      10255212544 bytes
Fixed Size                        2264088 bytes
Variable Size                  4529849320 bytes
Database Buffers               5704253440 bytes
Redo Buffers                     18845696 bytes
SQL> @d:\ctl.sql
Control file created.
SQL> recover database using backup controlfile;
ORA-00279: change 13953689551797 generated at 07/16/2014 17:53:41 needed for
thread 1
ORA-00289: suggestion :
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ARCHIVELOG\2014_07_16\O1_MF_1_1_%U_
.ARC
ORA-00280: change 13953689551797 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
d:\app\administrator\oradata\orcl\redo01.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [3248], [3635774399],
[3249], [3580470066], [4194545], [], [], [], [], [], []
Process ID: 3932
Session ID: 200 Serial number: 1

因为该数据库是11.2.0.3未打上scn patch,直接推进scn(可以使用event,隐含参数,oradebug,bbed等),数据库就正常open

误drop tablespace后使用flashback database闪回异常处理

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:误drop tablespace后使用flashback database闪回异常处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友夜间打电话请求技术支持,数据库表空间被删除,然后使用flashback database 无法正常恢复。通过分析alert日志发现,创建表空间(xifenfei 别名),发现已经存在,就删除了该表空间(其实是业务核心表空间,误删除了,是否是连接错了数据库?)

Sat Jul 05 17:10:06 2014
create tablespace XIFENFEI datafile 'D:\Oracle\oradata\orcl\HANDBB.DBF'
size 50M autoextend on next 50M maxsize 1536M extent management local
Sat Jul 05 17:10:06 2014
ORA-1543 signalled during: create tablespace XIFENFEI datafile 'D:\Oracle\oradata\orcl\HANDBB.DBF'
size 50M autoextend on next 50M maxsize 1536M extent management local
...
Sat Jul 05 17:10:59 2014
drop tablespace XIFENFEI
Sat Jul 05 17:10:59 2014
ORA-1549 signalled during: drop tablespace XIFENFEI
...
Sat Jul 05 17:11:05 2014
drop tablespace XIFENFEI
ORA-1549 signalled during: drop tablespace XIFENFEI
...
Sat Jul 05 17:11:24 2014
drop tablespace XIFENFEI including contents
Sat Jul 05 17:11:36 2014
Thread 1 advanced to log sequence 186895 (LGWR switch)
  Current log# 1 seq# 186895 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\REDO01.LOG
Sat Jul 05 17:11:36 2014
ARC3: Warning.  Log sequence in archive filename wrapped
to fix length as indicated by %S in LOG_ARCHIVE_FORMAT.
Old log archive with same name might be overwritten.
Sat Jul 05 17:11:43 2014
LNS: Standby redo logfile selected for thread 1 sequence 186895 for destination LOG_ARCHIVE_DEST_4
Sat Jul 05 17:11:49 2014
LNS: Standby redo logfile selected for thread 1 sequence 186895 for destination LOG_ARCHIVE_DEST_2
Sat Jul 05 17:12:09 2014
Starting control autobackup
Control autobackup written to DISK device
	handle 'D:\FULLBACK\C-1342406147-20140705-00'
Completed: drop tablespace XIFENFEI including contents

通过这里可以发现删除表空间时间点为2014年7月5日17:12:09

闪回数据库到删除之前时间点

Sat Jul 05 18:16:54 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sat Jul 05 18:19:23 2014
FLASHBACK DATABASE TO TIMESTAMP TO_TIMESTAMP('2014-07-05 17:09:00','YYYY-MM-DD HH24:MI:SS')
Sat Jul 05 18:19:25 2014
Flashback Restore Start
Sat Jul 05 18:20:52 2014
--闪回时的控制文件中无表空间XIFENFEI信息(因为已经被删除),
--但是由于闪回的system 数据字典里面有相关文件信息,因此数据库在控制文件里面创建相关文件信息
Flashback: created tablespace #6: 'XIFENFEI' in the controlfile.
Flashback: created OFFLINE file 'UNNAMED00012' for tablespace #6 in the controlfile.
Filename was:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\XIFENFEI4.DBF' when dropped.
File will have to be restored from a backup and recovered.
Flashback: created OFFLINE file 'UNNAMED00010' for tablespace #6 in the controlfile.
Filename was:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\XIFENFEI3.DBF' when dropped.
File will have to be restored from a backup and recovered.
Flashback: created OFFLINE file 'UNNAMED00008' for tablespace #6 in the controlfile.
Filename was:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\XIFENFEI2.DBF' when dropped.
File will have to be restored from a backup and recovered.
Flashback: created OFFLINE file 'UNNAMED00005' for tablespace #6 in the controlfile.
Filename was:
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\XIFENFEI.DBF' when dropped.
File will have to be restored from a backup and recovered.
Flashback Restore Complete
Flashback Media Recovery Start
 parallel recovery started with 15 processes
Flashback Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\ORADATA\ARCHIVE\ARC\ARC86891_0766797318.001
Flashback Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\ORADATA\ARCHIVE\ARC\ARC86892_0766797318.001
Sat Jul 05 18:21:40 2014
Flashback Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\ORADATA\ARCHIVE\ARC\ARC86893_0766797318.001
Sat Jul 05 18:21:47 2014
WARNING: inbound connection timed out (ORA-3136)
Sat Jul 05 18:22:11 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 186894 Reading mem 0
  Mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\GZSERVER\REDO03.LOG
Sat Jul 05 18:22:39 2014
Incomplete Recovery applied until change 9078991241
Flashback Media Recovery Complete
ORA-38795 signalled during: FLASHBACK DATABASE TO TIMESTAMP TO_TIMESTAMP('2014-07-05 17:09:00','YYYY-MM-DD HH24:MI:SS')...
Sat Jul 05 18:30:11 2014
ALTER DATABASE OPEN RESETLOGS
Sat Jul 05 18:30:11 2014
ORA-1245 signalled during: ALTER DATABASE OPEN RESETLOGS...
--重命名相关UNNAMExxxxx文件名到硬盘上被删除表空间文件
Sat Jul 05 18:39:31 2014
alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00005'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI.DBF'
Sat Jul 05 18:39:31 2014
Completed: alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00005'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI.DBF'
Sat Jul 05 18:39:47 2014
alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00008'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI2.DBF'
Sat Jul 05 18:39:47 2014
Completed: alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00008'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI2.DBF'
Sat Jul 05 18:39:59 2014
alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00010'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI3.DBF'
Sat Jul 05 18:39:59 2014
Completed: alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00010'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI3.DBF'
Sat Jul 05 18:40:12 2014
alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00012'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI4.DBF'
Sat Jul 05 18:40:12 2014
Completed: alter database rename file 'D:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\UNNAMED00012'
to 'D:\oracle\product\10.2.0\oradata\gzserver\XIFENFEI4.DBF'
Sat Jul 05 18:41:25 2014
ALTER DATABASE OPEN RESETLOGS
Sat Jul 05 18:41:25 2014
ORA-1245 signalled during: ALTER DATABASE OPEN RESETLOGS...

到这里,可以看出来,因为数据库整体已经闪回,但是被drop 表空间的四个数据文件未被正常闪回,因此该四个文件的scn可能异常,通过数据库恢复检查脚本(Oracle Database Recovery Check)检查结果如下
datafile_scn
datafile_header_scn
这里很明显控制文件中的scn信息混乱不做过多参考,数据文件头信息看到只有ts# 6中的四个文件(就是被删除的表空间文件)scn过大,其他文件scn都处于正常状态(处于干净状态),到这里很明显,数据库闪回成功,但是被drop tablespace的数据文件未被闪回,因此该故障可以通过bbed修改四个文件头信息和其他文件相同即可使得数据库恢复正常
温馨提示:数据库操作需要慎重,备份重于一切

数据库恢复历史再次刷新到Oracle 7.3.2版本—redo异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:数据库恢复历史再次刷新到Oracle 7.3.2版本—redo异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有网友在QQ上找我,说Oracle 7.3的数据库,因为redo异常咨询我是否可以恢复
qq咨询


检查数据库得到以下信息

SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle7 Workgroup Server Release 7.3.2.2.1 - Production Release
PL/SQL Release 2.3.2.2.0 - Production
CORE Version 3.5.2.0.0 - Production
TNS for 32-bit Windows: Version 2.3.2.1.0 - Production
NLSRTL Version 3.2.2.0.0 - Production
已选择 5 行

数据文件信息
qq
qq2


redo信息
qq3


跳过redo进行恢复,在resetlogs过程中报rbs表空间坏块,然后通过dul工具获得回滚段名称,然后使用隐含参数屏蔽掉

License high water mark = 2
Starting up ORACLE RDBMS Version: 7.3.2.2.1.
System parameters with non-default values:
  processes                = 800
  shared_pool_size         = 540000000
  control_files            = D:\ORANT\DATABASE\ctl1orcl.ora, D:\ORANT\DATABASE\ctl2orcl.ora
  compatible               = 7.3.0.0.0
  log_buffer               = 327680
  log_checkpoint_interval  = 1000000
  db_files                 = 40
  db_file_simultaneous_writes= 1280
  max_rollback_segments    = 12800
  _offline_rollback_segments= RB13, RB14, RB15, RB16, RB20
  _corrupted_rollback_segments= RB13, RB14, RB15, RB16, RB20
  sequence_cache_entries   = 100
  sequence_cache_hash_buckets= 100
  remote_login_passwordfile= SHARED
  mts_servers              = 0
  mts_max_servers          = 0
  mts_max_dispatchers      = 0
  audit_trail              = NONE
  sort_area_retained_size  = 65536
  sort_direct_writes       = AUTO
  db_name                  = oracle
  open_cursors             = 800
  text_enable              = TRUE
  snapshot_refresh_processes= 1
  background_dump_dest     = %RDBMS73%\trace
  user_dump_dest           = %RDBMS73%\trace
Mon Jun 16 16:46:57 2014
PMON started
Mon Jun 16 16:46:57 2014
DBWR started
Mon Jun 16 16:46:57 2014
LGWR started
Mon Jun 16 16:46:57 2014
RECO started
Mon Jun 16 16:46:57 2014
SNP0 started
Mon Jun 16 16:46:57 2014
alter database  mount exclusive
Mon Jun 16 16:46:58 2014
Successful mount of redo thread 1.
Mon Jun 16 16:46:58 2014
Completed: alter database  mount exclusive
Mon Jun 16 16:48:15 2014
alter database open
Mon Jun 16 16:48:16 2014
Beginning crash recovery of 1 threads
Crash recovery completed successfully
Mon Jun 16 16:48:17 2014
Thread 1 advanced to log sequence 9
  Current log# 1 seq# 9 mem# 0: D:\ORANT\DATABASE\LOG2ORCL.ORA
Thread 1 opened at log sequence 9
  Current log# 1 seq# 9 mem# 0: D:\ORANT\DATABASE\LOG2ORCL.ORA
Successful open of redo thread 1.
Mon Jun 16 16:48:18 2014
SMON: enabling cache recovery
Mon Jun 16 16:48:19 2014
Completed: alter database open
Mon Jun 16 16:48:20 2014
SMON: enabling tx recovery
SMON: about to recover undo segment 14
SMON: mark undo segment 14 as needs recovery
SMON: about to recover undo segment 15
SMON: mark undo segment 15 as needs recovery
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as needs recovery
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as needs recovery
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as needs recovery
Mon Jun 16 16:48:20 2014
Errors in file D:\ORANT\RDBMS73\trace\orclSMON.TRC:
ORA-00600: internal error code, arguments: [4306], [21], [2], [], [], [], [], []

数据库在启动过程中出现ORA-00600[4306],导致smon异常。该错误是因为在数据库open过程中smon会清理临时段从而出现该错误,通过设置event跳过,数据库算整体打开,不过在恢复过程中还遇到了

Mon Jun 16 17:53:10 2014
Errors in file D:\ORANT\RDBMS73\trace\orclDBWR.TRC:
ORA-00600: internal error code, arguments: [3600], [3], [14], [], [], [], [], []
Mon Jun 16 18:05:12 2014
Errors in file D:\ORANT\RDBMS73\trace\ORA06880.TRC:
ORA-01578: ORACLE数据块有错(文件号12, 块号46644)
ORA-01110: 文件'12'没有联机
ORA-00600: 内部错误码, 变元: [4194], [18], [5], [], [], []
ORA-00600: 内部错误码, 变元: [4194], [18], [5], [], [], []

ORA-00600[3600]是因为在offline 回滚段所在表空间锁出现的问题
ORA-00600[4194]是因为回滚段所在的表空间数据文件出现坏块所导致