通过rowid获取segment header坏块数据

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:通过rowid获取segment header坏块数据

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在上篇(table中各种坏块对select/dml操作影响)中说到如果segment header异常了,不能通过ctas来获得相关数据,在群的讨论中,EZIO说到可以通过rowid方式来获得相关数据,通过测试证明,确实可以通过该方法获得数据,以后遇到此类错误,大家也不必惊慌.
创建测试表

SQL> create table t_xifenfei
  2  as
  3  select * from dba_objects where object_id is not null;
SQL> alter table t_xifenfei
  2  add constraint PK_t_xifenfei primary key (object_id)
  3  ;
Table altered.
SQL> alter system checkpoint;
System altered.
SQL> select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74762
SQL> select header_file,header_block from
  2  DBA_SEGments where segment_name='T_XIFENFEI' AND OWNER='CHF';
HEADER_FILE HEADER_BLOCK
----------- ------------
          4          170

dump block

alter system dump datafile 4 block 170;
Dump of buffer cache at level 4 for tsn=4 rdba=16777386
Block dump from disk:
buffer tsn: 4 rdba: 0x010000aa (4/170)
scn: 0x0b8c.3c0092e4 seq: 0x01 flg: 0x04 tail: 0x91e42301
frmt: 0x02 chkval: 0xa531 type: 0x23=PAGETABLE SEGMENT HEADER

通过header_block和dump block确定block 170即为PAGETABLE SEGMENT HEADER

bbed制造SEGMENT HEADER坏块

BBED> set block 170
        BLOCK#          170
BBED> set offset 8188
        OFFSET          8188
BBED> d
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 170              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 0123e492
 <32 bytes per line>
BBED> m /x 0123e491
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 170              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 0123e491
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 170:
current = 0xa531, required = 0xa531
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 170
Block 170 is corrupt
Corrupt block relative dba: 0x010000aa (file 0, block 170)
Fractured block found during verification
Data in bad block:
 type: 35 format: 2 rdba: 0x010000aa
 last change scn: 0x0b8c.3c0092e4 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x91e42301
 check value in block header: 0xa531
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED

测试segment header坏块后select操作

SQL> select * from chf.t_xifenfei;
select * from chf.t_xifenfei
                         *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 170)
ORA-01110: data file 4: '/u01/oracle/oradata/ora11g/users01.dbf'
SQL> select /*+index(t PK_T_XIFENFEI)*/  count(rowid) from chf.t_xifenfei t;
COUNT(ROWID)
------------
       74762

基于rowid获取segment header 坏块对象数据

SQL> create table chf.bad_rows (table_name varchar2(60),
   2 row_id rowid, oracle_error_code number);
Table created.
SQL> DECLARE
  2   TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  3   CURSOR c1 IS  select /*+index(t PK_T_XIFENFEI)*/ rowid from chf.t_xifenfei t;
  4   r RowIDTab;
  5   rows  NATURAL := 20000;
  6   bad_rows number := 0 ;
  7   errors number;
  8   error_code number;
  9   myrowid rowid;
 10  BEGIN
 11   OPEN c1;
 12   LOOP
 13     FETCH  c1 BULK COLLECT INTO r LIMIT rows;
 14     EXIT WHEN r.count=0;
 15     BEGIN
 16      FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
 17       insert into chf.t_xifenfei_new
 18       select /*+ ROWID(A) */ *
 19       from chf.t_xifenfei A where rowid = r(i);
 20     EXCEPTION
 21     when OTHERS then
 22      BEGIN
 23       errors := SQL%BULK_EXCEPTIONS.COUNT;
 24       FOR err1 IN 1..errors LOOP
 25           error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
 26           myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
 27           bad_rows := bad_rows + 1;
 28           insert into chf.bad_rows values('chf.t_xifenfei',myrowid, error_code);
 29       END LOOP;
 30       END;
 31     END;
 32    commit;
 33   END LOOP;
 34   commit;
 35   CLOSE c1;
 36   dbms_output.put_line('Total Bad Rows: '||bad_rows);
 37  END;
 38  /
Total Bad Rows: 0
PL/SQL procedure successfully completed.
SQL> select count(*) from chf.t_xifenfei_new;
  COUNT(*)
----------
     74762

通过上面pl/sql,基于rowid成功获得segment header 异常对象中的所有数据记录.如果没有主键的表出现该问题,可以参考:使用plsql抢救数据

table中各种类型block坏块是否能被跳过

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:table中各种类型block坏块是否能被跳过

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在table遇到的各种坏块中,大部分情况,我们都可以通过设置event 10231或者dbms_repair来跳过坏块,抢救其他数据;但是在部分情况下,我们设置了他们依然不能跳过坏块,数据库依然报ORA-01578,本文测试了table中各种类型的block,证明在哪些blog出现异常之后不能被跳过.
如果是事务数据块出现坏块可以通过dbms_repair.skip_corrupt_blocks来标记坏块(也可以使用event 10231)来实现;对于BITMAP BLOCK如果出现坏块不会对于读取数据无影响(至于其他操作,请见后续blog);SEGMENT HEADER如果出现坏块,无法通过跳过坏块来实现获取其他数据(正常block)
创建测试表

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> create table t_xifenfei
  2  tablespace users
  3  as
  4  select * from dba_objects;
Table created.
SQL>  select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74663

查询相关block信息

SQL> select SEGMENT_NAME,HEADER_FILE,HEADER_BLOCK,blocks,extents from DBA_SEGMENTS
  2  WHERE OWNER='CHF' AND SEGMENT_NAME='T_XIFENFEI';
SEGMENT_NAME    HEADER_FILE HEADER_BLOCK     BLOCKS    EXTENTS
--------------- ----------- ------------ ---------- ----------
T_XIFENFEI                4          378       1152         24
SQL>   select
  2    dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3    max(dbms_rowid.rowid_block_number(rowid)) max_block,
  4    min(dbms_rowid.rowid_block_number(rowid)) min_block
  5    from chf.t_xifenfei
  6    group by dbms_rowid.rowid_relative_fno(rowid);
   REL_FNO  MAX_BLOCK  MIN_BLOCK
---------- ---------- ----------
         4       1728        379
SQL> select EXTENT_ID,FILE_ID,BLOCK_ID,blocks from dba_extents where owner='CHF'
   2  AND SEGMENT_NAME='T_XIFENFEI';
 EXTENT_ID    FILE_ID   BLOCK_ID     BLOCKS
---------- ---------- ---------- ----------
         0          4        376          8
         1          4        640          8
         2          4        648          8
         3          4        656          8
         4          4        664          8
         5          4        672          8
         6          4        680          8
         7          4        688          8
         8          4        696          8
         9          4        704          8
        10          4        712          8
        11          4        720          8
        12          4        728          8
        13          4        736          8
        14          4        744          8
        15          4        752          8
        16          4        768        128
        17          4        896        128
        18          4       1024        128
        19          4       1152        128
        20          4       1280        128
        21          4       1408        128
        22          4       1536        128
        23          4       1664        128

通过这里可以知道:真正的存储数据是从block 379开始,至于block 376、377、378是什么,使用dump block分析

验证block类型

SQL> alter system dump datafile 4 block 376;
System altered.
SQL> alter system dump datafile 4 block 377;
System altered.
SQL> alter system dump datafile 4 block 378;
System altered.
SQL> alter system dump datafile 4 block 379;
System altered.
--该block是另外extent的开始,所以也尝试分析是否有特殊之处
SQL> alter system dump datafile 4 block 640;
System altered.
--dump 文件header信息
Start dump data blocks tsn: 4 file#:4 minblk 376 maxblk 376
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777592
Block dump from disk:
buffer tsn: 4 rdba: 0x01000178 (4/376)
scn: 0x0b8c.3bfc6517 seq: 0x04 flg: 0x04 tail: 0x65172004
frmt: 0x02 chkval: 0xc8b3 type: 0x20=FIRST LEVEL BITMAP BLOCK
Start dump data blocks tsn: 4 file#:4 minblk 377 maxblk 377
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777593
Block dump from disk:
buffer tsn: 4 rdba: 0x01000179 (4/377)
scn: 0x0b8c.3bfc6517 seq: 0x18 flg: 0x04 tail: 0x65172118
frmt: 0x02 chkval: 0x9e8c type: 0x21=SECOND LEVEL BITMAP BLOCK
Start dump data blocks tsn: 4 file#:4 minblk 378 maxblk 378
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777594
BH (0x2a7f7f0c) file#: 4 rdba: 0x0100017a (4/378) class: 4 ba: 0x2a742000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 76372 objn: 76372 tsn: 4 afn: 4 hint: f
  hash: [0x3150a748,0x3150a748] lru: [0x2a7f8094,0x2a7f7ee4]
  lru-flags: hot_buffer
  ckptq: [NULL] fileq: [NULL] objq: [0x2f72dc34,0x2f72dc34] objaq: [0x2f72dc2c,0x2f72dc2c]
  st: XCURRENT md: NULL fpin: 'ktewh25: kteinicnt' tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
Block dump from disk:
buffer tsn: 4 rdba: 0x0100017a (4/378)
scn: 0x0b8c.3bfc651b seq: 0x01 flg: 0x04 tail: 0x651b2301
frmt: 0x02 chkval: 0xb2ae type: 0x23=PAGETABLE SEGMENT HEADER
Start dump data blocks tsn: 4 file#:4 minblk 379 maxblk 379
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777595
Block dump from disk:
buffer tsn: 4 rdba: 0x0100017b (4/379)
scn: 0x0b8c.3bfc6494 seq: 0x01 flg: 0x04 tail: 0x64940601
frmt: 0x02 chkval: 0x0567 type: 0x06=trans data
Start dump data blocks tsn: 4 file#:4 minblk 640 maxblk 640
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777856
Block dump from disk:
buffer tsn: 4 rdba: 0x01000280 (4/640)
scn: 0x0b8c.3bfc6496 seq: 0x01 flg: 0x04 tail: 0x64960601
frmt: 0x02 chkval: 0x0efe type: 0x06=trans data

这里可以知道:
1.block 376、377为BITMAP BLOCK
2.block 378为SEGMENT HEADER(和dba_segments视图中一致)
3.除extent 0中有特殊(含BITMAP BLOCK和SEGMENT HEADER)block,其他extent只包含事务数据

测试block 640

--block 640包含条数
SQL> select   count(rowid)
  2    from chf.t_xifenfei
  3     where dbms_rowid.rowid_block_number(rowid)=640
  4  and dbms_rowid.rowid_relative_fno(rowid)=4;
COUNT(ROWID)
------------
          79
--bbed修改tailchk
BBED> set filename '/u01/oracle/oradata/ora11g/users01.dbf'
        FILENAME        /u01/oracle/oradata/ora11g/users01.dbf
BBED> set block 640
        BLOCK#          640
BBED> set mode edit
        MODE            Edit
BBED> p tailchk
ub4 tailchk                                 @8188     0x64960601
BBED> m /x 64960602
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 640              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 64960602
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 640:
current = 0xf80b, required = 0xf80b
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 640
Block 640 is corrupt
Corrupt block relative dba: 0x01000280 (file 0, block 640)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x01000280
 last change scn: 0x0b8c.3bfc6496 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x02069664
 check value in block header: 0xf80b
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED
--查询坏块
SQL> select count(*) from chf.t_xifenfei;
select count(*) from chf.t_xifenfei
                         *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 640)
ORA-01110: data file 4: '/u01/oracle/oradata/ora11g/users01.dbf'
--跳过坏块
SQL> exec dbms_repair.skip_corrupt_blocks('CHF','T_XIFENFEI');
PL/SQL procedure successfully completed.
SQL>  select skip_corrupt from dba_tables where table_name='T_XIFENFEI' AND OWNER='CHF';
SKIP_COR
--------
ENABLED
SQL>  select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74584
--修复坏块
BBED> m /x 01069664
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 640              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 01069664
 <32 bytes per line>
BBED> p tailchk
ub4 tailchk                                 @8188     0x64960601
BBED> sum apply
Check value for File 0, Block 640:
current = 0x0efe, required = 0x0efe
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 640
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED
--除掉标记表坏块
SQL> BEGIN
  2  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
  3  SCHEMA_NAME => 'CHF',
  4  OBJECT_NAME => 'T_XIFENFEI',
  5  OBJECT_TYPE => dbms_repair.table_object,
  6  FLAGS => dbms_repair.NOSKIP_FLAG);
  7  END;
  8  /
PL/SQL procedure successfully completed.
SQL> select skip_corrupt from dba_tables where table_name='T_XIFENFEI' AND OWNER='CHF';
SKIP_COR
--------
DISABLED
--查询表记录正常
SQL>  select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74663

在后续的操作中,也是按照类似步骤操作,考虑到篇幅有限,部分过程不再贴出来

测试block 379

SQL>  select count(*) from chf.t_xifenfei;
 select count(*) from chf.t_xifenfei
                          *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 379)
ORA-01110: data file 4: '/u01/oracle/oradata/ora11g/users01.dbf'
SQL> exec dbms_repair.skip_corrupt_blocks('CHF','T_XIFENFEI');
PL/SQL procedure successfully completed.
SQL> select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74575

测试block 378

BBED> set block 378
        BLOCK#          378
--segment header 不支持bbed查看结构
BBED> p tailchk
BBED-00400: invalid blocktype (35)
BBED> map
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 378                                   Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (35)
BBED> set offset 8188
        OFFSET          8188
BBED> d
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 378              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 01231b65
 <32 bytes per line>
BBED> m /x 651b2302
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 378              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 651b2302
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 378:
current = 0xedf2, required = 0xedf2
--验证坏块
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 378
Block 378 is corrupt
Corrupt block relative dba: 0x0100017a (file 0, block 378)
Fractured block found during verification
Data in bad block:
 type: 35 format: 2 rdba: 0x0100017a
 last change scn: 0x0b8c.3bfc651b seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x02231b65
 check value in block header: 0xedf2
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED
SQL>  select count(*) from chf.t_xifenfei;
 select count(*) from chf.t_xifenfei
                          *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 378)
ORA-01110: data file 4: '/u01/oracle/oradata/ora11g/users01.dbf'
--标记跳过坏块
SQL> exec dbms_repair.skip_corrupt_blocks('CHF','T_XIFENFEI');
PL/SQL procedure successfully completed.
--查询依然失败
SQL> select count(*) from chf.t_xifenfei;
select count(*) from chf.t_xifenfei
                         *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 378)
ORA-01110: data file 4: '/u01/oracle/oradata/ora11g/users01.dbf'

测试block 377

BBED> m /x 18211766
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 377              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 18211766
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 377:
current = 0x9d8c, required = 0x9d8c
--bbed验证为坏块
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 377
Block 377 is corrupt
Corrupt block relative dba: 0x01000179 (file 0, block 377)
Fractured block found during verification
Data in bad block:
 type: 33 format: 2 rdba: 0x01000179
 last change scn: 0x0b8c.3bfc6517 seq: 0x18 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x66172118
 check value in block header: 0x9d8c
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED
--dbv验证为坏块
[oracle@xifenfei ~]$ dbv file=/u01/oracle/oradata/ora11g/users01.dbf
DBVERIFY: Release 11.2.0.3.0 - Production on Fri Jan 18 03:32:18 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/oracle/oradata/ora11g/users01.dbf
Page 377 is influx - most likely media corrupt
Corrupt block relative dba: 0x01000179 (file 4, block 377)
Fractured block found during dbv:
Data in bad block:
 type: 33 format: 2 rdba: 0x01000179
 last change scn: 0x0b8c.3bfc6517 seq: 0x18 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x66172118
 check value in block header: 0x9d8c
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Pages Examined         : 2560
Total Pages Processed (Data) : 1434
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 10
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 213
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 902
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 1006486374 (2956.1006486374)
--查询表记录
SQL> select skip_corrupt from dba_tables where table_name='T_XIFENFEI' AND OWNER='CHF';
SKIP_COR
--------
DISABLED
SQL> select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74663

测试block 376

BBED> m /x 04201766
 File: /u01/oracle/oradata/ora11g/users01.dbf (0)
 Block: 376              Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 04201766
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 376:
current = 0xcbb3, required = 0xcbb3
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/users01.dbf
BLOCK = 376
Block 376 is corrupt
Corrupt block relative dba: 0x01000178 (file 0, block 376)
Fractured block found during verification
Data in bad block:
 type: 32 format: 2 rdba: 0x01000178
 last change scn: 0x0b8c.3bfc6517 seq: 0x4 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x66172004
 check value in block header: 0xcbb3
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED
SQL>  select count(*) from chf.t_xifenfei;
  COUNT(*)
----------
     74663

通过测试证明,如果是事务数据块出现坏块可以通过dbms_repair.skip_corrupt_blocks来标记坏块(也可以使用event 10231)来实现;对于BITMAP BLOCK如果出现坏块不会对于读取数据无影响(至于其他操作,请见table中各种坏块对select/dml操作影响);SEGMENT HEADER如果出现坏块,无法通过跳过坏块来实现获取其他数据(正常block)

recover遇到坏块处理本质探讨

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:recover遇到坏块处理本质探讨

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

如果在还原出来的数据文件中有坏块,而归档日志和联机日志是正常的,那么在应用日志恢复过程中,会出现什么情况,这里通过一个简单的测试给予其中一种情况的说明
创建测试表

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> create table t_xifenfei(object_id,object_name) tablespace xifenfei
  2  as
  3  select object_id,object_name from dba_objects
  4  where rownum<11;
Table created.
SQL> col object_name for a30
SQL> select   object_id,object_name,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3  dbms_rowid.rowid_block_number(rowid)blockno
  4   from chf.t_xifenfei;
 OBJECT_ID OBJECT_NAME                       REL_FNO    BLOCKNO
---------- ------------------------------ ---------- ----------
        20 ICOL$                                   5         12
        44 I_USER1                                 5         12
        28 CON$                                    5         12
        15 UNDO$                                   5         12
        29 C_COBJ#                                 5         12
         3 I_OBJ#                                  5         12
        25 PROXY_ROLE_DATA$                        5         12
        39 I_IND1                                  5         12
        51 I_CDEF2                                 5         12
        26 I_PROXY_ROLE_DATA$_1                    5         12
10 rows selected.
SQL> select name from v$datafile where file#=5;
NAME
--------------------------------------------------------------
/u01/oracle/oradata/XFF/xifenfei01.dbf
SQL> update t_xifenfei set object_name='WWW.XIFENFEI.COM';
10 rows updated.
SQL> commit;
Commit complete.
SQL> create table t_xifenfei_new(object_id,object_name) tablespace xifenfei
  2  as
  3  select object_id,object_name from dba_objects
  4  where rownum<11;
Table created.
SQL> conn / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

备份数据文件

[oracle@xifenfei XFF]$ cp xifenfei01.dbf  ../tmp/
[oracle@xifenfei XFF]$ ll ../tmp/xifenfei01.dbf
-rw-r----- 1 oracle oinstall 10493952 Sep 28 19:05 ../tmp/xifenfei01.dbf
[oracle@xifenfei XFF]$ date
Fri Sep 28 19:05:42 CST 2012

bbed破坏备份文件

[oracle@xifenfei XFF]$ bbed password=blockedit
BBED: Release 2.0.0.0.0 - Limited Production on Fri Sep 28 19:05:59 2012
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set filename '/u01/oracle/oradata/tmp/xifenfei01.dbf'
        FILENAME        /u01/oracle/oradata/tmp/xifenfei01.dbf
BBED> set block 12
        BLOCK#          12
BBED> set mode edit
        MODE            Edit
BBED> map
 File: /u01/oracle/oradata/tmp/xifenfei01.dbf (0)
 Block: 12                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 96 bytes                     @20
 struct kdbh, 14 bytes                      @124
 struct kdbt[1], 4 bytes                    @138
 sb2 kdbr[10]                               @142
 ub1 freespace[7666]                        @162
 ub1 rowdata[360]                           @7828
 ub4 tailchk                                @8188
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x0140000c
   ub4 bas_kcbh                             @8        0x0004d7b0
   ub2 wrp_kcbh                             @12       0x000a
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe573
   ub2 spare3_kcbh                          @18       0x0000
BBED> d offset 8188
 File: /u01/oracle/oradata/tmp/xifenfei01.dbf (0)
 Block: 12               Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 0106b0d7
 <32 bytes per line>
BBED> m /x 11
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/tmp/xifenfei01.dbf (0)
 Block: 12               Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 1106b0d7
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 12:
current = 0xe563, required = 0xe563
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/tmp/xifenfei01.dbf
BLOCK = 12
Block 12 is corrupt
Corrupt block relative dba: 0x0140000c (file 0, block 12)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x0140000c
 last change scn: 0x000a.0004d7b0 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xd7b00611
 check value in block header: 0xe563
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2

修改数据库记录

SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             109054428 bytes
Database Buffers          201326592 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
SQL> conn chf/xifenfei
Connected.
SQL> update t_xifenfei set object_name='惜分飞';
10 rows updated.
SQL> update t_xifenfei_new set object_name='惜分飞';
10 rows updated.
SQL> commit;
Commit complete.
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> conn / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

利用备份数据文件恢复数据库

[oracle@xifenfei XFF]$ cp xifenfei01.dbf xifenfei01.dbf_bak
[oracle@xifenfei XFF]$ cp ../tmp/xifenfei01.dbf  xifenfei01.dbf
[oracle@xifenfei XFF]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Sep 28 19:13:59 2012
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             109054428 bytes
Database Buffers          201326592 bytes
Redo Buffers                7118848 bytes
Database mounted.
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/u01/oracle/oradata/XFF/xifenfei01.dbf'
--提示数据需要恢复
SQL> recover datafile 5;
ORA-00279: change 42949990720 generated at 09/28/2012 19:04:10 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/XFF/archivelog/1_24_792679299.dbf
ORA-00280: change 42949990720 for thread 1 is in sequence #24
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
Log applied.
Media recovery complete.
SQL> alter database open;
Database altered.
--利用被破坏的数据文件+归档日志恢复数据库正常
SQL> col object_name for a30
SQL> select   object_id,object_name from t_xifenfei;
select   object_id,object_name from t_xifenfei
                                    *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 12)
ORA-01110: data file 5: '/u01/oracle/oradata/XFF/xifenfei01.dbf'
--提示被破坏的数据块,查询不能完成
--证明坏块之外的数据块还是被正常应用日志
SQL>  select   object_id,object_name from t_xifenfei_new;
 OBJECT_ID OBJECT_NAME
---------- ------------------------------
        20 惜分飞
        44 惜分飞
        28 惜分飞
        15 惜分飞
        29 惜分飞
         3 惜分飞
        25 惜分飞
        39 惜分飞
        51 惜分飞
        26 惜分飞
10 rows selected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

dbv检查坏块

[oracle@xifenfei XFF]$ dbv file=xifenfei01.dbf
DBVERIFY: Release 10.2.0.4.0 - Production on Fri Sep 28 19:14:52 2012
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = xifenfei01.dbf
DBV-00200: Block, DBA 20971532, already marked corrupt
--这里可以看出来,该数据块已经被标志为坏块
DBVERIFY - Verification complete
Total Pages Examined         : 1280
Total Pages Processed (Data) : 2
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 14
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1264
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 318700 (10.318700)

查看恢复过程alert日志

Fri Sep 28 19:14:06 2012
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Fri Sep 28 19:14:06 2012
ALTER DATABASE OPEN
ORA-1113 signalled during: ALTER DATABASE OPEN...
Fri Sep 28 19:14:11 2012
ALTER DATABASE RECOVER  datafile 5
Media Recovery Start
 parallel recovery started with 2 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 5  ...
Fri Sep 28 19:14:16 2012
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Fri Sep 28 19:14:16 2012
--恢复数据库的时候,发现坏块
Media Recovery Log /u01/oracle/oradata/XFF/archivelog/1_24_792679299.dbf
Fri Sep 28 19:14:16 2012
Hex dump of (file 5, block 12) in trace file /u01/oracle/admin/XFF/bdump/xff_p001_23011.trc
Corrupt block relative dba: 0x0140000c (file 5, block 12)
Fractured block found during media recovery
Data in bad block:
 type: 6 format: 2 rdba: 0x0140000c
 last change scn: 0x000a.0004d7b0 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xd7b00611
 check value in block header: 0xe563
 computed block checksum: 0x0
Reread of rdba: 0x0140000c (file 5, block 12) found same corrupted data
--继续恢复
Fri Sep 28 19:14:16 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 25 Reading mem 0
  Mem# 0: /u01/oracle/oradata/XFF/redo01.log
Fri Sep 28 19:14:16 2012
Recovery of Online Redo Log: Thread 1 Group 2 Seq 26 Reading mem 0
  Mem# 0: /u01/oracle/oradata/XFF/redo02.log
Fri Sep 28 19:14:16 2012
Recovery of Online Redo Log: Thread 1 Group 3 Seq 27 Reading mem 0
  Mem# 0: /u01/oracle/oradata/XFF/redo03.log
Fri Sep 28 19:14:16 2012
Media Recovery Complete (XFF)
Completed: ALTER DATABASE RECOVER    CONTINUE DEFAULT
Fri Sep 28 19:14:31 2012
alter database open

bbed查看修改相关信息

BBED> set filename '/u01/oracle/oradata/XFF/xifenfei01.dbf'
        FILENAME        /u01/oracle/oradata/XFF/xifenfei01.dbf
BBED> set block 12
        BLOCK#          12
BBED> map
 File: /u01/oracle/oradata/XFF/xifenfei01.dbf (0)
 Block: 12                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 96 bytes                     @20
 struct kdbh, 14 bytes                      @124
 struct kdbt[1], 4 bytes                    @138
 sb2 kdbr[10]                               @142
 ub1 freespace[7666]                        @162
 ub1 rowdata[360]                           @7828
 ub4 tailchk                                @8188
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x0140000c
   ub4 bas_kcbh                             @8        0x00000000
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0xff   <--因为被标记为坏块,所以为ff
   ub1 flg_kcbh                             @15       0x04 (KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe77d
   ub2 spare3_kcbh                          @18       0x0000
--查看数据块中记录
BBED> p *kdbr[5]
rowdata[69]
-----------
ub1 rowdata[69]                             @7897     0x2c
BBED> x /rnc
rowdata[69]                                 @7897
-----------
flag@7897: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@7898: 0x02
cols@7899:    2
col    0[2] @7900: 3
col   1[16] @7903: WWW.XIFENFEI.COM  <--确实没有被恢复,而是直接被跳过
BBED> set mode edit
        MODE            Edit
BBED> m /x 01 offset 14
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/XFF/xifenfei01.dbf (0)
 Block: 12               Offsets:   14 to  525           Dba:0x00000000
------------------------------------------------------------------------
 01047de7 00000100 00000dcc 000098d7 …………
 <32 bytes per line>
BBED> d offset 8188
 File: /u01/oracle/oradata/XFF/xifenfei01.dbf (0)
 Block: 12               Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 ff060000    <--这么说明:数据块被标志为坏块的时候,同时会修改tailchk值
 <32 bytes per line>
BBED> m /x 01 offset 8188
 File: /u01/oracle/oradata/XFF/xifenfei01.dbf (0)
 Block: 12               Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 01060000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 12:
current = 0xe77d, required = 0xe77d
--验证块已经标记为正常块
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/XFF/xifenfei01.dbf
BLOCK = 12
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

启动数据库测试

SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             109054428 bytes
Database Buffers          201326592 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
SQL> conn chf/xifenfei
Connected.
SQL> col object_name for a30
SQL> select   object_id,object_name from t_xifenfei;
 OBJECT_ID OBJECT_NAME
---------- ------------------------------
        20 WWW.XIFENFEI.COM
        44 WWW.XIFENFEI.COM
        28 WWW.XIFENFEI.COM
        15 WWW.XIFENFEI.COM
        29 WWW.XIFENFEI.COM
         3 WWW.XIFENFEI.COM
        25 WWW.XIFENFEI.COM
        39 WWW.XIFENFEI.COM
        51 WWW.XIFENFEI.COM
        26 WWW.XIFENFEI.COM
10 rows selected.
--通过修改数据块的seq_kcbh和tailchk,让这个块恢复正常,但是记录依然丢失,
--因为应用日志恢复之时标记为坏块跳过该块的日志应用

通过实验证明:
1.如果只有数据块异常,应用日志恢复,不一定会出现ORA-600[3020],而是直接把该块标记为坏块,继续应用日志
2.标记坏块其实就是修改seq_kcbh为ff,同时也修改tailchk值
3.经验值:如果在数据库应用日志恢复的时候,如果出现ORA-600[3020]错误,可以使用allow 2 corruption来跳过坏块处理,其实也是修改seq_kcbh为ff,然后让数据库跳过该块的恢复.

使用plsql抢救数据

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:使用plsql抢救数据

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

“在oracle出现ORA-8103/ORA-1578/ORA-376″等情况下抢救数据的争论没有停止过,很多人想到的是使用bbed,dul等工具来抢救,其实在很多时候我们使用pl/sql也可以完美的抢救数据.在这里我们通过模拟ORA-8103错误,然后使用plsql来找回数据.这中处理方法相对于bbed风险小,但是缺点是如果数据量大处理时间可能比较长,可能比dul有的一比,但是dul的工具不是任何人都有的.所以整体来说,在大部分情况下,这种方法处理某个数据块错误,抢救某个对象数据,还是很好的方法.
1.有非空列index情况

--创建测试表
SQL> create table xifenfei
  2  as
  3  select * from dba_objects;
Table created.
--修改某个项为非空值
SQL> alter table xifenfei modify object_id not null;
Table altered.
--创建一个唯一index
SQL> create unique index ind_xifenfei  on xifenfei(object_id);
Index created.
--表总记录
SQL> select count(*) from xifenfei;
  COUNT(*)
----------
     50088
--extent的分布情况
SQL> set pages 100
SQL>  select file_id,block_id,block_id+blocks-1
  2    from dba_extents
  3   where segment_name ='XIFENFEI' AND owner='CHF';
   FILE_ID   BLOCK_ID BLOCK_ID+BLOCKS-1
---------- ---------- -----------------
         9       1545              1552
         9       1553              1560
         9       1561              1568
         9       1569              1576
         9       1577              1584
         9       1585              1592
         9       1593              1600
         9       1601              1608
         9       1609              1616
         9       1617              1624
         9       1625              1632
         9       1633              1640
         9       1641              1648
         9       1649              1656
         9       1657              1664
         9       1665              1672
         9       1673              1800
         9       1801              1928
         9       1929              2056
         9       2057              2184
         9       2185              2312
21 rows selected.
--2200数据块包含记录
SQL> select   count(*)
  2  from chf.xifenfei where dbms_rowid.rowid_block_number(rowid)=2200;
  COUNT(*)
----------
        69
--关闭数据库
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
--破坏数据块
[oracle@xifenfei ~]$ dd if=/dev/zero of=/u01/oracle/oradata/XFF/users03.dbf  bs=8192  count=1 seek=2200 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000151554 seconds, 54.1 MB/s
--启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             104860124 bytes
Database Buffers          205520896 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
--查询结果
SQL>  select /*+ full(xifenfei) */ count(*) from chf.xifenfei;
 select /*+ full(xifenfei) */ count(*) from chf.xifenfei
                                                *
ERROR at line 1:
ORA-08103: object no longer exists
SQL> create table chf.xifenfei_new
  2  as
  3  select * from chf.xifenfei;
select * from chf.xifenfei
                  *
ERROR at line 3:
ORA-08103: object no longer exists
--创建备份表
SQL> create table chf.xifenfei_new
  2  as
  3  select * from chf.xifenfei where 1=0;
Table created.
--创建坏块相关rowid记录表
SQL> create table chf.bad_rows (row_id rowid, oracle_error_code number);
Table created.
--执行plsql脚本
DECLARE
 TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
 CURSOR c1 IS  select /*+ index(xifenfei ind_xifenfei) */ rowid
 from chf.xifenfei
 where object_id is NOT NULL;
 r RowIDTab;
 rows  NATURAL := 20000;
 bad_rows number := 0 ;
 errors number;
 error_code number;
 myrowid rowid;
BEGIN
 OPEN c1;
 LOOP
   FETCH  c1 BULK COLLECT INTO r LIMIT rows;
   EXIT WHEN r.count=0;
   BEGIN
    FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
     insert into chf.xifenfei_new
     select /*+ ROWID(A) */ *
     from chf.xifenfei A where rowid = r(i);
   EXCEPTION
   when OTHERS then
    BEGIN
     errors := SQL%BULK_EXCEPTIONS.COUNT;
     FOR err1 IN 1..errors LOOP
       error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
       if error_code in (1410, 8103) then
         myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
         bad_rows := bad_rows + 1;
         insert into chf.bad_rows values(myrowid, error_code);
       else
         raise;
       end if;
     END LOOP;
     END;
   END;
  commit;
 END LOOP;
 commit;
 CLOSE c1;
 dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/
--查询错误记录
SQL> select count(*) from chf.bad_rows ;
  COUNT(*)
----------
        69
SQL> select * from chf.bad_rows where rownum<10;
ROW_ID             ORACLE_ERROR_CODE
------------------ -----------------
AAAMugAAJAAAAiYAAA              8103
AAAMugAAJAAAAiYAAB              8103
AAAMugAAJAAAAiYAAC              8103
AAAMugAAJAAAAiYAAD              8103
AAAMugAAJAAAAiYAAE              8103
AAAMugAAJAAAAiYAAF              8103
AAAMugAAJAAAAiYAAG              8103
AAAMugAAJAAAAiYAAH              8103
AAAMugAAJAAAAiYAAI              8103
9 rows selected.
--查询备份表记录
SQL> select count(*) from chf.xifenfei_new;
  COUNT(*)
----------
     50019
50088-50019=69和被破坏块中记录一致,证明所有好块中记录全部被找回来

2.无非空列index情况

--创建表
SQL> CONN CHF/XIFENFEI
Connected.
SQL> create table t_xifenfei
  2  as
  3  select * from dba_objects;
Table created.
--表中记录总数
SQL> select count(*) from t_xifenfei;
  COUNT(*)
----------
     50086
--extent分布
SQL> SET PAGES 100
SQL>  select file_id,block_id,block_id+blocks-1
  2    from dba_extents
  3   where segment_name ='T_XIFENFEI' AND owner='CHF';
   FILE_ID   BLOCK_ID BLOCK_ID+BLOCKS-1
---------- ---------- -----------------
         9          9                16
         9         17                24
         9         25                32
         9         33                40
         9         41                48
         9         49                56
         9         57                64
         9         65                72
         9         73                80
         9         81                88
         9         89                96
         9         97               104
         9        105               112
         9        113               120
         9        121               128
         9        129               136
         9        137               264
         9        265               392
         9        393               520
         9        521               648
         9        649               776
21 rows selected.
--700数据块中记录数
SQL> select   count(*)
  2  from chf.t_xifenfei where dbms_rowid.rowid_block_number(rowid)=700;
  COUNT(*)
----------
        73
--关闭数据库
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
--破坏block 700的数据块
[oracle@xifenfei ~]$ dd if=/dev/zero of=/u01/oracle/oradata/XFF/users03.dbf  bs=8192  count=1 seek=700 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000156576 seconds, 52.3 MB/s
--启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             104860124 bytes
Database Buffers          205520896 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
--查询报错
SQL> select count(*) from chf.t_xifenfei;
select count(*) from chf.t_xifenfei
                         *
ERROR at line 1:
ORA-08103: object no longer exists
--创建备份表
SQL> CREATE TABLE T_XIFENFEI_NEW
  2  AS
  3  SELECT * FROM T_XIFENFEI WHERE 1=0;
--找回记录
set serveroutput on
set concat off
DECLARE
 nrows number;
 rid rowid;
 dobj number;
 ROWSPERBLOCK number;
BEGIN
 ROWSPERBLOCK:=1000;  --估算最大的一个块中记录条数
 nrows:=0;
 select data_object_id  into dobj
 from dba_objects
 where owner = 'CHF'
 and object_name = 'T_XIFENFEI'
-- and subobject_name = '<table partition>'  Add this condition if table is partitioned
 ;
 for i in (select relative_fno, block_id, block_id+blocks-1 totblocks
           from dba_extents
           where owner = 'CHF'
             and segment_name = 'T_XIFENFEI'
-- and partition_name = '<table partition>' Add this condition if table is partitioned
-- and file_id != <OFFLINED DATAFILE> This condition is only used if a datafile needs to be skipped due to ORA-376 (A)
          order by extent_id)
 loop
   for br in i.block_id..i.totblocks loop
    for j in 1..ROWSPERBLOCK loop
    begin
      rid := dbms_rowid.ROWID_CREATE(1,dobj,i.relative_fno, br , j-1);
      insert into CHF.T_XIFENFEI_NEW
      select /*+ ROWID(A) */ *
      from CHF.T_XIFENFEI A
      where rowid = rid;
      if sql%rowcount = 1 then nrows:=nrows+1; end if;
      if (mod(nrows,10000)=0) then commit; end if;
    exception when others then null;
    end;
    end loop;
  end loop;
 end loop;
 COMMIT;
 dbms_output.put_line('Total rows: '||to_char(nrows));
END;
/
--找回记录数
SQL> SELECT COUNT(*) FROM CHF.T_XIFENFEI_NEW;
  COUNT(*)
----------
     50013
50086-50013=73  证明非坏块中的数据都被完全寻找回来

参考:
ORA-8103 Troubleshooting, Diagnostic and Solution [ID 268302.1]
Extract rows from a CORRUPT table creating ROWID from DBA_EXTENTS [ID 422547.1]

TXChecker初试

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:TXChecker初试

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

当我们对数据库进行异常恢复,很多时候要选择屏蔽回滚段,但是我们有没有办法来评估屏蔽回滚段到底会对我们的数据库的数据产生多大影响呢?其实我们可以通过TXChecker工具来评估数据库undo异常时候受到影响的数据对象有哪些,从而进一步确定是否真的需要对其undo下手处理.
模拟数据异常事务为提交情况

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Thu Aug 30 20:32:27 2012
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to an idle instance.
SQL> startup;
ORACLE instance started.
Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             100665820 bytes
Database Buffers          209715200 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
SQL> create table t_xifenfei tablespace system
  2  as
  3  select * from dba_objects;
Table created.
SQL> create table chf.t_xifenfei tablespace users
  2  as
  3  select * from dba_objects;
Table created.
SQL> delete from t_xifenfei where rownum<10;
9 rows deleted.
SQL> delete from chf.t_xifenfei where rownum<100;
99 rows deleted.
SQL> alter system checkpoint;
System altered.
SQL> shutdow abort;
ORACLE instance shut down.

在system和users表空间分别模拟了sys和chf的t_xifenfei表含事务未提交情况

TXChecker用法说明

[oracle@xifenfei txchecker]$ ./TXChecker
TXChecker - v1.4 by Center Of Expertise (COE), Oracle Corporation (build 10/16/07)
Usage is: TXChecker [options]
Options:
  -a                     When scanning datafiles (with -d/-f/-l/-t options) report objects using any of the undo
                         segments (not just those with errors) (OPTIONAL)
  -b                     For objects found, print the datablock addresses. See readme for further details (OPTIONAL)
  -c<controlfile_name>   Fully qualified controlfile name to read (MANDATORY)
  -d                     Scan database for active TXs (use when undo not available) (OPTIONAL)
  -f<filename>           Scan the named datafile for active TXs (OPTIONAL)
  -g                     Indicates you want to find all blocks taking part in transactions with
                         a USN > than the USN supplied in -x<XID> parameter (same constraints as -w) (OPTIONAL)
  -l<listfile>           Scan all the datafiles listed in the listfile for active TXs (OPTIONAL)
  -m<minutes>            Number of minutes used to consider a TX as active (1-120) (DEFAULTS TO 60 MINUTES)
  -p                     Show the names and last known status of the UNDO segments (OPTIONAL)
  -s                     Skip read-only or offline normal datafiles (OPTIONAL)
  -t<tablespace>         Scan all the datafiles for this tablespace (OPTIONAL)
  -u                     Report ITL entries active if marked with an upper bound ('U' flag) fast commit SCN
                         instead if active transactions (OPTIONAL)
  -w<wrap#>              Wrap# for XID in ITL entry to report blocks where wrap# > this one (OPTIONAL)
                         Must use -x with this option. See the readme for details
  -x<XID>                XID for transaction wanting to search for (OPTIONAL)
                         Use format rrrr.ssss.wwwwwwww using Hexadecimal numbers
                         See the readme for full instructions on using -x, -w and -g options
NOTE: Options -d/-f/-l/-t are exclusive, and only one should be specified.
[/sql]
<strong>TXChecker初始</strong>

[oracle@xifenfei txchecker]$ ./TXChecker -c/u01/oracle/oradata/XFF/control01.ctl -d -a
TXChecker - v1.4 by Center Of Expertise (COE), Oracle Corporation (build 10/16/07)
Database Name: XFF   Version: 10.2.0
*** Database last checkpointed at 08/30/2012 20:34:43 (SCN: 0xa.0x1e142)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN:   0  Name: SYSTEM          TBS#:  0 File:   1  Block:      9  Instance:  0  SMU: N  Status:  3 - Online
                                SCN: 0000.00000000   XactSQN: 0x00000000   UndoSQN: 0x00000000
USN:   1  Name: _SYSSMU1$       TBS#:  1 File:   2  Block:      9  Instance:  0  SMU: Y  Status:  3 - Online
                                SCN: 000a.000191c3   XactSQN: 0x000000d2   UndoSQN: 0x0000013c
…………省略
USN:   9  Name: _SYSSMU9$       TBS#:  1 File:   2  Block:    137  Instance:  0  SMU: Y  Status:  3 - Online
                                SCN: 000a.000191d8   XactSQN: 0x0000011a   UndoSQN: 0x000000dc
USN:  10  Name: _SYSSMU10$      TBS#:  1 File:   2  Block:    153  Instance:  0  SMU: Y  Status:  3 - Online
                                SCN: 000a.000191d7   XactSQN: 0x000000c1   UndoSQN: 0x000000d9
…………省略
USN:  20  Name: _SYSSMU20$      TBS#:  5 File:   5  Block:    153  Instance:  0  SMU: Y  Status:  1 - Invalid / Dropped
                                SCN: 0000.00070ec6   XactSQN: 0x00000002   UndoSQN: 0x00000001
*** Active Transactions:
USN: 6  Name: _SYSSMU6$  File: 2  Block: 89  Instance: 0  Status: 3 - Online
* Active TX at slot 6  #undo blocks: 3  Last bk: 2.90
Obj#:  51938  Name: CHF.T_XIFENFEI                                    Type: TABLE               Undo recs: 99
              Used undo segment IDs: 6
Obj#:  51937  Name: SYS.T_XIFENFEI                                    Type: TABLE               Undo recs: 9
              Used undo segment IDs: 6
         File (validate_objects.sql) being created for object validattion already exists
         and will be overwritten!!!
         Do you want to continue overwriting [Y/N]?y
*** Undo segments (headers) that encountered errors preventing Active TX scan:
USN:  11  Name: _SYSSMU11$      File:   5  Block:       9  Instance:  0  Error: 27 - Undo segment was dropped
…………省略
USN:  20  Name: _SYSSMU20$      File:   5  Block:     153  Instance:  0  Error: 27 - Undo segment was dropped
WARNING: Analyzing the full database for active transactions will take some time!
         Are you sure you want to analyze the full database [Y/N]?
         Are you sure you want to analyze the full database [Y/N]?y
*** Scanning database for datablocks that may require undo (PLEASE WAIT...):
*** Asterisk ('*') denotes blocks being updated since 08/08/2012 03:30:32 (SCN: 0x0.0x79607)
Scanning datafile:  3 - /u01/oracle/oradata/XFF/sysaux01.dbf (SYSAUX)                           - Active TX blocks: 147 *
Scanning datafile:  1 - /u01/oracle/oradata/XFF/system01.dbf (SYSTEM)                           - Active TX blocks: 11597 *
--undo表空间跳过
Undo datafile (/u01/oracle/oradata/XFF/undotbs01.dbf) - SKIPPING
Scanning datafile:  4 - /u01/oracle/oradata/XFF/users01.dbf (USERS)                             - Active TX blocks: 2 *
--因为数据文件丢失控制文件中offline的跳过(其实只要数据文件丢失就会跳过)
Cannot access datafile (/u01/oracle/oradata/XFF/xifenfei01.dbf) (error 2 - No such file or directory) - SKIPPING
Cannot access datafile (/u01/oracle/oradata/XFF/xifenfei02.dbf) (error 2 - No such file or directory) - SKIPPING
Scanning datafile:  7 - /u01/oracle/oradata/XFF/xifenfei03.dbf (XIFENFEI2)                      - Active TX blocks: 0
*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 08/08/2012 03:30:32 (SCN: 0x0.0x79607)
DataObj#:  51938  Name: CHF.T_XIFENFEI                                    Type: TABLE               Datablocks: 2 *
Used undo segment IDs: 6
DataObj#:  46434  Name: MDSYS.SYS_IL0000046432C00006$$                    Type: INDEX               Datablocks: 4
Used undo segment IDs: 2
DataObj#:    124  Name: SYS.I_ACCESS1                                     Type: INDEX               Datablocks: 27
Used undo segment IDs: 1 2 3 4 5 6 7 8 9 10
…………省略
DataObj#:   8824  Name: SYS.SYS_IL0000008822C00008$$                      Type: INDEX               Datablocks: 1 *
Used undo segment IDs: 4
DataObj#:  51937  Name: SYS.T_XIFENFEI                                    Type: TABLE               Datablocks: 1 *
Used undo segment IDs: 6
DataObj#:  51557  Name: SYS.UTL_RECOMP_COMPILED                           Type: TABLE               Datablocks: 1
Used undo segment IDs: 8
…………省略
DataObj#:  42131  Name: XDB.XDB$ELEMENT_PROPNUMBER                        Type: INDEX               Datablocks: 1
Used undo segment IDs: 2
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
    if the undo required is not available.
Undo Segment Usage Summary
**************************
*** Undo segments identified in use by active transaction datablocks AFTER 08/08/2012 03:30:32 (SCN: 0x0.0x79607):
USN:   1  Name: _SYSSMU1$
USN:   2  Name: _SYSSMU2$
USN:   3  Name: _SYSSMU3$
USN:   4  Name: _SYSSMU4$
USN:   5  Name: _SYSSMU5$
USN:   6  Name: _SYSSMU6$
USN:   9  Name: _SYSSMU9$
*** Undo segments identified in use by active transacation datablocks BEFORE 08/08/2012 03:30:32 (SCN: 0x0.0x79607):
USN:   1  Name: _SYSSMU1$
USN:   2  Name: _SYSSMU2$
USN:   3  Name: _SYSSMU3$
USN:   4  Name: _SYSSMU4$
USN:   5  Name: _SYSSMU5$
USN:   7  Name: _SYSSMU7$
USN:   8  Name: _SYSSMU8$
USN:   9  Name: _SYSSMU9$
USN:  10  Name: _SYSSMU10$
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_083012_2042_XFF.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.

验证对象脚本

[oracle@xifenfei txchecker]$ more validate_objects.sql
rem validate_objects.sql - checks strcuture of objects needing unavailable undo data
rem                        Created by findtxns program, Oracle Corporation
set echo on
ANALYZE TABLE CHF.T_XIFENFEI VALIDATE STRUCTURE CASCADE;
ANALYZE INDEX MDSYS.SYS_IL0000046432C00006$$ VALIDATE STRUCTURE;
ANALYZE INDEX SYS.I_ACCESS1 VALIDATE STRUCTURE;
…………省略
ANALYZE INDEX XDB.SYS_C003167 VALIDATE STRUCTURE;
ANALYZE INDEX XDB.XDB$ELEMENT_PROPNAME VALIDATE STRUCTURE;
ANALYZE INDEX XDB.XDB$ELEMENT_PROPNUMBER VALIDATE STRUCTURE;

ORA-607/ORA-600[4194]不一定是重大灾难

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-607/ORA-600[4194]不一定是重大灾难

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前解决过ORA-607/ORA-600[4194]和模拟过ORA-607/ORA-600[4194]错误,所以固定思维任务ORA-607/ORA-600[4194]可能就是重大灾难,通过这个案例来说明ORA-607/ORA-600[4194]可能也就是一个常规的不能再常规的错误:有一网友数据库因意外关闭电源导致启动过程出现ORA-00607/ORA-00600[4194]/ORA-00600[4097]的错误,使得数据库启动失败.

SMON: enabling tx recovery
Fri Aug 31 23:14:08 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=19, OS id=15619
Fri Aug 31 23:14:10 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Aug 31 23:14:12 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Aug 31 23:14:12 2012
Completed: alter database open
Fri Aug 31 23:14:14 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-01595: error freeing extent (2) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:41 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Fri Aug 31 23:29:43 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:44 2012
Errors in file /u01/oradata/orcl/bdump/orcl_pmon_15577.trc:
ORA-00474: SMON process terminated with error
Fri Aug 31 23:29:44 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 15577

通过alert日志可以定位到SMON_SCN_TIME表或者其回滚操作可能异常,结合alert和trace分析,发现这次错误的操作主要sql语句为:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.COL_USAGE$
   SET EQUALITY_PREDS    = EQUALITY_PREDS +
                           DECODE(BITAND(:FLAG, 1), 0, 0, 1),
       EQUIJOIN_PREDS    = EQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 2), 0, 0, 1),
       NONEQUIJOIN_PREDS = NONEQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 4), 0, 0, 1),
       RANGE_PREDS       = RANGE_PREDS + DECODE(BITAND(:FLAG, 8), 0, 0, 1),
       LIKE_PREDS        = LIKE_PREDS + DECODE(BITAND(:FLAG, 16), 0, 0, 1),
       NULL_PREDS        = NULL_PREDS + DECODE(BITAND(:FLAG, 32), 0, 0, 1),
       TIMESTAMP         = :TIME
 WHERE OBJ# = :OBJN
   AND INTCOL# = :COLN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.MON_MODS$
   SET INSERTS       = INSERTS + :INS,
       UPDATES       = UPDATES + :UPD,
       DELETES       = DELETES + :DEL,
       FLAGS        =
       (DECODE(BITAND(FLAGS, :FLAG), :FLAG, FLAGS, FLAGS + :FLAG)),
       DROP_SEGMENTS = DROP_SEGMENTS + :DROPSEG,
       TIMESTAMP     = :TIME
 WHERE OBJ# = :OBJN
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO SMON_SCN_TIME
  (THREAD,
   TIME_MP,
   TIME_DP,
   SCN,
   SCN_WRP,
   SCN_BAS,
   NUM_MAPPINGS,
   TIM_SCN_MAP)
VALUES
  (0, :1, :2, :3, :4, :5, :6, :7)

这里主要涉及到对oracle的三张表的操作
COL_USAGE$:主要是在收集统计信息的时候作为是否需要收集列直方图信息参考
MON_MODS$:Oracle主要利用该表来记录那些表的数据发生改变,方便收集统计信息
SMON_SCN_TIME:记录SCN和TIME的对应关系
通过这里的分析可以确定这三张表中的数据对于数据库来说不是致命的基表信息,在数据库运行过程中可以清理掉这些信息,最多就是因为数据库性能的下降或者SCN和TIME互转功能不完善.

解决思路
完整的undo异常处理顺序
1.从alert中可以看出来数据库是在open之后由于SMON回滚到上述几条sql异常导致数据库down,所以可以尝试使用system回滚段启动数据库,看看是否可以屏蔽相关问题
2.如果方法1不可行,那使用event屏蔽smon对回滚段的相关操作,使得数据库正常启动
3.如果由于存在特殊事务,event无法屏蔽,尝试使用隐含参数处理该问题
4.如果隐含参数尚无法解决给问题考虑使用bbed
5.如果bbed不能解决,那只能选择dul或者其类似工具处理
这个案例中我们明确的看到是因为上面的三条sql回滚异常出现问题导致,对于这样的问题,经过测试使用方法1和2都能够顺利解决问题(open库之后需要重建undo,删除有问题undo表空间,修改参数[可能包括event],切换undo表空间).因为遇到几次ORA-607/ORA-600[4194]是因为system rollback损坏导致,所以这次开始也认为是一次比较复杂的恢复,最后证明这次是一种非常常规的恢复.对于ORACLE的数据库恢复有经验可能会比较快的定位问题,但是如果按照固定的思路去想可能会让自己走进死胡同.

处理smon清理临时段导致数据库异常案例

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:处理smon清理临时段导致数据库异常案例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理
alert日志报ORA-00600[kafspa:columnBuffer1]

Wed Aug  8 10:55:31 2012
Completed: ALTER DATABASE OPEN
Wed Aug  8 10:55:41 2012
Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:47 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:47 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:58 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:59 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:10 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:11 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:22 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:32 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:43 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:53 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:04 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:14 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:25 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:35 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:38 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc:
ORA-00474: SMON process terminated with error
Wed Aug  8 10:57:38 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 8856

这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常

Non-fatal internal error happenned while SMON was doing temporary segment drop.

出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系

SMON encountered 8 out of maximum 100 non-fatal internal errors.

这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉

SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8
SQL> /
Enter value for param: smon_internal_errlimit
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%smon_internal_errlimit%')
NAME                             VALUE                    DESCRIPTION
-------------------------------- ------------------------ -----------------------------------
_smon_internal_errlimit          100                      limit of SMON internal errors

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/ora10/product
System name:	Linux
Node name:	DBN-HLD-155-2
Release:	2.6.18-92.el5PAE
Version:	#1 SMP Tue Apr 29 13:31:02 EDT 2008
Machine:	i686
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)
*** SERVICE NAME:() 2012-08-08 10:55:20.208
*** SESSION ID:(274.1) 2012-08-08 10:55:20.208
*** 2012-08-08 10:55:20.208
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []
Current SQL statement for this session:
select file#, block#, ts# from seg$ where type# = 3

果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.

解决问题

--修改两个参数
event='10061 trace name context forever, level 10'
_smon_internal_errlimit=1000000
--启动数据库
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area 1577058304 bytes
Fixed Size                  1267716 bytes
Variable Size             385878012 bytes
Database Buffers         1174405120 bytes
Redo Buffers               15507456 bytes
Database mounted.
Database opened.

因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.

导致该异常sql分析

SQL> select file#, block#, ts# from seg$ where type# = 3;
select file#, block#, ts# from seg$ where type# = 3
                               *
ERROR at line 1:
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> set autot trace exp
SQL> select file#, block#, ts# from seg$ where type# = 3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   150 |   389   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| SEG$ |    10 |   150 |   389   (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("TYPE#"=3)
SQL> select count(rowid) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |   389   (1)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ | 45682 |   535K|   389   (1)| 00:00:05 |
---------------------------------------------------------------------------
SQL> select /*+ full(t) */ count(*) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    29   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ |  3223 |    29   (0)| 00:00:01 |
-------------------------------------------------------------------
SQL> select ts# from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3223 |  9669 |    29   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| SEG$ |  3223 |  9669 |    29   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autot off
SQL> select count(rowid) from seg$;
SQL> /
COUNT(ROWID)
------------
       45727
SQL> select /*+ full(t) */ count(*) from seg$;
  COUNT(*)
----------
     45727
SQL> select  ts# from seg$;
……
7
7
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> !dbv file='/oracle/ora10/oradata/ora10g/system01.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf
Block Checking: DBA = 4225673, Block Type = KTB-managed data block
data header at 0xb6d8225c
kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72
Page 31369 failed with check code 6135
Block Checking: DBA = 4236289, Block Type = KTB-managed data block
data header at 0xb6d7225c
kdbchk: tosp bad (-13399)
Page 41985 failed with check code 6127
DBVERIFY - Verification complete
Total Pages Examined         : 192000
Total Pages Processed (Data) : 47588
Total Pages Failing   (Data) : 2
Total Pages Processed (Index): 40929
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1784
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 101699
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 1775671440 (0.1775671440)
SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01498: block check failure - see trace file

通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析

创建控制文件遭遇ORA-00600[3753]故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:创建控制文件遭遇ORA-00600[3753]故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一位网友的数据库正常关闭,然后控制文件意外丢失,需要通过trace中的信息重建控制文件,但是在重建的过程中,出现ORA-00600[3753]错误,远程帮忙处理,记录处理过程如下
1.启动数据库至nomount状态,然后尝试noresetlogs模式重建控制文件

SQL>@XFF_NORESETLOGS_CTL.sql
CREATE CONTROLFILE REUSE DATABASE "ORA10G" NORESETLOGS  ARCHIVELOG
*
第 1 行出现错误:
ORA-01503: CREATE CONTROLFILE 失败
ORA-00600: 内部错误代码, 参数: [3753], [3], [2], [], [], [], [], []

2.检查alert日志

Tue Aug 07 20:40:47 2012
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Tue Aug 07 20:40:48 2012
Errors in file d:\oracle\product\10.2.0\db_1\admin\ora10g\udump\ora10g_ora_11596.trc:
ORA-00600: 内部错误代码, 参数: [3753], [3], [2], [], [], [], [], []
Tue Aug 07 20:40:53 2012
Errors in file d:\oracle\product\10.2.0\db_1\admin\ora10g\udump\ora10g_ora_11596.trc:
ORA-00600: 内部错误代码, 参数: [3753], [3], [2], [], [], [], [], []
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "ORA10G" NORESETLOGS  ARCHIVELOG

3.分析trace文件

Tue Aug 07 20:40:48 2012
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows NT Version V6.1 Service Pack 1
CPU                 : 2 - type 586, 2 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:166M/1901M, Ph+PgF:619M/5536M, VA:812M/2047M
Instance name: ora10g
Redo thread mounted by this instance: 0 <none>
Oracle process number: 16
Windows thread id: 11596, image: ORACLE.EXE (SHAD)
*** SERVICE NAME:() 2012-08-07 20:40:48.413
*** SESSION ID:(158.7) 2012-08-07 20:40:48.413
*** 2012-08-07 20:40:48.413
ksedmp: internal or fatal error
ORA-00600: 内部错误代码, 参数: [3753], [3], [2], [], [], [], [], []
Current SQL statement for this session:
CREATE CONTROLFILE REUSE DATABASE "ORA10G" NORESETLOGS  ARCHIVELOG
…………
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
_ksedst+38           CALLrel  _ksedst1+0           0 1
_ksedmp+898          CALLrel  _ksedst+0            0
_ksfdmp+14           CALLrel  _ksedmp+0            3
603A816A             CALLreg  00000000             87CF110 3
603A83FF             CALLrel  603A80D8             87CF110 8191090 EA9 2 8CCC438
_ksesic2+59          CALLrel  _kgesiv+0            87CF110 8191090 EA9 2 8CCC438
                                                   EA9 2 8CCC438
__VInfreq__kctbce+1  CALLrel  _ksesic2+0           EA9 0 3 0 0 2 0
63
_kcfccfl+356         CALLrel  _kctbce+0            543414C 81DB8A8
_cdbdrv+1037         CALLrel  _kcfccfl+0           543414C 1 8CCD060 8CCD04C
                                                   19000 3
_opiexe+11999        CALLrel  _cdbdrv+0            1
_opiosq0+6088        CALLrel  _opiexe+0            4 0 8CCD894
_kpooprx+232         CALLrel  _opiosq0+0           3 E 8CCD9AC A4
_kpoal8+775          CALLrel  _kpooprx+0           8CCF6CC 8196414 A16 1 0 A4
_opiodr+1099         CALLreg  00000000             5E 17 8CCF6C8
60FEFF8D             CALLreg  00000000             5E 17 8CCF6C8 0
_opitsk+1017         CALL???  00000000
_opiino+1087         CALLrel  _opitsk+0            0 0
_opiodr+1099         CALLreg  00000000             3C 4 8CCFC60
_opidrv+819          CALLrel  _opiodr+0            3C 4 8CCFC60 0
_sou2o+45            CALLrel  _opidrv+0            3C 4 8CCFC60
_opimai_real+112     CALLrel  _sou2o+0             8CCFC54 3C 4 8CCFC60
_opimai+92           CALLrel  _opimai_real+0       2 8CCFC8C
_OracleThreadStart@  CALLrel  _opimai+0
4+708
__pRawDllMain+10931  CALLptr  00000000
2903
__pRawDllMain+12925  CALLreg  00000000
4809
__pRawDllMain+12925  CALLrel  __pRawDllMain+12925
4761                          4772
--------------------- Binary Stack Dump ---------------------
    ----------------------------------------
    SO: 4FB3DF5C, type: 4, owner: 4FA4CBFC, flag: INIT/-/-/0x00
    (session) sid: 158 trans: 4EBB8954, creator: 4FA4CBFC, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0000-0010-0000000A, short-term DID: 0000-0000-00000000
              txn branch: 00000000
              oct: 0, prv: 0, sql: 00000000, psql: 4F707298, user: 0/SYS
    O/S info: user: superv06-PC\superv06, term: SUPERV06-PC, ospid: 7788:11636, machine: WORKGROUP\SUPERV06-PC
              program: sqlplus.exe
    application name: sqlplus.exe, hash value=0
    last wait for 'log file sequential read' blocking sess=0x00000000 seq=31
    wait_time=159 seconds since wait started=0
                log#=0, block#=1, blocks=1
    Dumping Session Wait History
     for 'log file sequential read' count=1 wait_time=159
                log#=0, block#=1, blocks=1
     for 'log file sequential read' count=1 wait_time=502
                log#=0, block#=1, blocks=1
     for 'log file sequential read' count=1 wait_time=163
                log#=0, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=18840
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=254
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=7654
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=150
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=102
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=123
                file#=ffffffff, block#=1, blocks=1
     for 'db file sequential read' count=1 wait_time=14010
                file#=ffffffff, block#=1, blocks=1

通过这里我们发现创建控制文件的进程在读取redo log的时候出现了等待比较多而且时间比较长,而对于ORA-00600[3753]错误互联网上没有任何更多的信息.通过对于创建控制文件时候因为使用noresetlogs的分析:这种模式下需要读取redo log,所以导致等待较多,从而出现ORA-00600[3753]错误使得创建控制文件失败.因为本库是shutdown immediate关闭,所以我们完全可以通过resetlogs模式来创建控制文件,从而避免读取redo log.

4.创建resetlogs控制文件

SQL>@XFF_RESETLOGS_CTL.sql
Control file created.

5.然后不完全恢复使用resetlogs open数据库

这次的处理我也没有什么经验可以借鉴,MOS和互联网上没有该错误的任何信息,解决这个问题关键凭的是自己对于noresetlogs和resetlogs的理解.对于数据库原理的理解,对解决一些陌生问题帮助很大;在学习ORACLE过程中注重对原理的理解和消化

记录8.0.5数据库恢复过程

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录8.0.5数据库恢复过程

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

某银行需要恢复2004年磁带备份数据库 8.0.5,当时因为开发商使用begin backup命令进行热备,然后压缩到磁带上.在硬盘紧张,rman不成熟的时代,使用这样的方法备份本身没有大问题,可是因为没有备份归档日志,给现在的恢复带来了很多的问题.
尝试不完全恢复启动库

SVRMGR> startup pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561556 generated at 08/21/04 22:42:48 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_55329.dbf
ORA-00280: change 613561556 for thread 1 is in sequence #55329
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00600: internal error code, arguments: [4146], [31400], [31370], [], [], [], [], []

这个错误是因为回滚段corruption导致,客户已经没有了一致性要求,所以解决办法是通过一些手段找出回滚段并屏蔽

屏蔽回滚段重启库

SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> startup  pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open;
alter database open
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55992], [55992], [4], [], []

根据ORA-00600[3668]错误的提示,因为重建控制文件,有一个数据文件因为在磁带恢复丢失,所以控制文件在上次重建的时候没有包含进去,根据经验在8.0中可以通过resetlogs解决

resetlogs重新打开数据库

SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-01139: RESETLOGS option only valid after an incomplete database recovery
SVRMGR> recover datatabase using backup controlfile until cancel;
ORA-00274: illegal recovery option DATATABASE
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561558 generated at 07/30/12 09:14:49 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561558 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small

根据经验可能是屏蔽了undo或者undo损坏,然后有事务可能需要读undo,无法读到对应记录,可能出现此错误,因为无trace文件,尝试推进scn解决.当然这个问题可以通过open时做10046然后深入分析也许可以找到原因.

open数据库成功
根据上面的分析,重启库,推进scn,resetlogs库解决问题

SVRMGR> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup pfile='init.ora' mount;
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open ;
alter database open
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55994], [55994], [4], [], []
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561560 generated at 07/30/12 09:17:11 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561560 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
Statement processed.

这次恢复比较幸运,在备份到磁带过程中被覆盖的一个同名的数据文件,因为是index,通过查询dba_extents,然后删除并重建相关index,s数据库算恢复完全.有些时候,对数据库多一份备份,可能成为最后的救命稻草,哪怕是一份不是十分完整的备份.再次重复:对dba而言,数据库备份重于一切

记录一次AIX 4.3.0+ORACLE 8.0.5恢复过程

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次AIX 4.3.0+ORACLE 8.0.5恢复过程

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近接手一个AIX下面ORACLE 8.0.5恢复需求.这个需求要从2个月前说起.2个月前有客户提出需求,帮他们恢复8.0.5的库(该库被9.2.0.4的ORACLE强制拉过,但是未成功).当我准备去恢复的时候,客户和我说数据库的存储找不到了,暂时不用处理.本来这个事情到此也就算结束了,最多算一个笑话(找人数据库恢复,发现数据库丢了).前几天又接到关于这个恢复的请求,说他们还有磁带的备份(备份方式:begin tablespace backup+cp+archivelog),让我去协助恢复.按照客户的描述,他们购买了国外的读磁带机器可以读取磁带到硬盘,他们解压好数据文件,然后我去恢复,而且数据文件,控制文件,归档日志都存在.我当时认为这个是一个简单的恢复,顺利的话,一个小时就可以搞定.这里告诉我们:哪怕是N久的备份,也可能是最后的救命数据(2004年的磁带备份),一定要做好备份
今天到达现场,客户第一句话:文件没有解压,第二句话:归档日志没有恢复出来.听到这里,我有点感觉情况很不妙.
详细一问:文件没有解压是因为AIX 4.3.0默认的是文件系统格式是jfs,最大支持lv的大小为20g(估计和客户参数有关系),现在如果解压需要分到多个目录中(数据库原始运行就是按照该模式进行的,如:oradata001,oradata002等分别放几个数据文件),需要我去给他们做规划,如果存放这些文件.大脑一晕,dba原来真的什么活都要干啊.本身就对AIX不熟悉,还要去想办法处理这些问题,而且是从来没有见过的AIX 4.3.0.最后通过我和客户的一起努力终于解决了这个问题:使用其他技巧在AIX 4.3中建立jfs2文件系统(先划分小的LV,使用jfs系统,然后修改系统为jfs2,然后增加lv大小),解决以前jfs文件系统限制,解压的时候需要规划文件目录的体力活.这个问题告诉我们:有时候解决问题需要学会变通
归档日志没有恢复出来的原因:因为连续几天的恢复,加上客户本身工作繁忙,可能实在是太累,在最后一盘磁带的恢复的时候(一盘磁带15小时,一共4盘),客户敲错了命令tar -xvf输入成了-cvf,使得磁盘头被覆盖,磁带原则上报废,从而使得归档日志无法恢复出来.这一点点的事故告诉我们:越疲劳越容易出错,越到最后越容易出错,一定要小心谨慎
到这一步,没有解压(已经解决文件系统问题,接下来的解压问题不大),没有归档(修改scn原则上可以解决),这些东西总的来说问题都不大,当我安装好AIX FOR ORACLE 8.0.5,启动数据库到mount状态,核对恢复出来的数据文件和控制文件中的数据文件的时候,发现少了好几个,这下不能容忍了(能够open库,但是可能丢失需要数据,这个太不划算[因为用户还有该备份的前几天的备份]),寻找出现数据文件从磁带中丢失原因:1.因为jfs文件系统限制,不停的mv到其他目录导致丢失.2.uncompress解压丢失.3.最后一盘磁带损坏导致丢失.一切原因都是浮云,解决了jfs2文件系统,客户根据当前的情况,决定使用其他的备份再次从磁带中导出,然后进行恢复
AIX 4.3安装ORACLE 8.0.5

--检查内存
 lsattr -El sys0 -a realmem
--检查交换分区
 lsps -a
--检查临时目录
 df -k /tmp
--检查操作系统位数
getconf HARDWARE_BITMODE
bootinfo -y
--检查操作系统版本号
 oslevel -r
--检查软件包
 lslpp -l bos.adt.base, bos.adt.libm
--检查补丁包
 instfix -i | grep IX71948
--升级aix
smit install_latest
smit update_all
增加用户:
useradd   oracle
增加组:
mkgrp   dba
更改用户所属组:
usermod   -g  dba oracle
更改用户密码:
passwd   oracle
pwdadm   oracle
--关于用户
smit mkuser	建立用户
smit lsuser	列出所有用户的属性
lsuser ALL
smit chuser	改变用户属性
rmuser -p *	删除用户*
smit rmuser	只删除用户,但是不删除所属目录,等于rmuser
smit passwd	修改密码
smit lockuser	给用户加锁
--关于用户组
smit mkgroup	建立新组
smit lsgroup	显示所有组的属性
smit chgroup	修改组的属性
smit rmgroup	删除*组
rmgroup *
配置shell limits( smit chuser)
  soft FILE size -1
  soft CPU time -1
  soft DATA segment -1
  soft STACK size -1
新建目录:
mkdir   /u01
更改目录属主:
chown   oracle   /u01
更改目录所属组:
chgrp   dba   /u01
vi   /home/oracle/.profile
export   LINK_CNTRL=L_PTHREADS_D7
export   NLS_LANG=american_america.zhs16cgb231280
export	 ORACLE_OWNER=oracle
export   ORACLE_TERM=vt100
#export   ORACLE_TERM=xterm
export   ORACLE_BASE=/oracle
export   ORACLE_HOME=$ORACLE_BASE/product/8.0.5
export   ORACLE_SID=ora8
export   LD_LIBRARY_PATH=$/ORACLE_HOME/lib:$LD_LIBRARY_PATH
export   LIBPATH=$ORACLE_HOME/lib:$LIBPATH
export   ORA_NLS32=$ORACLE_HOME/ocommon/nls/admin/data
export   PATH=$ORACLE_HOME/bin:$PATH
export   TMPDIR=/tmp
export   DISPLAY=172.100.1.2:0.0
set -o vi
umask 022
--Mount产品光盘
$   su   root
-查看光驱
$ lsdev -Cc cdrom
#   mkdir   /cdrom
#   chmod   777   /cdrom
#   /etc/mount   -rv   cdrfs   /dev/cd0   /cdrom
#   exit
--运行rootpre.sh脚本。
$   su   root
#   cd   /cdrom/orainst
#   ./rootpre.sh
#   exit
--运行安装程序
$   cd   /cdrom/orainst
-图形
$   ./orainst   /m
-字符
$   ./orainst   /c
# cd $ORACLE_HOME/orainst
# ./root.sh

建立密码文件

cd $ORACLE_HOME/dbs
orapwd file=orapw$ORACLE_SID password=oracle

创建pfile文件

vi $ORACLE_HOME/dbs/init.ora
db_name=ORCL
db_files = 5000
control_files = /oradata/ctl1ORCL.ora
db_file_multiblock_read_count =  8
db_block_buffers =  100000
shared_pool_size =  115343360
log_checkpoint_interval = 10000
processes =  590
parallel_max_servers = 5
log_buffer =  163840
sequence_cache_entries =  100
sequence_cache_hash_buckets =  90
max_dump_file_size = 102400
global_names = TRUE
background_dump_dest=/oracle/trace
user_dump_dest=/oracle/trace
db_block_size = 4196
remote_login_passwordfile = shared
text_enable = TRUE
job_queue_processes = 2
job_queue_interval = 10
job_queue_keep_connections = false
distributed_lock_timeout = 300
distributed_transactions = 5
open_links = 4

操作8.0.5数据库

--结果测试与win,linux/unix使用svrmgrl命令
C:\oracle\ora80\BIN>SVRMGR30.EXE
Oracle Server Manager Release 3.0.5.0.0 - Production
(c) Copyright 1997, Oracle Corporation.  All Rights Reserved.
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
SVRMGR> connect system/manager
Connected.
SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Release 8.0.5.0.0 - Production
PL/SQL Release 8.0.5.0.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for 32-bit Windows: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR> connect internal/oracle
Connected.
SVRMGR> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            %RDBMS80%\
Oldest online log sequence     3
Current log sequence           6
SVRMGR> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup mount;
ORACLE instance started.
Total System Global Area                         15077376 bytes
Fixed Size                                          49152 bytes
Variable Size                                    12906496 bytes
Database Buffers                                  2048000 bytes
Redo Buffers                                        73728 bytes
Database mounted.
SVRMGR> alter database archivelog;
Statement processed.
SVRMGR> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            C:\oracle\ora80\RDBMS80\
Oldest online log sequence     3
Next log sequence to archive   6
Current log sequence           6