undo segment header坏块异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:undo segment header坏块异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

alert日志报ORA-00600[4137]与ORA-00600 [4198]错误
数据库报如下错误,运行一段时间数据库自动down掉

Fri Jul  6 18:00:40 2012
SMON: ignoring slave err,downgrading to serial rollback
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17).
Fri Jul  6 18:00:41 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri Jul  6 18:05:53 2012
SMON: Restarting fast_start parallel rollback
Fri Jul  6 18:05:54 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
…………
Wed Jul  6 18:50:38 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc:
ORA-00474: SMON process terminated with error
Wed Jul  6 18:50:38 2012
PMON: terminating instance due to error 474

从三个地方得出3号回滚段异常
1.trace文件

SMON: about to recover undo segment 3
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 30317
*** 2012-07-06 17:55:19.042
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as available
Parallel Transaction recovery caught exception 12801
Parallel Transaction recovery caught error 607
*** 2012-07-06 17:55:19.761
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 3
XID passed in =xid: 0x0003.011.00003c2b
XID from Undo block =xid: 0x0004.020.00002b35

2.alert中提示while recovering transaction (3, 17)
3.查询dba_rollback_segs发现_SYSSMU3$是NEED RECOVERY状态

尝试删除_SYSSMU3$
使用隐含参数_offline_rollback_segments= _SYSSMU3$

Fri Jul  6 18:16:19 2012
Completed: ALTER DATABASE OPEN
Fri Jul  6 18:16:56 2012
drop rollback segment "_SYSSMU3$"
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Fri Jul  6 18:16:57 2012
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:16:57 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0
  Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log
Block recovery completed at rba 209591.225.16, scn 1.3493910790
ORA-607 signalled during: drop rollback segment "_SYSSMU3$"...
Fri Jul  6 18:16:57 2012
Corrupt Block Found
         TSN = 1, TSNAME = UNDOTBS1
         RFN = 2, BLK = 41, RDBA = 8388649
         OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT =
         SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type
Fri Jul  6 18:16:57 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], []
Doing block recovery for file 2 block 41
Block recovery from logseq 209591, block 183 to scn 7788878085
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
PMON: terminating instance due to error 474
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc:
ORA-00474: SMON process terminated with error
Fri Jul  6 18:17:46 2012
Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc:
ORA-00474: SMON process terminated with error
Instance terminated by PMON, pid = 17355

这里可以看出在使用隐含参数删除异常回滚段的时候,因为该回滚段有坏块出现ORA-00600[kddummy_blkchk]使得数据库donw掉,重启过几次该库都因为这个错误直接down.
查看trace文件发现

SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
*** 2012-07-06 18:16:57.734
Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted.  Error Code = 38508
ktu4smck: starting extent(0x77) of txn slot #0x11 is  invalid.
  valid value (0 - 0x76)
  TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000
            mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86
            Version: 0x01

因为该库是因为undo的3号回滚段的header出现坏块,即使使用了隐含参数屏蔽该回滚段恢复,smon进程依然会去读回滚段header,从而出现该错误导致直接down掉.

处理方案
1.使用隐含参数屏蔽异常回滚段_offline_rollback_segments= _SYSSMU3$
2.修改undo_tablespace=SYSTEM/undo_management=MANUAL
3.启动数据库,快速删除包含_SYSSMU3$ undo表空间
4.新建undo表空间
5.修改undo_tablespace=new_undo/undo_management=AUTO,除掉隐含参数
6.使用新参数文件重启数据库
7.建议:使用逻辑导出导入重建数据库

使用copy实现long类型转移表空间

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:使用copy实现long类型转移表空间

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在一次8.1.6的数据库恢复过程中,发现一个表空间的数据文件损坏,在转移该表空间相关表时,遇到让人郁闷的long类型.不能使用ctas和move来实现转移,最后通过古老的copy来实现该项工作.
模拟LONG类型表

SQL> create table chf.t_long (id number,name long) tablespace ts_xifenfei;
Table created.
SQL> insert into chf.t_long  select object_id,object_name from dba_objects where rownum<10;
9 rows created.
SQL> commit;
Commit complete.
SQL> desc chf.t_long
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------------
 ID                                                 NUMBER
 NAME                                               LONG

测试ctas和move

SQL> create table chf.t_long_bak
  2  as
  3  select * from chf.t_long;
select * from chf.t_long
       *
ERROR at line 3:
ORA-00997: illegal use of LONG datatype
SQL> alter table chf.t_long move tablespace users;
alter table chf.t_long move tablespace users
*
ERROR at line 1:
ORA-00997: illegal use of LONG datatype

使用copy实现LONG表跟换表空间

SQL> SET LONG 1000
SQL> select dbms_metadata.get_ddl('TABLE','T_LONG','CHF') from dual;
DBMS_METADATA.GET_DDL('TABLE','T_LONG','CHF')
--------------------------------------------------------------------------------
  CREATE TABLE "CHF"."T_LONG"
   (    "ID" NUMBER,
        "NAME" LONG
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "TS_XIFENFEI"
SQL>   CREATE TABLE "CHF"."T_LONG_BAK"
  2     (    "ID" NUMBER,
  3          "NAME" LONG
  4     ) SEGMENT CREATION IMMEDIATE
  5    PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  6   NOCOMPRESS LOGGING
  7    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  8    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  9    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 10    TABLESPACE "USERS";
Table created.
SQL> copy from chf/xifenfei@ora11g_d INSERT chf.t_long_bak using select * from chf.t_long;
Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
   9 rows selected from chf@ora11g_d.
   9 rows inserted into CHF.T_LONG_BAK.
   9 rows committed into CHF.T_LONG_BAK at DEFAULT HOST connection.
SQL> alter table t_long rename to t_long_old;
Table altered.
SQL> alter table t_long_bak rename to t_long;
Table altered.
SQL> select tablespace_name,table_name from dba_tables where table_name like 'T_LONG%';
TABLESPACE_NAME TABLE_NAME
--------------- ---------------
TS_XIFENFEI     T_LONG_OLD
USERS           T_LONG
SQL> DROP TABLE T_LONG_OLD PURGE;
Table dropped.

Oracle 11g丢失access$恢复方法

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 11g丢失access$恢复方法

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近接触到两个案例都是11g数据库因为异常关闭导致access$表丢失,使得数据库不能正常open.为什么这个表会丢失还未找到原因.我这里提供一种在upgrade模式下解决给问题方法.
数据库版本

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei" from dual;
xifenfei
--------------------------------------
2012-06-22 05:28:57

数据库启动报ORA-00704

SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 1782
Session ID: 125 Serial number: 5

找出ORA-00704报错原因

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_2010.trc
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 2010
Session ID: 125 Serial number: 5

查看trace文件发现

PARSE ERROR #3063868604:len=56 dep=1 uid=0 oct=3 lid=0 tim=1340312320595472 err=942
select order#,columns,types from access$ where d_obj#=:1
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
*** 2012-06-22 04:58:40.596
USER (ospid: 2010): terminating the instance due to error 704

启动数据库至upgrade模式

SQL> startup  upgrade
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

创建access$表和index

SQL> create table access$
  2  ( d_obj#        number not null,
  3    order#        number not null,
  4    columns       raw(126),
  5    types         number not null)
  6    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  7  /
Table created.
SQL> create index i_access1 on
  2    access$(d_obj#, order#)
  3    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  4  /
Index created.
--创建语句可以在?\RDBMS\ADMIN\dcore.bsq中找到

重启数据库

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

access$表作用(感谢vmcd同学提供)
When a database object is first referenced in a PL/SQL program, the PL/SQL engine checks the ACCESS$ table (owned by SYS) to see if the executor of the program has authority on that database object.
对于access$表丢失以前记录是否对系统产生严重影响还未知,希望知道的朋友告知下

忘记执行end backup命令数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:忘记执行end backup命令数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

遇到两次begin backup忘记end backup导致的悲剧.虽然不是自己亲身经历,但是感触很深,这里做了一个小实验,说明在begin backup后忘记end backup,而又丢失了备份归档日志,且数据库异常重启的事故恢复(这里为了加大实验难道,并且使用begin backup命令后的热备文件恢复)
模拟begin end

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/oracle/oradata/xifenfei/archive
Oldest online log sequence     37
Next log sequence to archive   39
Current log sequence           39
SQL> alter tablespace bbed begin backup;
Tablespace altered.
SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/oracle/oradata/xifenfei/archive
Oldest online log sequence     37
Next log sequence to archive   39
Current log sequence           39
SQL> drop table chf.t_xff;
Table dropped.
SQL> create table chf.t_xff
  2    as
  3  select * from dba_objects;
Table created.
SQL> alter system switch logfile;
System altered.
SQL>   delete from chf.t_XFF;
30811 rows deleted.
SQL>   commit;
Commit complete.
SQL>   alter system  switch logfile;
System altered.
SQL> /
System altered.
SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/oracle/oradata/xifenfei/archive
Oldest online log sequence     40
Next log sequence to archive   42
Current log sequence           42

cp备份文件

[oracle@xifenfei xifenfei]$ cp bbed01.dbf bbed01.dbf_05
[oracle@xifenfei xifenfei]$ cp bbed02.dbf bbed02.dbf_05

继续操作数据库

SQL> alter system switch logfile;
System altered.
SQL>   insert into chf.t_xff
  2    select * from dba_objects;
30811 rows created.
SQL> commit;
Commit complete.
SQL>  archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/oracle/oradata/xifenfei/archive
Oldest online log sequence     41
Next log sequence to archive   43
Current log sequence           43
SQL> alter system switch logfile;
System altered.

模拟异常关闭数据库

SQL> shutdown immediate;
ORA-01149: cannot shutdown - file 11 has online backup set
ORA-01110: data file 11: '/u01/oracle/oradata/xifenfei/bbed01.dbf'
SQL> shutdown abort;
ORACLE instance shut down.

删除部分归档日志(模拟归档日志丢失)

[oracle@xifenfei archive]$ mv 1_39.dbf 1_39.dbf_bak
[oracle@xifenfei archive]$ mv 1_40.dbf 1_40.dbf_bak

启动数据库

[oracle@xifenfei xifenfei]$ sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Jun 5 03:02:56 2012
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
Database mounted.
ORA-01113: file 11 needs media recovery
ORA-01110: data file 11: '/u01/oracle/oradata/xifenfei/bbed01.dbf'

分析相关SCN

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;
     FILE# STATUS  SCN               TIME
---------- ------- ----------------- -------------------
        11 ONLINE     12286828683164 2012-06-05 02:55:43
        12 ONLINE     12286828683164 2012-06-05 02:55:43
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN              STOP_SCN
---------- ---------------- ----------------
         1   12286828684636
         2   12286828684636
         3   12286828684636
         4   12286828684636
         5   12286828684636
         6   12286828684636
         7   12286828684636
         8   12286828684636
         9   12286828684636
        10   12286828684636
        11   12286828683164
        12   12286828683164
12 rows selected.
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;
     FILE# SCN               RESETLOGS SCN
---------- ----------------- -----------------
         1    12286828684636            174968
         2    12286828684636            174968
         3    12286828684636            174968
         4    12286828684636            174968
         5    12286828684636            174968
         6    12286828684636            174968
         7    12286828684636            174968
         8    12286828684636            174968
         9    12286828684636            174968
        10    12286828684636            174968
        11    12286828683164            174968
        12    12286828683164            174968
12 rows selected.
SQL> select file#,to_char(CHANGE#,'9999999999999999') "SCN",
  2  to_char(TIME,'yyyy-mm-dd hh24:mi:ss') "TIME" from v$backup;
     FILE# SCN               TIME
---------- ----------------- -------------------
         1                 0
         2                 0
         3                 0
         4                 0
         5                 0
         6                 0
         7                 0
         8                 0
         9                 0
        10                 0
        11    12286828683164 2012-06-05 02:55:43
        12    12286828683164 2012-06-05 02:55:43
12 rows selected.

发现数据库未end backup

Tue Jun  5 02:55:43 2012
alter tablespace bbed begin backup
Tue Jun  5 02:55:43 2012
Completed: alter tablespace bbed begin backup

尝试end backup
出现这个错误是正常的,因为我替换回来的bbed表空间数据文件的版本信息可能和控制文件的不一致,解决方法是重建控制文件

SQL> alter tablespace bbed end backup;
alter tablespace bbed end backup
*
ERROR at line 1:
ORA-01235: END BACKUP failed for 2 file(s) and succeeded for 0
ORA-01122: database file 12 failed verification check
ORA-01110: data file 12: '/u01/oracle/oradata/xifenfei/bbed02.dbf'
ORA-01208: data file is an old version - not accessing current version
ORA-01122: database file 11 failed verification check
ORA-01110: data file 11: '/u01/oracle/oradata/xifenfei/bbed01.dbf'
ORA-01208: data file is an old version - not accessing current version

重建控制文件

SQL> shutdown abort;
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
SQL>@ctl.sql
Control file created.

尝试恢复数据库

SQL> recover database;
ORA-00279: change 12286828683164 generated at 06/05/2012 02:55:43 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_39.dbf
ORA-00280: change 12286828683164 for thread 1 is in sequence #39
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log
'/u01/oracle/oradata/xifenfei/archive/1_39.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
ORA-00308: cannot open archived log
'/u01/oracle/oradata/xifenfei/archive/1_39.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3

执行end backup

SQL> alter tablespace bbed end backup;
Tablespace altered.

再次查看相关SCN
可以发现end backup之后,datafile header 的scn发生了改变,说明begin backup主要是冻住了datafile header scn

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;
     FILE# STATUS  SCN               TIME
---------- ------- ----------------- -------------------
         1 ONLINE     12286828684636 2012-06-05 03:00:46
         2 ONLINE     12286828684636 2012-06-05 03:00:46
         3 ONLINE     12286828684636 2012-06-05 03:00:46
         4 ONLINE     12286828684636 2012-06-05 03:00:46
         5 ONLINE     12286828684636 2012-06-05 03:00:46
         6 ONLINE     12286828684636 2012-06-05 03:00:46
         7 ONLINE     12286828684636 2012-06-05 03:00:46
         8 ONLINE     12286828684636 2012-06-05 03:00:46
         9 ONLINE     12286828684636 2012-06-05 03:00:46
        10 ONLINE     12286828684636 2012-06-05 03:00:46
        11 ONLINE     12286828683821 2012-06-05 02:56:26
        12 ONLINE     12286828683821 2012-06-05 02:56:26
12 rows selected.
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN              STOP_SCN
---------- ---------------- ----------------
         1   12286828684636
         2   12286828684636
         3   12286828684636
         4   12286828684636
         5   12286828684636
         6   12286828684636
         7   12286828684636
         8   12286828684636
         9   12286828684636
        10   12286828684636
        11   12286828684636
        12   12286828684636
12 rows selected.
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;
     FILE# SCN               RESETLOGS SCN
---------- ----------------- -----------------
         1    12286828684636            174968
         2    12286828684636            174968
         3    12286828684636            174968
         4    12286828684636            174968
         5    12286828684636            174968
         6    12286828684636            174968
         7    12286828684636            174968
         8    12286828684636            174968
         9    12286828684636            174968
        10    12286828684636            174968
        11    12286828683821            174968
        12    12286828683821            174968
12 rows selected.

再次尝试恢复数据库

SQL> recover database;
ORA-00279: change 12286828683821 generated at 06/05/2012 02:56:26 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_41.dbf
ORA-00280: change 12286828683821 for thread 1 is in sequence #41
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
Log applied.
Media recovery complete.
SQL> alter database open;
Database altered.

总结说明
在数据库忘记end backup,而又被异常重启数据库时候,会提示你需要恢复.这个时候如果你有所有的归档日志,那没有任何问题,直接recover就可以了.如果因为begin backup命令执行比较久,部分归档日志丢失,这个时候不能直接recover,可以先尝试end backup,然后在recover.如果在这个时候还发现有部分日志不存在,那只能考虑bbed修改datafile header的scn.
温馨提醒:各位dba在执行begin backup之后一定要记得end backup

undo坏块导致数据库异常终止案例

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:undo坏块导致数据库异常终止案例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在处理的众多undo问题的数据库中,这个是第一例遇到因为undo出现坏块导致数据库自动down案例(oracle 9.2.0.8 aix)
undo坏块导致数据库down

Fri Jun  1 00:45:13 2012
Successfully onlined Undo Tablespace 1.
Fri Jun  1 00:45:13 2012
SMON: enabling tx recovery
Fri Jun  1 00:45:13 2012
Database Characterset is ZHS16GBK
Fri Jun  1 00:45:13 2012
SMON: about to recover undo segment 52
SMON: about to recover undo segment 52
SMON: mark undo segment 52 as available
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 52
SMON: mark undo segment 52 as available
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 52
ORACLE Instance acc1 (pid = 9) - Error 1578 encountered while recovering transaction (52, 29).
Fri Jun  1 00:45:14 2012
Errors in file /oraacc/app/admin/acc/bdump/acc1_smon_734734.trc:
ORA-01578: ORACLE data block corrupted (file # 169, block # 55887)
ORA-01110: data file 169: '/dev/raccount07_01lv'
Fri Jun  1 00:45:15 2012
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open
Fri Jun  1 00:45:16 2012
Errors in file /oraacc/app/admin/acc/bdump/acc1_pmon_766940.trc:
ORA-00474: SMON process terminated with error
Fri Jun  1 00:45:16 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 766940

这里可以看出数据库因为回滚段52出现坏块导致回滚的时候smon终止,数据库down

检测坏块

$ dbv file='/dev/raccount07_01lv' blocksize=8192
DBVERIFY: Release 9.2.0.8.0 - Production on Fri Jun 1 01:25:10 2012
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
DBVERIFY - Verification starting : FILE = /dev/raccount07_01lv
DBV-00200: Block, dba 708893135, already marked corrupted
DBV-00200: Block, dba 708893151, already marked corrupted
DBV-00200: Block, dba 708893263, already marked corrupted
DBVERIFY - Verification complete
Total Pages Examined         : 1048320
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1048320
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 3
Total Pages Influx           : 0
Highest block SCN            : 12843497215383 (2990.1545000343)

检测对应文件号/数据块号

SQL> select DBMS_UTILITY.data_block_address_file(708893135) file#,
  2  DBMS_UTILITY.data_block_address_block(708893135) block#  from dual;
     FILE#     BLOCK#
---------- ----------
       169      55759
SQL> select DBMS_UTILITY.data_block_address_file(708893151) file#,
  2  DBMS_UTILITY.data_block_address_block(708893151) block#  from dual;
     FILE#     BLOCK#
---------- ----------
       169      55775
SQL> select DBMS_UTILITY.data_block_address_file(708893263) file#,
  2  DBMS_UTILITY.data_block_address_block(708893263) block#  from dual;
     FILE#     BLOCK#
---------- ----------
       169      55887

解决办法

--隐含参数
_corrupted_rollback_segments= _SYSSMU52$
--open库后
alter session set "_smu_debug_mode"=4;
drop rollback segment "_SYSSMU52$";

rac redo log file被意外覆盖数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:rac redo log file被意外覆盖数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

当前日志被覆盖导致错误
朋友的一客户在一套rac上包含了两个数据库,其其中一个库增加redo group时候,覆盖了另外一个库的redo,悲剧的是刚好是current redo

Wed May 16 17:03:05 2012
ALTER DATABASE OPEN
This instance was first to open
Wed May 16 17:03:09 2012
Beginning crash recovery of 2 threads
 parallel recovery started with 15 processes
Wed May 16 17:03:11 2012
Started redo scan
Wed May 16 17:03:11 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2040024.trc:
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: '/dev/rods_redo1_2_2'
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: '/dev/rods_redo1_2_1'
Abort recovery for domain 0
Wed May 16 17:03:11 2012
Aborting crash recovery due to error 305
Wed May 16 17:03:11 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2040024.trc:
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: '/dev/rods_redo1_2_2'
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: '/dev/rods_redo1_2_1'
ORA-305 signalled during: ALTER DATABASE OPEN...
Wed May 16 17:03:13 2012
Shutting down instance (abort)

使用_allow_resetlogs_corruption= TRUE进行恢复

Wed May 16 18:16:48 2012
SMON: enabling cache recovery
Wed May 16 18:16:48 2012
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Wed May 16 18:16:48 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2105454.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [2522], [18446744072024280773],
[2522], [18446744072024247666], [], [], []
Wed May 16 18:16:50 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2105454.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [2522], [18446744072024280773],
[2522], [18446744072024247666], [], [], []
Wed May 16 18:16:50 2012
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 2105454
ORA-1092 signalled during: alter database open resetlogs...

ORA-600[KCLCHKBLK_4], is signaled because the SCN in a tempfile block is too high.
The same reason caused the ORA-600[2662]s in the alert logs.
因为是临时文件的scn太大的问题,那就比较好解决:
启动数据库到mount状态,查询出来相关temp file,然后drop掉.

ORA-00600[6856]

Wed May 16 20:25:16 2012
Errors in file /oracle/admin/odsdb/bdump/odsdb1_smon_2482210.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/dev/rods_redo2_1_1'
ORA-00600: internal error code, arguments: [6856], [0], [0], [], [], [], [], []
ORACLE Instance odsdb1 (pid = 16) - Error 600 encountered while recovering transaction
(10, 8) on object 7162533.
Wed May 16 20:25:16 2012
Errors in file /oracle/admin/odsdb/bdump/odsdb1_smon_2482210.trc:
ORA-00600: internal error code, arguments: [6856], [0], [0], [], [], [], [], []

这里的4193和4194是比较熟悉的,根据这里的提示猜测6856也是和undo有关系
ORA-600[6856]SMON is trying to recover a dead transaction.
But the undo application runs into an internal error (trying to delete a row that is already deleted).
因为smon回滚的时候出现上面错误,解决方法是想办法终止回滚,使用event=”10513 trace name context forever, level 2″.

ORA-00600[4193]/ORA-00600[4194]

Wed May 16 20:25:17 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2547936.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/dev/rods_redo2_1_1'
ORA-00600: internal error code, arguments: [4194], [22], [25], [], [], [], [], []
Wed May 16 20:25:18 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2547936.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/dev/rods_redo2_1_1'
ORA-00600: internal error code, arguments: [4194], [22], [25], [], [], [], [], []
Wed May 16 20:25:56 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2547936.trc:
ORA-00600: internal error code, arguments: [4193], [22248], [22252], [], [], [], [], []

太常见错误,不再做说明,虽然使用event是的库open成功,因为部分回滚段有问题,该错误还是会出现(还是喜欢直接屏蔽回滚段)

ORA-00600[ktpridestroy2]

Wed May 16 20:36:26 2012
Errors in file /oracle/admin/odsdb/bdump/odsdb1_smon_2101296.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], []

This error could be the result of a corruption and involves the parallel rollback that SMON enables each startup.
解决:fast_start_parallel_rollback=false

ORA-00600[kturacf1]/ORA-00600[kcbgcur_9]

Wed May 16 20:49:15 2012
Errors in file /oracle/admin/odsdb/bdump/odsdb1_j000_2007088.trc:
ORA-00600: internal error code, arguments: [kturacf1], [2097152], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kcbgcur_9], [780140563], [4], [4294959056], [2097152], [], [], []

ORA-00600[kturacf1]错误未查询到原因
ORA-00600[kcbgcur_9]错误原因可能是:Buffers are pinned in a specific class order to prevent internal deadlocks.
因为这两个错误是job产生非致命错误,在这次的处理过程中可以忽略

ORA-00600[4097]

Wed May 16 21:05:05 2012
Errors in file /oracle/admin/odsdb/bdump/odsdb1_j000_1716282.trc:
ORA-12012: error on auto execute of job 6603
ORA-20001: ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
ORA-06512: at "EPBI.UP_SYSLOG_ONLINE_USER", line 141
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []

When an instance has a rollback segment offline and the instance crashes, or
the user does a shutdown abort, the rollback segment wrap number does not get
updated. If that segment is then dropped and recreated immediately after the
instance is restarted, the wrap number could be lower than existing wrap
numbers. This will cause the ORA-600[4097] to occur in subsequent
transactions using Rollback.
这个错误也是因为回滚段wrap number未被及时更新导致的异常.

总结这次恢复过程
1.因当前redo丢失,使用隐含参数打开库,然后出现ORA-600[KCLCHKBLK_4](这个错误比较少见,更何况这个库是10.2.0.4)
2.undo出现问题出现ORA-00600[6856]错误不是很多见
3.接下来都是一些列undo导致的问题,其实如果开始就直接使用隐含参数删除掉有问题回滚段,效果可能会比event好.
4.因为部分trace文件没有拿到,未做深入分析,文章列出整体恢复思路
5.本次恢复的数据库版本是10.2.0.4,根据mos描述,很多错误应不会发生,但是实际还是发生了,MOS的版本范围,不要太看重.
6.其实这篇文章的本质不是展示恢复过程,而是再一次提醒:操作数据库慎重,特别是一台机器上多套库.

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

open数据库alert日志报ORA-00600[kcratr1_lostwrt]错误

Mon May 14 14:57:28 2012
ALTER DATABASE OPEN
Mon May 14 14:57:29 2012
Beginning crash recovery of 1 threads
Mon May 14 14:57:29 2012
Started redo scan
Mon May 14 14:57:29 2012
Errors in file d:\oracle\admin\cqgasold\udump\cqgasold_ora_504.trc:
ORA-00600: 内部错误代码,参数: [kcratr1_lostwrt], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...

查询相关SCN
同一个查询中SCN相同,省略

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;
未选定行
SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;
     FILE# SCN              STOP_SCN
---------- ---------------- ----------------
         1       7842987188
         2       7842987188
         3       7842987188
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;
     FILE# SCN               RESETLOGS SCN
---------- ----------------- -----------------
         1        7842991811                 1
         2        7842991811                 1
         3        7842991811                 1

这里看到奇怪现象datafile scn小于datafile_header scn,数据库异常断电一般来说也不会出现这样的情况,个人猜测是错误的恢复或者使用历史控制文件导致,对于这样的现状,我先尝试着使用using backup controlfile方式恢复,结果失败.估计控制文件有异常,本着先拉起库原则,重建控制文件.

进行完全恢复

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388617], [1], [23403], [25], [112],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'

尝试跳过坏块继续恢复

SQL> recover database allow 1 corruption;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388610], [1], [23403], [2264], [16],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 2)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTFB Bitmapped File Space Header'

使用dbv检查坏块数量

C:\>dbv file='d:\oracle\oradata\cqgasold\undo_1.dbf' blocksize=8192
DBVERIFY: Release 9.2.0.5.0 - Production on 星期二 5月 15 19:43:42 2012
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
DBVERIFY - 验证正在开始 : FILE = d:\oracle\oradata\cqgasold\undo_1.dbf
DBV-00200: 块, dba 8388617, 已经标记为崩溃
汇入的页369 - 可能是介质损坏
***
Corrupt block relative dba: 0x00800171 (file 2, block 369)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x00800171
 last change scn: 0x0001.d37c57db seq: 0x1 flg: 0x04
 consistency value in tail: 0x4e930260
 check value in block header: 0x8202, computed block checksum: 0x4e5f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
汇入的页417 - 可能是介质损坏
***
Corrupt block relative dba: 0x008001a1 (file 2, block 417)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x008001a1
 last change scn: 0x0001.d37c53d4 seq: 0x2 flg: 0x04
 consistency value in tail: 0x4b6b0201
 check value in block header: 0x6ae7, computed block checksum: 0x5abc
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
…………
--类此记录很多,我放弃了跳过坏块修复的方法

恢复过程中提示坏块数据库文件离线恢复

SQL> alter database datafile 'd:\oracle\oradata\cqgasold\undo_1.dbf' offline;
数据库已更改。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00376: 此时无法读取文件 2
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'

到了这一步,根据经验,数据库被open的可能性很多了,很有可能是open以后因为smon回滚导致数据库down

查看日志,屏蔽回滚段,完成恢复

Tue May 15 19:59:52 2012
alter database open
Tue May 15 19:59:52 2012
Beginning crash recovery of 1 threads
Tue May 15 19:59:52 2012
Started redo scan
Tue May 15 19:59:52 2012
Completed redo scan
 323 redo blocks read, 82 data blocks need recovery
Tue May 15 19:59:52 2012
Started recovery at
 Thread 1: logseq 23404, block 3, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 23404 Reading mem 0
  Mem# 0 errs 0: F:\ORACLE\ORADATA\LOGCQGASOLD4.ORA
Tue May 15 19:59:52 2012
Completed redo application
Tue May 15 19:59:52 2012
Ended recovery at
 Thread 1: logseq 23404, block 326, scn 1.3548264979
 82 data blocks read, 82 data blocks written, 323 redo blocks read
Crash recovery completed successfully
Tue May 15 19:59:53 2012
Thread 1 advanced to log sequence 23405
Thread 1 opened at log sequence 23405
  Current log# 2 seq# 23405 mem# 0: D:\ORACLE\ORADATA\CQGASOLD\REDO02.LOG
Successful open of redo thread 1
Tue May 15 19:59:53 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 15 19:59:54 2012
Database Characterset is ZHS16GBK
Tue May 15 19:59:55 2012
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-604 signalled during: alter database open...
Tue May 15 19:59:56 2012
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 8
SMON: mark undo segment 8 as needs recovery
SMON: about to recover undo segment 9
SMON: mark undo segment 9 as needs recovery
SMON: about to recover undo segment 10
SMON: mark undo segment 10 as needs recovery
Tue May 15 20:00:37 2012
Shutting down instance (abort)

看到这里,可以大概确定是因为undo文件离线,导致回滚段异常.
这个问题,基本上可以确定通过隐含参数屏蔽回滚段,然后open数据库,重建undo删除异常undo,数据库恢复完成。

记录一次比较棘手数据库恢复要点

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次比较棘手数据库恢复要点

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在最近的一次数据库异常恢复过程中遇到不少问题,把重点记录下
ORA-00704/ORA-01555错误

Fri May  4 21:04:21 2012
select ctime, mtime, stime from obj$ where obj# = :1
Fri May  4 21:04:21 2012
Errors in file /oracle/admin/standdb/udump/perfdb_ora_1286288.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 40 with name "_SYSSMU40$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1286288
ORA-1092 signalled during: alter database open resetlogs...
这里的提示可以看出obj$基表中有事务存在,查询这个表的时候,要去找40号回滚段中相关数据;通过非常规方法,
查找到40号回滚段的状态是offliine了(这个查询出来的信息和是否使用隐含参数无关).
问题原因,为什么40号回滚段变得offline?
Fri May  4 17:36:26 2012
alter tablespace undotbs offline
Fri May  4 17:36:26 2012
ORA-1109 signalled during: alter tablespace undotbs offline...
Fri May  4 17:37:29 2012
alter database datafile  '/dev/rundodbs01' offline drop
Fri May  4 17:37:29 2012
Completed: alter database datafile  '/dev/rundodbs01' offline drop
因为强制offline 了file# 2文件导致(一个undo表空间文件)
解决方法:
1.bbed提交事务
因为现在生产的trace文件中未有关于obj$ 未提交事务的记录,做10046也为发现该记录,如果要使用bbed修改该事务,
那需要dump obj$相关的数据块(在mount状态下dump),然后找到相关事务,再修改
2.强制让file# 2 online
因为在resetlogs前file#2 已经offline掉了,所以要使得该文件能够成功online,需要先推进scn

ORA-00600[krhpfh_03-1209]

SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [782415504],
[782428968], [3987078030], [2379], [0], [0]
ORA-01110: data file 2: '/dev/rundodbs01'
问题原因:
数据库处于非归档模式下,连续三次resetlogs,引起该bug
解决办法:
重建控制文件
但是这里问题出现了,因为file# 2的resetlogs scn和其他数据文件不一致,导致在file# 2 online的前提下,无法重建.
这样就处在了一个循环中(需要online file# 2 又要重建控制文件),这样的问题,可以通过bbed修改file# 2的resetlogs scn完成
或者先让file# 2 offline(没有加drop)掉,重建控制文件(除掉file# 2的文件记录)

ORA-00600[25025]

SMON: enabling cache recovery
Fri May  4 22:36:36 2012
Errors in file /oracle/admin/standdb/udump/perfdb_ora_1167402.trc:
ORA-00600: internal error code, arguments: [25025], [2], [], [], [], [], [], []
Fri May  4 22:36:38 2012
Errors in file /oracle/admin/standdb/udump/perfdb_ora_1167402.trc:
ORA-00600: internal error code, arguments: [25025], [2], [], [], [], [], [], []
Fri May  4 22:36:38 2012
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 1167402
错误原因:
因为有undo文件不在undo对应的表空间中,而我们的file# 2文件确实是undo文件,而且重建控制文件时候未加入进来
解决办法:
  undo_management          = AUTO
  undo_tablespace          = UNDODBS(file# 2属于该表空间)
修改为
  undo_management          = MANUAL
  undo_tablespace          = SYSTEM
或者bbed修改file# 2的header,然后重建控制文件

ORA-00600[4137]

Errors in file /oracle/admin/standdb/bdump/perfdb_smon_1290564.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri May  4 23:20:52 2012
create undo tablespace undotbs3 datafile '/dev/rundodbs21' size 20400M
Fri May  4 23:23:47 2012
Errors in file /oracle/admin/standdb/bdump/perfdb_smon_1290564.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Fri May  4 23:23:48 2012
Errors in file /oracle/admin/standdb/bdump/perfdb_pmon_1520126.trc:
ORA-00474: SMON process terminated with error
Fri May  4 23:23:48 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 1520126
错误原因:
_smon_internal_errlimit(limit of SMON internal errors) SMON遇到了内部错误,最大允许100次,
不断计数增长,达到100的时候,数据库smon进程自动down掉,从而导致数据库down
解决办法:
1.临时解决办法:设置_smon_internal_errlimit一个较大值
3.根本解决办法:使用undo隐含参数,删除有问题undo 回滚段和undo表空间或者使用10513 事件

恢复备份控制文件避免resetlogs方式打开数据库

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:恢复备份控制文件避免resetlogs方式打开数据库

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在很多时候,我们需要使用备份控制文件恢复数据库,在恢复完成后,准备打开库,很多人知道这个时候如果要打开这个库,需要使用resetlogs操作,虽然在oracle 10g及其以后版本中在恢复的时候可以跨越resetlogs操作,但是很多时候大家还是希望使用备份的控制文件能够正常的open一个库,而不是resetlogs.这里通过实验展示使用备份控制文件正常open库的过程,整体思路是:先使用备份控制文件正常恢复数据库,然后重建该控制文件,继而可以正常open库

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
SQL> alter database backup controlfile to '/tmp/controlfile.bak';
Database altered.
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
--替换备份的控制文件
SQL> startup mount;
ORACLE instance started.
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> recover database using backup controlfile;
ORA-00279: change 12286827844770 generated at 04/12/2012 00:21:54 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_4.dbf
ORA-00280: change 12286827844770 for thread 1 is in sequence #4
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 12286827844772 generated at 04/12/2012 00:21:55 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_5.dbf
ORA-00280: change 12286827844772 for thread 1 is in sequence #5
ORA-00278: log file '/u01/oracle/oradata/xifenfei/archive/1_4.dbf' no longer
needed for this recovery
ORA-00279: change 12286827844776 generated at 04/12/2012 00:21:58 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_6.dbf
ORA-00280: change 12286827844776 for thread 1 is in sequence #6
ORA-00278: log file '/u01/oracle/oradata/xifenfei/archive/1_5.dbf' no longer
needed for this recovery
ORA-00308: cannot open archived log
'/u01/oracle/oradata/xifenfei/archive/1_6.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL>  recover database using backup controlfile;
ORA-00279: change 12286827844776 generated at 04/12/2012 00:21:58 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_6.dbf
ORA-00280: change 12286827844776 for thread 1 is in sequence #6
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/xifenfei/redo02.log
ORA-00310: archived log contains sequence 3; sequence 6 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
SQL>  recover database using backup controlfile;
ORA-00279: change 12286827844776 generated at 04/12/2012 00:21:58 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/xifenfei/archive/1_6.dbf
ORA-00280: change 12286827844776 for thread 1 is in sequence #6
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/xifenfei/redo03.log
Log applied.
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
--提示需要resetlogs
SQL> alter database backup controlfile to trace as '/tmp/1.txt';
Database altered.
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
--重建控制文件
--自动启动到mount状态
--数据库直接open成功
SQL> alter database open;
Database altered.

ASM DISK HEADER 备份与恢复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ASM DISK HEADER 备份与恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到几次ASM DATA HEADER出问题导致DISK GROUP 不能正常的MOUNT,是的数据库不能正常工作,从来带来了无穷的麻烦,这个时候心想,如果我做了ASM DATA HEADER的备份该多好啊,可惜世上没有后悔药,建议大家检查下自己的ASM库,ASM HEADER是否已经做了备份,如果没有请及时处理下.这里试验提供了dd和kfed备份和恢复ASM HEADER
查询ASM相关信息

SQL> select group_number,name,state,total_mb,free_mb from v$asm_diskgroup;
GROUP_NUMBER NAME                           STATE        TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ---------- ----------
           1 DATA                           MOUNTED          2046       1314
           2 XIFENFEI                       MOUNTED          6141       4378
SQL> select group_number,DISK_NUMBER,PATH,STATE from v$asm_disk;
GROUP_NUMBER DISK_NUMBER PATH                           STATE
------------ ----------- ------------------------------ ----------------
           1           1 /dev/oracleasm/disks/VOL2      NORMAL
           1           0 /dev/oracleasm/disks/VOL1      NORMAL
           2           1 /dev/oracleasm/disks/VOL4      NORMAL
           2           0 /dev/oracleasm/disks/VOL3      NORMAL

找出ASM DISK对应磁盘分区

[grid@rac1 ~]$ /etc/init.d/oracleasm  querydisk -d VOL1
Disk "VOL1" is a valid ASM disk on device [8,33]
[grid@rac1 ~]$ ll /dev |grep 8|grep 33
brw-r----- 1 root disk     8,  33 Apr 17 11:37 sdc1
[grid@rac1 ~]$ /etc/init.d/oracleasm  querydisk -d VOL2
Disk "VOL2" is a valid ASM disk on device [8,34]
[grid@rac1 ~]$ ll /dev |grep 8|grep 34
brw-r----- 1 root disk     8,  34 Apr 17 11:37 sdc2
[grid@rac1 ~]$ /etc/init.d/oracleasm  querydisk -d VOL3
Disk "VOL3" is a valid ASM disk on device [8,17]
[grid@rac1 ~]$ ll /dev |grep 8,|grep 17
brw-r----- 1 root disk     8,  17 Apr 17 11:37 sdb1
[grid@rac1 ~]$ /etc/init.d/oracleasm  querydisk -d VOL4
Disk "VOL4" is a valid ASM disk on device [8,18]
[grid@rac1 ~]$ ll /dev |grep 8,|grep 18
brw-r----- 1 root disk     8,  18 Apr 17 11:37 sdb2

备份ASM HEADER信息(备份XIFENFEI磁盘组)

--dd方式备份
[root@rac1 backup_asmheader]# dd if=/dev/sdb1 of=vol3header.dd bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000143581 seconds, 28.5 MB/s
[root@rac1 backup_asmheader]# dd if=/dev/sdb2 of=vol4header.dd bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000147727 seconds, 27.7 MB/s
--kfed方式备份
[grid@rac1 backup_asmheader]$ kfed read /dev/oracleasm/disks/VOL3 text=vol3header.txt
[grid@rac1 backup_asmheader]$ more vol3header.txt
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                   198826195 ; 0x00c: 0x0bd9d8d3
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:     ORCLDISKVOL3 ; 0x000: length=12
kfdhdb.driver.reserved[0]:    860639062 ; 0x008: 0x334c4f56
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:           XIFENFEI_0000 ; 0x028: length=13
kfdhdb.grpname:                XIFENFEI ; 0x048: length=8
kfdhdb.fgname:            XIFENFEI_0000 ; 0x068: length=13
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32967790 ; 0x0a8: HOUR=0xe DAYS=0x3 MNTH=0x3 YEAR=0x7dc
kfdhdb.crestmp.lo:           2015933440 ; 0x0ac: USEC=0x0 MSEC=0x22d SECS=0x2 MINS=0x1e
kfdhdb.mntstmp.hi:             32969259 ; 0x0b0: HOUR=0xb DAYS=0x11 MNTH=0x4 YEAR=0x7dc
kfdhdb.mntstmp.lo:           2707277824 ; 0x0b4: USEC=0x0 MSEC=0x372 SECS=0x15 MINS=0x28
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                    2353 ; 0x0c4: 0x00000931
kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             32967790 ; 0x0e4: HOUR=0xe DAYS=0x3 MNTH=0x3 YEAR=0x7dc
kfdhdb.grpstmp.lo:           2015746048 ; 0x0e8: USEC=0x0 MSEC=0x176 SECS=0x2 MINS=0x1e
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
kfdhdb.spfflg:                        0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]:                   0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]:                   0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]:                   0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]:                   0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]:                   0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]:                   0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]:                   0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]:                   0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]:                   0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]:                  0 ; 0x124: 0x00000000
kfdhdb.ub4spare[11]:                  0 ; 0x128: 0x00000000
kfdhdb.ub4spare[12]:                  0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[13]:                  0 ; 0x130: 0x00000000
kfdhdb.ub4spare[14]:                  0 ; 0x134: 0x00000000
kfdhdb.ub4spare[15]:                  0 ; 0x138: 0x00000000
kfdhdb.ub4spare[16]:                  0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[17]:                  0 ; 0x140: 0x00000000
kfdhdb.ub4spare[18]:                  0 ; 0x144: 0x00000000
kfdhdb.ub4spare[19]:                  0 ; 0x148: 0x00000000
kfdhdb.ub4spare[20]:                  0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[21]:                  0 ; 0x150: 0x00000000
kfdhdb.ub4spare[22]:                  0 ; 0x154: 0x00000000
kfdhdb.ub4spare[23]:                  0 ; 0x158: 0x00000000
kfdhdb.ub4spare[24]:                  0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[25]:                  0 ; 0x160: 0x00000000
kfdhdb.ub4spare[26]:                  0 ; 0x164: 0x00000000
kfdhdb.ub4spare[27]:                  0 ; 0x168: 0x00000000
kfdhdb.ub4spare[28]:                  0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[29]:                  0 ; 0x170: 0x00000000
kfdhdb.ub4spare[30]:                  0 ; 0x174: 0x00000000
kfdhdb.ub4spare[31]:                  0 ; 0x178: 0x00000000
kfdhdb.ub4spare[32]:                  0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[33]:                  0 ; 0x180: 0x00000000
kfdhdb.ub4spare[34]:                  0 ; 0x184: 0x00000000
kfdhdb.ub4spare[35]:                  0 ; 0x188: 0x00000000
kfdhdb.ub4spare[36]:                  0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[37]:                  0 ; 0x190: 0x00000000
kfdhdb.ub4spare[38]:                  0 ; 0x194: 0x00000000
kfdhdb.ub4spare[39]:                  0 ; 0x198: 0x00000000
kfdhdb.ub4spare[40]:                  0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[41]:                  0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[42]:                  0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[43]:                  0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[44]:                  0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[45]:                  0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[46]:                  0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[47]:                  0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[48]:                  0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[49]:                  0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[50]:                  0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[51]:                  0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[52]:                  0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[53]:                  0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq:                  0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk:                  0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents:                     0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000
[grid@rac1 backup_asmheader]$ kfed read /dev/oracleasm/disks/VOL4 text=vol4header.txt

破坏VOL3 ASM磁盘

[root@rac1 backup_asmheader]# dd if=/dev/zero of=/dev/sdb1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00165264 seconds, 2.5 MB/s
[grid@rac1 backup_asmheader]$ kfed read /dev/oracleasm/disks/VOL3
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
B4BFE200 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

启动ASM

[grid@rac1 backup_asmheader]$ sqlplus / as sysoper
SQL*Plus: Release 11.2.0.3.0 Production on Tue Apr 17 12:29:42 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ASM instance started
ORA-15032: not all alterations performed
ORA-15017: diskgroup "XIFENFEI" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"XIFENFEI"

恢复ASM DATE HEADER

--dd 恢复
[root@rac1 backup_asmheader]# dd if=vol3header.dd of=/dev/sdb1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00197913 seconds, 2.1 MB/s
--kfed 恢复
[grid@rac1 backup_asmheader]$ kfed merge /dev/oracleasm/disks/VOL3 text=vol3header.txt

MOUNT DISKGROUP

SQL> alter diskgroup xifenfei mount;
Diskgroup altered.