案例背景:在我接手这个库之前,因为某种原因sysaux表空间的数据文件离线,该库非归档模式,无备份
一、sysaux数据文件离线原因
Mon Jun 7 03:03:22 2010
KCF: write/open error block=0x67009 online=1
file=3 /opt/app/oracle/oradata/BAS/sysaux01.dbf
error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 421897
Additional information: -1'
Automatic datafile offline due to write error on
file 3: /opt/app/oracle/oradata/BAS/sysaux01.dbf
因为该数据库是非归档模式,估计以前的dba也是一段时间后发现sysaux被离线,因为不是归档模式,无法恢复,就一直放置着,让库处于这样的状态中。
二、sysaux数据文件online
1、使用bbed修改sysaux数据文件的scn,见:bbed 修改datafile header
2、尝试online过程日志如下
Sat Dec 17 19:33:36 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:33:36 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:33:37 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:39:47 2011
ALTER DATABASE RECOVER datafile 3
Sat Dec 17 19:39:47 2011
Media Recovery Start
parallel recovery started with 7 processes
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 6 Seq 13545 Reading mem 0
Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0602.log
Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0601.log
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 7 Seq 13546 Reading mem 0
Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0702.log
Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0701.log
Sat Dec 17 19:39:47 2011
Media Recovery Complete (BAS)
Completed: ALTER DATABASE RECOVER datafile 3
Sat Dec 17 19:39:58 2011
alter database datafile 3 online
Sat Dec 17 19:39:58 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:39:58 2011
Completed: alter database datafile 3 online
这个过程虽然在sqlplus中提示online成功,但是alert中的错误警告,以及smon进程占用100%的cup资源,最终导致数据库hang住。
2、分析alert日志和trace文件
alert日志中
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
bas_smon_27197.trc中
[oracle@bas bdump]$ more /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
/opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1
System name: Linux
Node name: bas
Release: 2.6.9-78.ELsmp
Version: #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine: x86_64
Instance name: BAS
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 27197, image: oracle@bas (SMON)
*** SERVICE NAME:() 2011-12-17 19:23:33.179
*** SESSION ID:(5490.1) 2011-12-17 19:23:33.179
SMON: about to recover undo segment 70
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
*** 2011-12-17 19:23:33.188
ksedmp: internal or fatal error
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
通过这些证明smon在利用undo segment 70在回滚sysaux中的内容,但是因为某种原因该回滚段异常,不能进行回滚,是的smon一直尝试回滚,但是始终不成功,最后数据库hang住,需要解决sysaux的问题,首先需要解决这个回滚段问题(删除异常回滚段)
3、删除异常回滚段,online datafile 3
强制kill掉smon进程,重启数据库
[oracle@bas bdump]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Sat Dec 17 19:52:14 2011
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 8589934592 bytes
Fixed Size 2034520 bytes
Variable Size 1124074664 bytes
Database Buffers 7398752256 bytes
Redo Buffers 65073152 bytes
Database mounted.
--为了数据库不hang掉,先offline datafile 3
SQL> alter database datafile 3 offline;
Database altered.
SQL> select segment_name,status from dba_rollback_segs;
select segment_name,status from dba_rollback_segs
*
ERROR at line 1:
ORA-01219: database not open: queries allowed on fixed tables/views only
SQL> alter database open;
Database altered.
SQL> select segment_name,status from dba_rollback_segs where status='NEEDS RECOVERY';
SEGMENT_NAME STATUS
------------------------------ ----------------
_SYSSMU70$ NEEDS RECOVERY
SQL> create pfile='/tmp/pfile' from spfile;
File created.
关闭数据库,在pfile中增加
*._corrupted_rollback_segments=(_SYSSMU70$)
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area 8589934592 bytes
Fixed Size 2034520 bytes
Variable Size 1124074664 bytes
Database Buffers 7398752256 bytes
Redo Buffers 65073152 bytes
Database mounted.
Database opened.
SQL> drop rollback segment "_SYSSMU70$";
Rollback segment dropped.
SQL> alter database datafile 3 online;
alter database datafile 3 online
*
ERROR at line 1:
ORA-01113: file 3 needs media recovery
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
SQL> recover datafile 3 ;
Media recovery complete.
SQL> alter database datafile 3 online;
Database altered.
三、解决坏块问题
1、alert日志中坏块记录
Sat Dec 17 20:33:31 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 20:33:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28027.trc:
Sat Dec 17 20:43:32 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
2、查询坏块对象
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
2 FROM DBA_EXTENTS A
WHERE FILE_ID = &FILE_ID
AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1; 3 4
Enter value for file_id: 1
old 3: WHERE FILE_ID = &FILE_ID
new 3: WHERE FILE_ID = 1
Enter value for block_id: 4571
old 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new 4: AND 4571 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SMON_SCN_TIME_SCN_IDX
INDEX SYSTEM
SQL> alter index SMON_SCN_TIME_SCN_IDX rebulid online;
alter index SMON_SCN_TIME_SCN_IDX rebulid online
*
ERROR at line 1:
ORA-02243: invalid ALTER INDEX or ALTER MATERIALIZED VIEW option
SQL> select table_name from dba_indexes where index_name='SMON_SCN_TIME_SCN_IDX';
TABLE_NAME
------------------------------
SMON_SCN_TIME
3、解决坏块问题
SQL> truncate table SMON_SCN_TIME;
truncate table SMON_SCN_TIME
*
ERROR at line 1:
ORA-03292: Table to be truncated is part of a cluster
SQL> truncate cluster SMON_SCN_TIME;
truncate cluster SMON_SCN_TIME
*
ERROR at line 1:
ORA-00943: cluster does not exist
SQL> SELECT dbms_metadata.get_ddl('TABLE','SMON_SCN_TIME','SYS') FROM dual;
DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
--------------------------------------------------------------------------------
CREATE TABLE "SYS"."SMON_SCN_TIME"
( "THREAD" NUMBER,
"TIME_MP" NUMBER,
"TIME_DP" DATE,
"SCN_WRP" NUMBER,
"SCN_BAS" NUMBER,
"NUM_MAPPINGS" NUMBER,
"TIM_SCN_MAP" RAW(1200),
"SCN" NUMBER DEFAULT 0,
"ORIG_THREAD" NUMBER DEFAULT 0 /* for downgrade */
) CLUSTER "SYS"."SMON_SCN_TO_TIME" ("THREAD")
SQL> truncate cluster smon_scn_to_time;
Cluster truncated.
SQL> alter system flush buffer_cache;
System altered.
四、解决AUTO_SPACE_ADVISOR_JOB引起bug
Sat Dec 17 21:00:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:41 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:44 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:47 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:50 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:57 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:00 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:03 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
查看MOS发现(430223.1,785899.1)Segment advisor带来的buffer坏块,可以禁用AUTO_SPACE_ADVISOR_JOB并清空buffer来解决,最终解决办法,升级数据库
SQL> exec dbms_scheduler.disable('AUTO_SPACE_ADVISOR_JOB');
PL/SQL procedure successfully completed.
SQL> alter system flush buffer_cache;
System altered.
至此这次数据库sysaux数据文件异常恢复完全结束。再次提醒各位,数据库一定要做好备份和归档工作。