在Oracle职业生涯中,恢复过生产环境数据库也有几百个.对于Oracle恢复我还是相当的自信,今天因为自己的一时过于自信,对于环境错了错误的判断,简单问题复杂化,差点变成悲剧
数据库最初故障
Thu Sep 25 09:27:26 2014
MMON started with pid=15, OS id=1968
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = F:\oracle
Thu Sep 25 09:27:26 2014
ALTER DATABASE MOUNT
Thu Sep 25 09:27:26 2014
MMNL started with pid=16, OS id=5976
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_4624.trc:
ORA-00202: ????: ''F:\ORACLE\ORADATA\ORCL\CONTROL01.CTL
ORA-27070: ????/????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Thu Sep 25 09:28:31 2014
ORA-204 signalled during: ALTER DATABASE MOUNT...
因为硬件或者系统层面问题,导致控制文件无法正常访问
重建控制文件
Fri Sep 26 12:28:44 2014
Successful mount of redo thread 1, with mount id 1387065723
Completed: CREATE CONTROLFILE REUSE DATABASE "orcl" RESETLOGS ARCHIVELOG
MAXLOGFILES 5
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 2
MAXLOGHISTORY 226
LOGFILE
GROUP 1 'F:\oracle\oradata\orcl\REDO01.LOG' SIZE 50M, --redo log ????
GROUP 2 'F:\oracle\oradata\orcl\REDO02.LOG' SIZE 50M, --redo log ????
GROUP 3 'F:\oracle\oradata\orcl\REDO03.LOG' SIZE 50M --redo log ????
-- STANDBY LOGFILE
DATAFILE
'F:\oracle\oradata\orcl\SYSAUX01.DBF', --sysaux???????
'F:\oracle\oradata\orcl\SYSTEM01.DBF',
'F:\oracle\oradata\orcl\USERS01.DBF', --user????????
'F:\oracle\oradata\orcl\UNDOTBS01.DBF' --undo???????
CHARACTER SET ZHS16GBK
Fri Sep 26 12:29:55 2014
alter database open resetlogs
ORA-1194 signalled during: alter database open resetlogs...
埋下了雷,创建控制文件中未全部列举出来所有数据文件
进行不完全恢复,尝试resetlogs库发现redo异常
Fri Sep 26 14:13:24 2014
ALTER DATABASE MOUNT
Fri Sep 26 14:13:24 2014
MMNL started with pid=16, OS id=9024
Successful mount of redo thread 1, with mount id 1387037444
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Fri Sep 26 14:14:08 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Fri Sep 26 14:15:16 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 2049 计数 6143 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Fri Sep 26 14:16:24 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ARCH: All Archive destinations made inactive due to error 333
使用隐含参数尝试拉库,报ORA-600[2662]
Fri Sep 26 14:16:45 2014
SMON: enabling cache recovery
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc (incident=57761):
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Incident details in: f:\oracle\diag\rdbms\orcl\orcl\incident\incdir_57761\orcl_ora_3720_i57761.trc
Fri Sep 26 14:16:45 2014
ARC3 started with pid=23, OS id=9692
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 3720): terminating the instance due to error 704
Instance terminated by USER, pid = 3720
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3720) as a result of ORA-1092
数据库在未使用所有数据文件的情况下,进行了resetlogs操作,悲剧的本质已经注定,我的失误是没有评估好现状,还继续在错误的道路上越走越远.
我开始接手该库现况
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Fri Sep 26 14:18:55 2014
alter database open
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...
Fri Sep 26 14:19:31 2014
alter database open
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open ...
Fri Sep 26 14:22:26 2014
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Fri Sep 26 14:22:26 2014
Media Recovery failed with error 16433
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER database ...
Fri Sep 26 14:24:25 2014
ALTER DATABASE RECOVER datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF' ...
Fri Sep 26 14:28:47 2014
alter database open read write
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open read write...
Fri Sep 26 14:31:48 2014
ALTER DATABASE RECOVER datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF'
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF' ...
提示ORA-01110: 数据文件 1需要恢复,尝试recover操作
尝试recover操作
连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
SQL> alter database backup controlfile to trace as 'd:\ctl.txt';
alter database backup controlfile to trace as 'd:\ctl.txt'
*
第 1 行出现错误:
ORA-16433: 必须以读/写模式打开数据库。
SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
重建控制文件
SQL> shutdown immediate;
ORA-01109: 数据库未打开
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT
ORACLE 例程已经启动。
Total System Global Area 970895360 bytes
Fixed Size 1375452 bytes
Variable Size 603980580 bytes
Database Buffers 360710144 bytes
Redo Buffers 4829184 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE orcl NORESETLOGS FORCE LOGGING ARCHIVELOG
2 MAXLOGFILES 16
3 MAXLOGMEMBERS 3
4 MAXDATAFILES 100
5 MAXINSTANCES 8
6 MAXLOGHISTORY 2921
7 LOGFILE
8 GROUP 1 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG' SIZE 50M,
9 GROUP 2 'F:\ORACLE\ORADATA\ORCL\REDO02.LOG' SIZE 50M,
10 GROUP 3 'F:\ORACLE\ORADATA\ORCL\REDO03.LOG' SIZE 50M
11 DATAFILE
12 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF',
13 'F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF',
14 'F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF',
15 'F:\ORACLE\ORADATA\ORCL\USERS01.DBF'
16 CHARACTER SET ZHS16GBK
17 ;
控制文件已创建。
这一步严重发错,在恢复前未认真看alert日志,太依赖v$datafile查询出来结果,导致重建控制文件丢失数据文件,埋下大雷。根据前面alert日志报错ORA-600 2662,决定一并处理该问题,然后进行恢复
SQL> shutdown immediate;
ORA-01109: ??????
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup pfile='d:\pfile.txt' mount;
ORACLE 例程已经启动。
Total System Global Area 970895360 bytes
Fixed Size 1375452 bytes
Variable Size 603980580 bytes
Database Buffers 360710144 bytes
Redo Buffers 4829184 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [], [
数据库报ORA-600 4194,直接修改undo_management=manual,然后尝试启动数据库
SQL> conn / as sysdba
已连接到空闲例程。
SQL> startup pfile='d:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area 970895360 bytes
Fixed Size 1375452 bytes
Variable Size 603980580 bytes
Database Buffers 360710144 bytes
Redo Buffers 4829184 bytes
数据库装载完毕。
数据库已经打开。
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF
F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF
F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF
F:\ORACLE\ORADATA\ORCL\USERS01.DBF
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006
已选择6行。
SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005'
2 to 'F:\oracle\oradata\SOURCE_DATA1.DBF';
数据库已更改。
SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006'
2 to 'F:\oracle\oradata\SOURCE_idx1.DBF';
数据库已更改。
SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount pfile='d:\pfile.txt'
ORACLE 例程已经启动。
Total System Global Area 970895360 bytes
Fixed Size 1375452 bytes
Variable Size 603980580 bytes
Database Buffers 360710144 bytes
Redo Buffers 4829184 bytes
数据库装载完毕。
SQL> alter datafile 5 online;
alter datafile 5 online
*
第 1 行出现错误:
ORA-00940: 无效的 ALTER 命令
SQL> alter database datafile 5 online;
数据库已更改。
SQL> alter database datafile 6 online;
数据库已更改。
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 5 belongs to an orphan incarnation
ORA-01110: data file 5: 'F:\ORACLE\ORADATA\SOURCE_DATA1.DBF'
SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01139: RESETLOGS 选项仅在不完全数据库恢复后有效
SQL> alter database datafile 6 offline;
数据库已更改。
SQL> alter database datafile 5 offline;
数据库已更改。
SQL> recover database until cancel;
完成介质恢复。
SQL> alter database datafile 6 online;
数据库已更改。
SQL> alter database datafile 5 online;
数据库已更改。
SQL> alter database open resetlogs;
数据库已更改。
还好结合一些隐含参数侥幸恢复成功,差点到了要使用bbed的程度
这次的恢复告诉我:Oracle数据库恢复千万比大意,需要认真分析alert日志和咨询客户做了那些操作,不然可能导致万劫不复之禁地