数据库open报ORA-00959: tablespace ‘UNDOTBS1′ does not exist分析

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库open报ORA-00959: tablespace ‘UNDOTBS1′ does not exist分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个朋友找到我,说数据库重启之后无法正常启动,提示ORA-00959 UNDOTBS1表空间不存在
ORA-00959-undotbs1


在数据库的启动过程中,这个是一个很常见的错误,一般出现这类错误的原因是由于undo_tablespace指定的undo表空间不存在导致.但是这个库比较明显,设置了undo_management=manual, undo_tablespace=system,依旧数据库需要找undotbs1表空间,进一步分析数据库当前表空间情况
20220508224920

该数据库确实没有undotbs1表空间,基于以上信息,初步怀疑很可能是undo回滚段异常,通过对于oracle基表进行分析,发现信息
undo-tbs
undotbs1

基于上面的信息,可以确认回滚段中确实有四条记录指向被删除的undotbs1,而且还有一条undo回滚段信息为need recovery状态.进一步分析数据库alert日志

--创建undotbs2表空间,并重启数据库
Sun Sep 26 14:42:16 2021
create undo tablespace UNDOTBS2 datafile '/data/oradata/xifenfei/undotbs001.dbf' size 120G
reuse autoextend on next 500m maxsize unlimited
Sun Sep 26 14:46:46 2021
Completed: create undo tablespace UNDOTBS2 datafile '/data/oradata/xifenfei/undotbs001.dbf' size 120G
reuse autoextend on next 500m maxsize unlimited
Sun Sep 26 14:47:13 2021
[16927] Successfully onlined Undo Tablespace 10.
[16927] **** active transactions found in undo Tablespace 2 - moved to Pending Switch-Out state.
[16927] active transactions found/affinity dissolution incompletein undo tablespace 2 during switch-out.
ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SCOPE=BOTH;
Sun Sep 26 14:49:25 2021
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process QMNC

---第一次尝试删除undotbs1失败
Sun Sep 26 15:06:30 2021
drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
Sun Sep 26 15:47:26 2021
ORA-1013 signalled during: drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
...
Sun Sep 26 15:48:40 2021
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process QMNC

--加上该隐含参数,继续重启库删除undotbs1,依旧删除失败
  _corrupted_rollback_segments= "_SYSSMU28_1306132068$"

Sun Sep 26 15:53:34 2021
QMNC started with pid=31, OS id=20454 
Completed: ALTER DATABASE OPEN
Sun Sep 26 15:55:32 2021
 drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
…………
Sun Sep 26 16:45:47 2021
ORA-1013 signalled during:  drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS...
Sun Sep 26 16:46:00 2021
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process QMNC

---加上以下参数,并尝试重启数据库删除undotbs1成功
  _corrupted_rollback_segments= "_SYSSMU31_201790566$"
  _corrupted_rollback_segments= "_SYSSMU30_2395098326$"
  _corrupted_rollback_segments= "_SYSSMU29_306369076$"
  _corrupted_rollback_segments= "_SYSSMU28_1306132068$"

Sun Sep 26 16:47:24 2021
QMNC started with pid=31, OS id=23421 
Completed: ALTER DATABASE OPEN
Sun Sep 26 16:47:40 2021
drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
…………
Mon Sep 27 01:44:56 2021
Deleted file /data/oradata/xifenfei/undotbs01.dbf
Deleted file /data/oradata/xifenfei/undotbs02.dbf
Deleted file /data/oradata/xifenfei/undotbs03.dbf
Deleted file /data/oradata/xifenfei/undotbs04.dbf
Deleted file /data/oradata/xifenfei/undotbs05.dbf
Deleted file /data/oradata/xifenfei/undotbs06.dbf
Completed: drop tablespace UNDOTBS1 INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS

从这里基本上可以看出来,操作之人非常草率,在oracle回滚段还在被占用状态,直接尝试删除老undo表空间,在无法删除之后,直接暴力的使用undo回滚段异常参数,然后进行undo表空间删除.这样操作的后果屏蔽了事务的一致性,导致后续可能导致数据库一系列异常.对当前库启动过程进行跟踪发现
20220511192523


数据库启动的过程中查询undo$中的undotbs1信息,而该表空间不存在,所以出现此类报错,从而导致数据库无法正常启动.知道问题原因所在,那通过undo$记录,数据库即可正常启动.

_OFFLINE_ROLLBACK_SEGMENTS/_CORRUPTED_ROLLBACK_SEGMENTS

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:_OFFLINE_ROLLBACK_SEGMENTS/_CORRUPTED_ROLLBACK_SEGMENTS

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于oracle undo异常的时候恢复中,经常需要使用的_OFFLINE_ROLLBACK_SEGMENTS和_CORRUPTED_ROLLBACK_SEGMENTS参数,关于这两个参数的区别进行说明
_OFFLINE_ROLLBACK_SEGMENTS 参数说明
_offline_rollback_segments


_CORRUPTED_ROLLBACK_SEGMENTS 参数说明
_corrupted_rollback_segments


_OFFLINE_ROLLBACK_SEGMENTS 和 _CORRUPTED_ROLLBACK_SEGMENTS 区别
offline_corrupted


这两个参数属于oracle隐含参数,在没有oracle support的情况下,请慎用.该相关参数可能导致数据库逻辑不一致风险,如果使用了,建议逻辑方式导出导入库

Oracle 12c undo异常处理—root pdb undo异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 12c undo异常处理—root pdb undo异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在12c pdb环境中如果root pdb的undo文件异常,数据库该如何恢复呢?这篇文章模拟undo丢失的情况下进行恢复
模拟环境
三个会话,其中第一个会话对pdb1中的表进行操作,并且有事务未提交,第二个会话对pdb2进行操作,也未提交事务;第三个会话直接abort库,模拟突然库异常,然后删除root pdb下面的undo文件

--会话1
[oracle@ora1221 oradata]$ sqlplus  / as sysdba
SQL*Plus: Release 12.2.0.0.3 Production on Thu Jun 16 22:24:20 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL>
SQL> show pdbs;
    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           MOUNTED
SQL> alter session set container=pdb1;
Session altered.
SQL> alter database open;
Database altered.
SQL>  create user chf identified by oracle;
User created.
SQL> grant dba to chf;
Grant succeeded.
SQL> create table chf.t_xifenfei_p1 as
  2  select * from dba_objects;
Table created.
SQL> insert into chf.t_xifenfei_p1
  2  select * from dba_objects;
72427 rows created.
SQL> select count(*) from chf.t_xifenfei_p1;
  COUNT(*)
----------
    144853
--会话2
[oracle@ora1221 ~]$ ss
SQL*Plus: Release 12.2.0.0.3 Production on Thu Jun 16 22:34:01 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
SQL> alter session set container=pdb2;
Session altered.
SQL> alter database open;
Database altered.
SQL>  create user chf identified by oracle;
User created.
SQL> grant dba to chf;
Grant succeeded.
SQL>  create table chf.t_xifenfei_p2
  2  as select * from dba_objects;
Table created.
SQL> delete from chf.t_xifenfei_p2;
72426 rows deleted.
SQL> select count(*) from chf.t_xifenfei_p2;
  COUNT(*)
----------
    0
--会话3
[oracle@ora1221 ~]$ ss
SQL*Plus: Release 12.2.0.0.3 Production on Thu Jun 16 22:36:16 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
SQL> shutdown abort
ORACLE instance shut down.
--删除cdb undo文件
[oracle@ora1221 orcl12c2]$ ls -ltr
total 2040912
drwxr-x---. 2 oracle oinstall      4096 Jun 16 18:26 pdbseed
drwxr-x---. 2 oracle oinstall      4096 Jun 16 18:27 pdb2
drwxr-x---. 2 oracle oinstall      4096 Jun 16 18:28 pdb1
-rw-r-----. 1 oracle oinstall 209715712 Jun 16 22:24 redo03.log
-rw-r-----. 1 oracle oinstall   5251072 Jun 16 22:24 users01.dbf
-rw-r-----. 1 oracle oinstall  34611200 Jun 16 22:25 temp01.dbf
-rw-r-----. 1 oracle oinstall 849354752 Jun 16 22:35 system01.dbf
-rw-r-----. 1 oracle oinstall  73408512 Jun 16 22:35 undotbs01.dbf
-rw-r-----. 1 oracle oinstall 492838912 Jun 16 22:35 sysaux01.dbf
-rw-r-----. 1 oracle oinstall 209715712 Jun 16 22:36 redo02.log
-rw-r-----. 1 oracle oinstall 209715712 Jun 16 22:36 redo01.log
-rw-r-----. 1 oracle oinstall  18726912 Jun 16 22:36 control02.ctl
-rw-r-----. 1 oracle oinstall  18726912 Jun 16 22:36 control01.ctl
[oracle@ora1221 orcl12c2]$ rm undotbs01.dbf
[oracle@ora1221 orcl12c2]$ ls -l un*
ls: cannot access un*: No such file or directory

启动数据库
由于有undo文件丢失数据库在启动的时候检测到文件丢失(ORA-01157),无法open,offline文件后依旧无法启动(ORA-00376)

[oracle@ora1221 orcl12c2]$ ss
SQL*Plus: Release 12.2.0.0.3 Production on Thu Jun 16 22:51:21 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl12c2/undotbs01.dbf'
offline 数据文件
SQL> alter database datafile 4 offline ;
alter database datafile 4 offline
*
ERROR at line 1:
ORA-01145: offline immediate disallowed unless media recovery enabled
SQL>  alter database datafile 4 offline drop;
Database altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl12c2/undotbs01.dbf'
Process ID: 7547
Session ID: 16 Serial number: 56234

把undo_management修改为manual后启动库,依旧报ORA-00376

SQL> startup pfile='/tmp/pfile' mount;
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
SQL> show parameter undo_management;
NAME                                 TYPE
------------------------------------ ----------------------
VALUE
------------------------------
undo_management                      string
MANUAL
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl12c2/undotbs01.dbf'
Process ID: 7981
Session ID: 16 Serial number: 56572

设置_corrupted_rollback_segments参数

SQL> startup pfile='/tmp/pfile' mount;
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
SQL> show parameter _corrupted_rollback_segments;
NAME                                 TYPE
------------------------------------ ----------------------
VALUE
------------------------------
_corrupted_rollback_segments         string
_SYSSMU1_3200770482$, _SYSSMU2
_3597554035$, _SYSSMU3_2898427
493$, _SYSSMU4_670955920$, _SY
SSMU5_1233449977$, _SYSSMU6_32
67641983$, _SYSSMU7_2822479342
$, _SYSSMU8_1645196706$, _SYSS
MU9_3032014485$, _SYSSMU10_474
465626$
SQL> alter database open;
Database altered.

通过设置_corrupted_rollback_segments参数之后,数据库正常启动,下面继续其他pdb

open pdb1

SQL> alter session set container=pdb1;
Session altered.
SQL> alter database open;
Database altered.
SQL> select count(*) from chf.t_xifenfei_p1;
  COUNT(*)
----------
     72426

pdb2 open

SQL> alter session set container=pdb2;
Session altered.
SQL> alter database open;
Database altered.
SQL> select count(*) from chf.t_xifenfei_p2;
  COUNT(*)
----------
     72426

至此数据库基本上恢复完成,但是看到的pdb里面两个测试表的数据和我们预测的有一定的偏差,看来cdb中的undo和pdb中的undo还是有一定的依赖关系.同时也说明了root的undo异常对于其他pdb的open最少在恢复上面影响不大.下一篇测试业务pdb中undo异常处理