记录block 0损坏,数据文件大量坏块,使用不当数据库版本恢复等各种操作之后的故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录block 0损坏,数据文件大量坏块,使用不当数据库版本恢复等各种操作之后的故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天一个恢复行业的朋友让我帮忙看一个oracle故障,说是vmdk文件被加密(加密破坏的很少),然后他从里面恢复出来了oracle的数据库文件,客户那边拿到数据文件然后说有三个文件头(被重命名为.bak)损坏了,无法打开数据库,让我这边给他们分析处理.分析他们说的被破坏文件,确实有损坏(初步看是block 0)
block0


使用oracle自带的dbv进行检测

C:\Users\XFF>dbv file=H:\BaiduNetdisk\D\SYSTEM06.DBF.bak

DBVERIFY: Release 11.2.0.4.0 - Production on 星期五 7月 3 21:06:28 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBV-00107: 未知标头格式 (85) (1311029317)

由于block 0损坏,oracle自带的dbv无法检测,使用obet的dbv功能进行检测(obet实现对数据文件坏块检测功能)

===============================================
DBV (Data Block Verification) Report
Started: 2026-07-03 21:04:22
Block Size: 8192 bytes
Target File: ALL files in listfile
===============================================

File #1: H:\BaiduNetdisk\D\SYSTEM06.DBF.bak (12801 blocks) - Started: 2026-07-03 21:04:22
File #1: rfile=19 (0x00000013)  header_block_num=12800 (0x00003200)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 4083969), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error

File #1: H:\BaiduNetdisk\D\TSP_XXXXS06.DBF.bak (4194303 blocks) - Started: 2026-07-03 21:04:22
File #1: rfile=24 (0x00000018)  header_block_num=4194302 (0x003FFFFE)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 1367117), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error

File #1: H:\BaiduNetdisk\D\TSP_XXXXS07.DBF.bak (1933313 blocks) - Started: 2026-07-03 21:04:42
File #1: rfile=25 (0x00000019)  header_block_num=1933312 (0x001D8000)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 2445572), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error


DBV completed at: 2026-07-03 21:04:51
===============================================
DBV Summary:
Total blocks checked: 6140414
Total all zero blocks found: 0
Total all rdba error blocks found: 3
Total all tailchk error blocks found: 0
Total all soft corrupted blocks found: 0
Total all checksum error blocks found: 0
Total bad blocks found: 3
Execution time: 29.00 seconds
===============================================

通过检测确认这三个数据文件就是block 0损坏其他的数据块是ok的.检查其他数据文件发现有一个数据文件有近2w个坏块

DBVERIFY: Release 11.2.0.4.0 - Production on 星期五 7月 3 12:55:08 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - 开始验证: FILE = H:\BAIDUNETDISK\ORCL\TSP_XXXX.DBF
页 409601 标记为损坏
Corrupt block relative dba: 0x01864001 (file 6, block 409601)
Bad header found during dbv: 
Data in bad block:
 type: 219 format: 1 rdba: 0xdf6e738f
 last change scn: 0xd7ed.50be2d8b seq: 0x44 flg: 0x52
 spare1: 0x39 spare2: 0x92 spare3: 0xb5c5
 consistency value in tail: 0x6e7debc9
 check value in block header: 0x6d54
 block checksum disabled

………………
DBVERIFY - 验证完成

检查的页总数: 4194302
处理的页总数 (数据): 1842107
失败的页总数 (数据): 0
处理的页总数 (索引): 2143604
失败的页总数 (索引): 0
处理的页总数 (其他): 13763
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 437
标记为损坏的总页数: 194391
流入的页总数: 2
加密的总页数        : 0
最高块 SCN            : 1869110545 (9.1869110545)

对于这个文件,由于客户有部分rman备份,尝试通过备份找出来这个文件历史文件

RMAN> run
2> {
3> set newname for datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXXX.DBF' 
4>   to 'H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman';
5> restore datafile 6;
6> }

正在执行命令: SET NEWNAME

启动 restore 于 03-7月 -26
使用通道 ORA_DISK_1

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0
通道 ORA_DISK_1: ORA-19870: 还原备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0 时出错
ORA-19599: 块编号 1065697 已在 backup piece H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0 中损 坏

故障转移到上一个备份

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK
通道 ORA_DISK_1: ORA-19870: 还原备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK 时出错
ORA-19599: 块编号 95352 已在 backup piece H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK 中损坏

故障转移到上一个备份

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250702_SQ3TGRTU_1_1.BAK
通道 ORA_DISK_1: 段句柄 = H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250702_SQ3TGRTU_1_1.BAK 标记 = TAG20250702T000518
通道 ORA_DISK_1: 已还原备份片段 1
通道 ORA_DISK_1: 还原完成, 用时: 00:28:56
完成 restore 于 03-7月 -26

RMAN> exit

运气还不错,从备份里面找一份好的该文件的备份,然后通过使用该文件好的block替换最初损坏的block,实现该文件无坏块(由于这个文件比较靠前,而且已经写满,所以只差3天左右数据可能改变很小,因此可以采用这种替代方法最大限度恢复数据).数据文件检测和明显坏块处理完成,接下来开始打开数据库操作.
由于我这边的数据文件路径和原库的不一致,修改ctl中的datafile 路径报ORA-17503错误
rename


对于这些情况,通过修复block 0,然后尝试重命名数据文件成功

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXX06.DBF' to 'H:\BaiduNetdisk\orcl\TSP_XXX06.DBF'
  2  ;

数据库已更改。

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXX07.DBF' to 'H:\BaiduNetdisk\orcl\TSP_XXX07.DBF'
  2  ;

数据库已更改。

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM06.DBF' to 'H:\BaiduNetdisk\orcl\SYSTEM06.DBF'
  2  ;

数据库已更改。

然后尝试打开数据库成功

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

尝试expdp导出数据,结果报UDE-22303

C:\Users\XFF>expdp "'/ as sysdba'" full=y dumpfile=expdp_0703_%U.dmp DIRECTORY=expdp_dir logfile=expdp_0703.log 
 parallel=4 compression=all EXCLUDE=STATISTICS,AUDIT

Export: Release 11.2.0.4.0 - Production on 星期五 7月 3 13:10:12 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

UDE-22303: 操作产生了 ORACLE 错误 22303
OCI-22303: 未找到类型 "SYS"."KU$_STATUS1020"

正常打开的数据库,没有明显坏块,出现这个错误,理论上不太应该,怀疑是客户的版本问题,查询组件版本
v1


确认该数据库版本可能是11.2.0.1而不是我现在看到的数据文件头为11.2.0.4,查询WRM$_DATABASE_INSTANCE,确认数据库之前版本是11.2.0.1.对于这种情况,直接简单的把compatible从11.2.0.4修改为11.2.0.0肯定不行,因为ctl,dbf,redo里面都写了11.2.0.4的信息
1. ctl报版本不匹配

SQL> startup mount pfile='i:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 3206836224 bytes
Fixed Size                  2180024 bytes
Variable Size             654314568 bytes
Database Buffers         2533359616 bytes
Redo Buffers               16982016 bytes
ORA-00201: ?????? 11.2.0.4.0 ? ORACLE ?? 11.2.0.0.0 ???
ORA-00202: ????: ''H:\BAIDUNETDISK\ORCL\CONTROL01.CTL''

2. 这种情况需要rectl,然后报dbf版本不对

CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
*
第 1 行出现错误:
ORA-01503: CREATE CONTROLFILE ??
ORA-01130: ??????? 11.2.0.4.0 ? ORACLE ?? 11.2.0.0.0 ???
ORA-01110: ???? 1: 'H:\BAIDUNETDISK\ORCL\SYSTEM01.DBF'

这种情况,dbf中的版本信息无法通过重建解决,只能使用obet工具修改,每个文件类似修改(Oracle数据块编辑工具( Oracle Block Editor Tool)-obet)

OBET> set file 25
filename set to: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF (file#25)

OBET> set offset 24
offset set to: 24

OBET> m 0000

Confirm modification:
File: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF
Block: 1
Offset: 24 (file offset: 0x00002018)
Original value: 0004
New value:      0000
Confirm? (Y/YES to proceed): y
Verification successful: Data written correctly.
Modified 2 bytes at offset 0x00002018 successfully.

OBET> sum apply

Confirm applying checksum:
File: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF
Block: 1
Offset in block: 16 (file offset: 0x00002010)
Original value: 0x94C3
New value:      0x94C7
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x94C7).
Checksum applied successfully.

OBET>

然后重建ctl报redo版本不兼容ORA-00331

CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
*
第 1 行出现错误:
ORA-01503: CREATE CONTROLFILE 失败
ORA-00331: 日志版本 0.0.0.0.0 与 ORACLE 版本 11.2.0.0.0 不兼容
ORA-01517: 日志成员: 'H:\BAIDUNETDISK\ORCL\REDO01.LOG'

采用resetlogs方式重建(不读取redo信息),重建ctl成功,然后直接打开数据库

 37  CHARACTER SET AL32UTF8
 38  ;

控制文件已创建。

SQL> alter database open resetlogs;

数据库已更改。

然后expdp导出数据完成本次恢复任务

需要注意:dbv 检测controlfile可能不准

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:需要注意:dbv 检测controlfile可能不准

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

oracle dbv工具是Oracle数据库离线检测坏块的工具主要是用来检测数据文件坏块(物理和逻辑坏块),虽然在一定程度上面可以检测controlfile的坏块,但是不是特别准,在一次恢复案例中数据库启动报controlfile损坏,但是dbv检测是正常的
数据库在mount的过程中报controlfile 损坏ORA-00227
ctl


把控制文件从asm里面拷贝到文件系统,然后通过dbv进行检测,一切正常

[oracle@oracle1 ~]$ dbv blocksize=16384 file=/tmp/control01.ctl 

DBVERIFY: Release 11.2.0.4.0 - Production on Wed Jul 1 14:21:58 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /tmp/control01.ctl


DBVERIFY - Verification complete

Total Pages Examined         : 1312
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 395
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 917
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 4294967295 (65535.4294967295)

这个故障通过重建ctl,打开数据库成功

达梦数据库redo异常强制拉库

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:达梦数据库redo异常强制拉库

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过模拟事务不提交,并删除掉redo来模仿达梦数据库故障的恢复过程
模拟达梦事务并不提交直接abort掉库

[dmdba@xifenfei ~]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 5.575(ms)
disql V8
SQL> 
SQL> create table t1 as select * from dba_objects;
executed successfully
used time: 85.929(ms). Execute id is 601.
SQL> select count(1) from t1;

LINEID     COUNT(1)            
---------- --------------------
1          1067

used time: 2.106(ms). Execute id is 602.
SQL> delete from t1;
affect rows 1067

used time: 3.726(ms). Execute id is 603.
SQL> shutdown abort;
executed successfully
used time: 1.324(ms). Execute id is 0.

删除掉redo文件

[dmdba@xifenfei ~]$ cd /dmdb/data/DAMENG/
[dmdba@xifenfei DAMENG]$ rm -rf DAMENG0
DAMENG01.log  DAMENG02.log  
[dmdba@xifenfei DAMENG]$ rm -rf DAMENG0*.log

尝试启动达梦数据库

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini 
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
/dmdb/data/DAMENG/DAMENG01.log not exist, can not startup

重命名现在的库文件,然后参考当时的创建库的init.log 进行重新初始化一个新库

[dmdba@xifenfei data]$ cat DAMENG_BAK/dminit_DAMENG_20260315093214.log 
start init database: V8, 2026-03-15 09:32:14
init params:
        db path: /dmdb/data/DAMENG
        db name: DAMENG
        auto overwrite: 0
        page size: 8192
        extent size: 16
        char_fix_storage: 0
        sql_log_forbid: 0
        secur_flag: 2
        enable mac: 0
        page checksum policy: 1
        time zone: +08:00
        string case sensitive: 1
        charset: 0
        page check mode: 3
        page check algorithm id: 0
        priv flag: 0
        env label: 0
        rlog enc flag: 0
        use new hash: 1
        blank pad mode: 0
        sec priv mode: 0
        huge with delta: 1
        rlog gen for huge: 1
        pseg_mgr_flag: 0
        auto_adj_para: 0
[dmdba@xifenfei data]$ dminit PATH=/dmdb/data/DAMENG  PAGE_SIZE=8 EXTENT_SIZE=16 
> LOG_SIZE=256 PORT_NUM=5236 CASE_SENSITIVE=Y CHARSET=0 DB_NAME=DAMENG INSTANCE_NAME=DMSERVER
> SYSDBA_PWD=Oracle123 SYSAUDITOR_PWD=Oracle123 RLOG_POSTFIX_NAME=log
initdb V8
db version: 0x7000d
file dm.key not found, use default license!
License will expire on 2026-11-13
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL

 log file path: /dmdb/data/DAMENG/DAMENG/DAMENG01.log


 log file path: /dmdb/data/DAMENG/DAMENG/DAMENG02.log

write to dir [/dmdb/data/DAMENG/DAMENG].
create dm database success. 2026-06-27 23:40:17

正常启动这个新库并干净关闭

[dmdba@xifenfei bin]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 5.523(ms)
disql V8
SQL> shutdown normal
2   ;
executed successfully
used time: 1.417(ms). Execute id is 0.
SQL> 

直接拷贝redo文件替换尝试启动

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini 
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2026-11-13
rfil grp log file error in (db_magic, permanent_magic), 
        log file /dmdb/data/DAMENG/DAMENG01.log is (475644558, 854749702), 
        dbfile is(92637567, 1763199417).
Floating point exception
[dmdba@xifenfei bin]$ 

提示db_magic和permanent_magic不匹配,使用dmmdf修改新库redo文件

[dmdba@xifenfei DAMENG]$ dmmdf type=1 FILE=SYSTEM.DBF
dmmdf V8
**********************************************************
1 db_magic=92637567
2 next_trxid=38064
3 pemnt_magic=1763199417
4 enable_page_check=3
**********************************************************
Please input which parameter you want to change(1-4), q to quit: Q

然后在使用dmmdf type=2 修改redo文件的db_magic和pemnt_magic修改之后,设置PSEG_RECV = 0,RLOG_CHECK_SPACE = 2尝试启动库

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2026-11-13
file lsn: 49136
ndct db load finished, code:100
ndct fill fast pool finished
pseg_set_gtv_trxid_low next_trxid in mem:[40065]
pseg recv finished
nsvr_startup end.
uthr_pipe_create, create pipe[read:10, write:11]
uthr_pipe_create, create pipe[read:12, write:13]
uthr_pipe_create, create pipe[read:14, write:15]
uthr_pipe_create, create pipe[read:16, write:17]
uthr_pipe_create, create pipe[read:18, write:19]
uthr_pipe_create, create pipe[read:20, write:21]
uthr_pipe_create, create pipe[read:22, write:23]
uthr_pipe_create, create pipe[read:24, write:25]
uthr_pipe_create, create pipe[read:26, write:27]
uthr_pipe_create, create pipe[read:28, write:29]
uthr_pipe_create, create pipe[read:30, write:31]
uthr_pipe_create, create pipe[read:32, write:33]
uthr_pipe_create, create pipe[read:34, write:35]
uthr_pipe_create, create pipe[read:36, write:37]
uthr_pipe_create, create pipe[read:38, write:39]
uthr_pipe_create, create pipe[read:40, write:41]
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info finished, code:100.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.

查询拉起来库中的数据

[dmdba@xifenfei DAMENG]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 6.192(ms)
disql V8
SQL> select count(1) from t1;

LINEID     COUNT(1)            
---------- --------------------
1          1067

used time: 8.976(ms). Execute id is 601.
SQL> 

dd破坏包含50多个pdb的asm 磁盘组恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:dd破坏包含50多个pdb的asm 磁盘组恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前段时间刚刚恢复了一个客户dd了34个磁盘中的2个磁盘的1-2G多的数据(在生产环境错误执行dd命令破坏asm磁盘故障恢复),这次又遇到一个客户dd了asm 的两个磁盘的100m和10m,这次故障麻烦的是由于dd了disk 0和disk 1,ausize为4M,有50多个pdb在该磁盘组中,相对恢复比较麻烦,通过不懈努力,最后终于最大限度恢复客户数据,避免了进一步的损失
故障误操作
近期又一个可以误执行了dd命令,破坏了生产库的两个asm disk磁盘
1


上述截图中可以确认两个信息
1.sdh盘被dd掉了10m,sdg盘被dd掉了100M
2.sdg对应的是asm-data1,sdh对应的是asm-data2
dd误操作之后,data磁盘组开始报错,然后直接dismount.
2

分析原磁盘组中asm磁盘情况
1.确定损坏磁盘在磁盘组中位置
通过asm的alert日志分析损坏的盘对应的磁盘组信息
3

基于这个信息,我们可以确认asm-data1 对应的是DATA磁盘组的disk 0,asm-data2对应的是DATA磁盘组的disk 1,也就是说现在DATA磁盘组的0号盘被dd了100M,1号盘被dd了10m
2.确认ausize大小
通过分析该磁盘组的其他磁盘确认ausize大小
kfdhdb.ausize: 4194304 ; 0x0bc: 0×00400000
可以确认该磁盘组的au大小为4M
3.磁盘组后续进行了一次加盘扩容
4

在25年6月份,对data磁盘组加了三块盘
4.sdh(asm-data2)磁盘还被加入到了arch磁盘组中
5

比较幸运由于在另外一个节点上sdh磁盘权限没有正确修改,导致该增加没有完全成功,也就是该磁盘没有被reblance(如果加入成功并正常reblance,那后果比现在严重很多)
5.通过结合asm日志以及kfed,磁盘物理大小等信息,并且通过kfed构造出来损坏的磁盘头信息,列出故障之前DATA磁盘组所有磁盘的情况(为了避免udev别名带来的影响,我直接使用物理磁盘名称来显示)
6

客户现场情况说明
1.客户有一个大概1年多之前的dataguard(后续没有继续同步),已经进行了failover激活
2.客户的备份系统中有一个大概1个月之前的备份,但是备份库缺少归档,我接手之时,已经被维护厂商强制拉起
3.在我恢复之前,有专业的工程师已经对其这个现场进行了分析,但是没有拿出好的恢复方案
恢复难点说明
1.该磁盘组的disk 0 被dd掉了100M,这个导致kfdhdb.f1b1locn记录被清空,也就无法获取到存储指向ASM file 1 文件目录表,这个值虽然被清空,但是根据经验或者对比其他磁盘组的disk 0 可以确认指定aun为10
2.f1b1locn指向的au中存储中asm元数据1-255以及256-1023的file的前面60个au的extent映射表,由于这个在aun为10(也就是aun*aus=40M)的位置,但是这个位置也已经被dd掉了,从原理上业务文件256-1023的前面60个au的extent映射表彻底丢失
3.通过工具扫描,发现存储别名信息的au也在disk 0的前面100M之内(也被dd掉了),导致通过别名直接定位文件的起点的恢复思路也不可行,而且别名丢失导致以别名的数据文件后续识别有一定的难度(无法获取asm里面文件的完整路径)
4.该库有50多套pdb组成,也就意味着通过数据库碎片扫描的方式无法恢复(因为每个pdb里面都是由默认的种子创建而成,也就意味着rfile 1,4,9是重复的)
5.由于中途加过一次磁盘,因引起这个asm里面的文件进行重新reblance,使得部分文件同一个block记录在不同磁盘的au上(数据文件块可能重复)
6.由于该磁盘组中asm disk 大小不等,导致文件au分布在各个磁盘上不均匀,无明显规律可循
恢复操作
1.通过kfed对损坏的asm-data1、asm-disk2的磁盘头进行构造,便于后续的恢复工具识别
2.通过工具对data中所有磁盘进行扫描,主要扫描文件extent映射表信息和ACD中关于asm文件的分配信息
2.1> 通过对这些信息进行综合分析,确认asm file >=1024的文件的extent映射表信息完整,数据可以直接恢复,效果类似
7

2.2> 对于256-1023号文件通过asm file的extent信息缺少0-59 au的数据,结合acd中获取的部分au分配信息,可以尽可能完整的恢复出来这些数据(由于disk 0中的acd信息丢失,所以不是100%完整)
8

3.对于绝对文件号非1,4,9的文件,而且asm file 小于1024的数据文件,结合rdba碎片重组的方式再一次进行恢复,避免上面2.2中恢复中前面240M(60个au),有部分au信息丢失导致文件不完整的情况
通过上述多种方法恢复,整体恢复数据文件类似(由于文件较多,存放多个目录和根据规则取了多种名字)
9

4.由于asm的文件目录,别名信息全部丢失,而且该库有50多个pdb,无法确认恢复出来的上千个文件和pdb对应的关系,对于这种情况,临时写了一个小程序,对这些文件进行读取,获取file#,rfile#,ts#,tsname,文件大小,dbid,dbname,scn等信息(obet(Oracle Block Editor Tool)第二版发布
10

通过把这些信息和历史的控制文件中的信息进行匹配,确认各个文件所属的pdb关系
5.通过4中获取的文件和pdb对应关系然后通过dbms_pdb.recover包实现把恢复的文件插入到新的cdb库中,在这个插入和open库过程中遇到各种错误,都一一解决
11
12
13
14
15

6.最终完成客户数据恢复要求(为了保证数据不被再次修改,客户要求所有恢复的pdb不能打开到读写模式)
16

然后由客户的运维厂商或者应用厂商把需要的数据迁移或者整合到新库中并恢复业务,完成本次恢复任务

Oracle数据库系统回滚段异常处理-ORA-600 4137/4193

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle数据库系统回滚段异常处理-ORA-600 4137/4193

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最初是由于数据库sysaux文件无法正常恢复,重建ctl抛弃sysaux文件,然后打开数据库,但是无法expdp导出数据

Export: Release 12.2.0.1.0 - Production on Wed Jun 24 17:18:04 2026
Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
ORA-31626: job does not exist
ORA-31637: cannot create job SYS_EXPORT_SCHEMA_02 for user SYS
ORA-06512: at "SYS.KUPV$FT", line 1140
ORA-06512: at "SYS.KUPV$FT", line 1741
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT_INT", line 823
ORA-39080: failed to create queues "KUPC$C_1_20260624171804" and "" for Data Pump job
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPC$QUE_INT", line 1541
ORA-00376: file 3 cannot be read at this time
ORA-06512: at "SYS.DBMS_AQADM", line 742
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8060
ORA-01110: data file 3: '/u01/app/oracle/product/12.2.0.1/dbhome_2/dbs/MISSING00003'
ORA-06512: at "SYS.DBMS_AQADM_SYSCALLS", line 912
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8036
ORA-06512: at "SYS.DBMS_AQADM", line 737
ORA-06512: at "SYS.KUPC$QUE_INT", line 1461
ORA-06512: at line 1
ORA-06512: at "SYS.KUPC$QUEUE_INT", line 158
ORA-06512: at "SYS.KUPV$FT_INT", line 758
ORA-06512: at "SYS.KUPV$FT", line 1645
ORA-06512: at "SYS.KUPV$FT", line 1101

然后通过各方人员一顿操作猛如虎,导致数据库启动报ORA-600 4137和ORA-600 4193错误,数据库无法open成功

2026-06-24T18:38:50.158906+08:00
alter database open
2026-06-24T18:38:50.182720+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
2026-06-24T18:38:50.219449+08:00
…………
2026-06-24T18:38:50.514016+08:00
ARC3: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304968):
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304968/orcl1_ora_48840_i304968.trc
Use ADRCI or Support Workbench to package the incident.
ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T18:38:51.313973+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
2026-06-24T18:38:51.649361+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304969):
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304969/orcl1_ora_48840_i304969.trc
2026-06-24T18:38:53.412782+08:00
opiodr aborting process unknown ospid (48840) as a result of ORA-603

需要open故障库,并且正常导出数据,需要处理两个问题
1. 被抛弃的sysaux文件需要正常online起来,不然expdp无法正常导出用户或者全库数据
2. 需要解决open过程的ORA-600 4137/ORA-600 4193错误
对于sysaux文件进行检查,由于重建ctl没有包含异常的sysaux文件,因此无法直接从库中查询到当前各种文件头相关情况,通过obet直接解析文件头获取相关信息(Oracle数据块编辑工具( Oracle Block Editor Tool)-obet)
res


对于这种情况,可以使用obet的修改文件头checkpoint scn和resetlogs scn功能进行快速修复

OBET> set file 2
filename set to: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf (file#2)

OBET> backup
Created backup directory: backup_blk
Successfully backed up block 1 from current file to /tmp/backup_blk/o1_mf_sysaux_go991cmw_.dbf_1.20260624191357

OBET> copy chkscn file 1 to file 2
Error: Edit mode not enabled. Use 'set mode edit' first.

OBET> set mode edit
mode set to: edit

OBET> copy chkscn file 1 to file 2

Confirm Modify chkscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied checkpoint SCN information from file#1 to file#2.

OBET> copy resetlogscn file 1 to file 2

Confirm Modify resetlogscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied resetlog SCN information from file#1 to file#2.

OBET> sum
Check value for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0xF21B, required = 0x6651

OBET> sum apply

Confirm applying checksum:
File: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf
Block: 1
Offset in block: 16 (file offset: 0x00002010)
Original value: 0xF21B
New value:      0x6651
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x6651).
Checksum applied successfully.

OBET> tailchk
Check tailchk for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0x010B0000, required = 0x010B0000

OBET>

然后重建ctl,包含该sysaux,尝试打开数据库,报ORA-600 4193错误

SYS@ORCL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Process ID: 93550
Session ID: 1123 Serial number: 55884

进一步跟踪启动过程,确认报错在update undo$上


PARSING IN CURSOR 
#140446136869016
 len=160 dep=1 uid=0 oct=6 lid=0 tim=3161302405543 hv=1292341136 ad='9bbd4828' sqlid='8vyjutx6hg3wh'
update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE 
#140446136869016
:c=11966,e=11918,p=18,cr=94,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=3161302405542
BINDS 
#140446136869016
:
 Bind
#0
  oacdty=01 mxl=32(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=9bbdac32  bln=32  avl=21  flg=09
  value="_SYSSMU12_3861134380$"
 Bind
#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda370  bln=24  avl=02  flg=05
  value=5
 Bind
#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda340  bln=24  avl=03  flg=05
  value=144
 Bind
#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda310  bln=24  avl=02  flg=05
  value=5
 Bind
#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2e0  bln=24  avl=02  flg=05
  value=1
 Bind
#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2b0  bln=24  avl=04  flg=05
  value=46221
 Bind
#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda280  bln=24  avl=05  flg=05
  value=30810931
 Bind
#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda250  bln=24  avl=06  flg=05
  value=3399756014
 Bind
#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda220  bln=24  avl=03  flg=05
  value=2429
 Bind
#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1f0  bln=24  avl=02  flg=05
  value=2
 Bind
#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1c0  bln=24  avl=02  flg=05
  value=4
 Bind
#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda190  bln=24  avl=02  flg=05
  value=2
 Bind
#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda3a0  bln=22  avl=02  flg=05
  value=12
WAIT 
#140446136869016
: nam='db file sequential read' ela= 16 file#=1 block#=547 blocks=1 obj#=0 tim=3161302406306
2026-06-24T19:59:40.979075+08:00
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []

alert日志中还有ORA-600 4137等错误

ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T19:59:40.387459+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_83245.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []

通过这个报错,可以确认是由于0号回滚段,也就是rollback中事务异常,获取相关的trace

[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
XID passed in = xid: 0x0000.04d.0000060a
XID from Undo block = xid: 0x0000.060.00000600
Dump of buffer cache at level 7 for pdb=0 tsn=0 rdba=4194432
BH (0x3ddfd26b8) file#: 1 rdba: 0x00400080 (1/128) class: 15 ba: 0x3ddb80000
  set: 166 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 3 obj: -1 objn: 0 tsn: [0/0] afn: 1 hint: f
  hash: [0x3ddece808,0xc6bdc2d8] lru: [0xbc1db108,0xbc1db108]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xa2267bc0,0xa2267bc0] objaq: [0xa2267bb0,0xa2267bb0]
  st: XCURRENT md: NULL fpin: 'ktuwh72: ktugus:ktuswr1' fscn: 0x980cfff669f tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0] HSCN: [0x0] HSUB: [65535]
  Printing buffer operation history (latest change first):
  cnt: 10
  01. sid:00 L353:gcur:set:MEXCL      02. sid:00 L145:zib:mk:EXCL       
  03. sid:00 L212:zib:bic:FSQ         04. sid:00 L122:zgb:set:st        
  05. sid:00 L830:olq1:clr:WRT+CKT    06. sid:00 L951:zgb:lnk:objq      
  07. sid:00 L372:zgb:set:MEXCL       08. sid:00 L123:zgb:no:FEN        
  09. sid:00 L083:zgb:ent:fn          10. sid:01 L203:w_ini_dc:bic:FVB  
  buffer tsn: 0 rdba: 0x00400080 (1/128)
  scn: 0x980cffc5958 seq: 0x01 flg: 0x04 tail: 0x59580e01
  frmt: 0x02 chkval: 0x2688 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS

基于这样的情况,可以判断通过清理undo$中的相关记录,让其重新分配新的回滚块

Block Header:
block type=0x0e (KTU UNDO HEADER W/UNLIMITED EXTENTS)
block format=0xa2 (oracle 10+)
block rdba=0x00400080 (file#=1, block#=128)
scn=0x0980.cff7c56d, seq=1, tail=0xc56d0e01
block checksum value=0x2683=9859, flag=4
  Extent Control Header
  -------------------------------------------------------------
  Extent Header:: extents: 10  blocks: 79
                  last map: 0x00000000  
#maps
: 0  offset: 4128
      Highwater:: 0x00400225  (rfile#=1,block#=549)
                  ext#: 6  blk#: 5   ext size:8
      
#blocks
 in seg. hdr's freelists: 0
      
#blocks
 below: 0
      mapblk: 0x00000000   offset: 6
      Map Header:: next: 0x00000000   
#extents
: 10  obj#: 0  flag: 0x40000000
  Extent Control Header
  -------------------------------------------------------------
   0x00400081  length: 7
   0x004206a8  length: 8
   0x004206b0  length: 8
   0x00400088  length: 8
   0x00400210  length: 8
   0x00400218  length: 8
   0x00400220  length: 8
   0x00400228  length: 8
   0x004206a0  length: 8
   0x00400230  length: 8
  TRN CTL:: seq: 0x0462 chd: 0x005e ctl: 0x000d inc: 0x00000000 nbf: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646(0x7ffffffe)
            uba: 0x00000225.0462.1d scn: 0x0980.cf1f2121
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.0462.1c ext: 0x6  spc: 0x11a2
    uba: 0x00000000.0462.26 ext: 0x6  spc: 0xc86
    uba: 0x00000000.0462.03 ext: 0x6  spc: 0x1e5c
    uba: 0x00000000.0460.03 ext: 0x4  spc: 0x1e5c
    uba: 0x00000000.043c.21 ext: 0x8  spc: 0xd2c

然后数据库打开成功,使用expdp完美导出数据,完成本次恢复任务

使用deepseek进行Oracle恢复,引起重大故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:使用deepseek进行Oracle恢复,引起重大故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个恢复case,查询数据库是open状态,有一个数据文件处于offline,删除表空间报offline的文件不能读写
2
3
1


根据经验,这个是一个小问题,可能就是由于datafile 5 offline了,而这个文件是undo表空间的所以出现这样的情况,想着屏蔽下异常回滚段,或者强制online下文件就可以解决该问题.先进行第一个尝试,屏蔽异常回滚段,由于库是open状态,直接查询数据库是否有异常回滚段
undo_seg

无法查询到异常回滚段,这个有点不太符合常规认知,进一步核实文件和表空间信息
4
5
6
到这一步就发现了异常:
1. v$tablespace里面有两个undotbs1的表空间(这个肯定不对,是ctl和ts$不一致)
2. ts$中只有一个而且ts#=9没有ts#=2
3. file$中有ts#=2,这样导致ts$和file$信息不匹配,也不对
基于上述这样信息,我怀疑有人对底层字典进行了操作delete了ts$这个表记录.让现场技术人员再次确认这个库的所有操作,最后确认在他不知情的情况下,有另外的技术人员上来进行了类似操作
delete_ts

根据他们提供的聊天记录,以及当前数据库情况,进一步确认他们应该是执行了

delete from ts$ where name='UNDOTB1';
delete from seg$ where ts#=2;

没有对file$进行delete操作.对于这样的情况,人工删除字典,明显没有处理干净.导致数据库的任何操作都会去检查异常事务.
seg


通过清理这些异常事务,数据库可以正常操作,数据也导出成功
expdp

后续和当时直接进行delete 字典操作的人员沟通,他那边是根据deepseek提供的建议进行处理的
deepseek

在这里温馨提醒,虽然现在的ai比较发达,很多问题可以直接在上面问出来答案,但是需要对这些答案有一个判断能力,不能他说啥你就执行啥,特别是数据库非常规恢复这种不可逆而且可能引起重大事故的高风险性操作需要谨慎和做好回退方案.

接手一个只差临门一脚的数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:接手一个只差临门一脚的数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

刚刚进行了一个数据库恢复case,故障处理起来非常简单(设置一个参数启动库即可),大概回溯下故障经过,反馈是由于客户在没有停机的情况下复制虚拟机,然后启动虚拟机之后,数据库无法正常启动报ORA-00314 ORA-00312错误

Wed Jun 17 10:24:02 2026
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 3689345602
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jun 17 10:24:06 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Aborting crash recovery due to error 314
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
ORA-314 signalled during: ALTER DATABASE OPEN...
Wed Jun 17 10:24:07 2026
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 1 of thread 1, expected sequence# 12269 doesn't match 9441
ORA-00312: online log 1 thread 1: '/oradata/orcl/redo01.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 3 of thread 1, expected sequence# 12265 doesn't match 9444
ORA-00312: online log 3 thread 1: '/oradata/orcl/redo03.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 4 of thread 1, expected sequence# 12266 doesn't match 9438
ORA-00312: online log 4 thread 1: '/oradata/orcl/redo04.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 5 of thread 1, expected sequence# 12267 doesn't match 9439
ORA-00312: online log 5 thread 1: '/oradata/orcl/redo05.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 6 of thread 1, expected sequence# 12268 doesn't match 9440
ORA-00312: online log 6 thread 1: '/oradata/orcl/redo06.log'

这个错误的redo sequence差距有点大,个人感觉可能不是简单的复制引起的,由于没有第一现场不好溯源,不乱做评论,姑且认为是由于虚拟机复制引起的问题
客户使用隐含参数强制打开数据库,报ORA-1555错误

Wed Jun 17 10:42:50 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1838398816
Resetting resetlogs activation ID 3463997979 (0xce786a1b)
ORA-344 signalled during: ALTER DATABASE OPEN RESETLOGS...
Wed Jun 17 10:44:30 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1838398816
Resetting resetlogs activation ID 3463997979 (0xce786a1b)
Wed Jun 17 10:45:13 2026
Setting recovery target incarnation to 3
Wed Jun 17 10:45:13 2026
Assigning activation ID 3689297678 (0xdbe6370e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /oradata/orcl/redo01.log
Successful open of redo thread 1
Wed Jun 17 10:45:14 2026
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jun 17 10:45:14 2026
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.6d93bd66):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18038.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 35 with name "_SYSSMU35_3782695576$" too small
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18038.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 35 with name "_SYSSMU35_3782695576$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 18038): terminating the instance due to error 704
Instance terminated by USER, pid = 18038
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...
opiodr aborting process unknown ospid (18038) as a result of ORA-1092
Wed Jun 17 10:45:16 2026
ORA-1092 : opitsk aborting process

这个是一个比较经典的错误,以往的文章中总结了这类错误可能涉及的具体sql语句
在数据库open过程中常遇到ORA-01555汇总
数据库open过程遭遇ORA-1555对应sql语句补充
客户尝试多次重启之后,数据库报ORA-600 2662错误

Wed Jun 17 10:55:34 2026
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /oradata/orcl/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jun 17 10:55:34 2026
SMON: enabling cache recovery
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc  (incident=117868):
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl/incident/incdir_117868/orcl_ora_6707_i117868.trc
Wed Jun 17 10:55:45 2026
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc:
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc:
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Error 600 happened during db open, shutting down database
USER (ospid: 6707): terminating the instance due to error 600
Instance terminated by USER, pid = 6707
ORA-1092 signalled during: alter database open...

客户后续多次重建ctl,强制拉库等操作,一直在ORA-600 2662上面循环,后面终于出现了ORA-600 4193/4194错误,数据库没有正常open成功,至此客户放弃恢复尝试。
我们接手故障之后,设置undo手工管理模式,然后直接启动库成功
open1
然后使用expdp导出数据,完成本次恢复工作

硬件故障后数据文件大小不对故障处理—Oracle碎片扫描恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障后数据文件大小不对故障处理—Oracle碎片扫描恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有硬件恢复圈朋友找到我,说硬件恢复之后dbv报dbv-00102错误,让我给看看是否可以处理
dbv-00102


这个是oracle dbv中一种常见错误,一般是由于block 0 不对,或者是由于文件大小不对引起,让把恢复文件发给我,进行检查

SQL> select name,bytes/1024/1024/1024 from v$datafile_header;

NAME                                                                             BYTES/1024/1024/1024
-------------------------------------------------------------------------------- --------------------
H:\BAIDUNETDISK\ORADATA\XXXXORCL\SYSTEM01.DBF                                             2.080078125
H:\BAIDUNETDISK\ORADATA\XXXXORCL\SYSAUX01.DBF                                             2.880859375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\UNDOTBS01.DBF                                           9.0087890625
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS01.DBF                                          31.993408203125
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS02.DBF                                                8.1640625
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS03.DBF                                              7.958984375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS04.DBF                                              7.958984375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS05.DBF                                                 7.890625

已选择8行。

确定USER02-USERS05的dbf文件实际大小(数据文件头记录)在8G左右,但是目前恢复出来的文件大小只有4G左右
4g


在恢复工具中直接查看文件大小情况
rs

这里比较明显rs中虽然显示文件状态良好,但是实际大小也不对(得出经验:以后恢复中不能太依赖这个状态),根据反馈现场是三个盘的raid5,中途做了一次强制上线,然后客户也使用win pe拷贝过一次数据,大小和现在一样,也是少了近4G.第一反应可能是由于raid盘弄的不对,但是经过对其他文件的确认,多完全没有问题,排除了盘错误的问题,怀疑是由于文件系统异常导致,对于这种的情况,文件系统层面肯定无法恢复,考虑使用自研的OraScan工具进行扫描(OraScan(Oracle 碎片扫描工具) 使用说明)
ora1
ora2

通过OraScan扫描找到相关block,并提取出来数据文件
file

使用dbv检测文件

C:\Users\XFF>dbv file=H:\BaiduNetdisk\xff\YFKJORCL.USERS.4.7.4.N.DBF

DBVERIFY: Release 11.2.0.4.0 - Production on 星期日 6月 7 18:06:30 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = H:\BAIDUNETDISK\XFF\YFKJORCL.USERS.4.7.4.N.DBF


DBVERIFY - 验证完成

检查的页总数: 1043200
处理的页总数 (数据): 67167
失败的页总数 (数据): 0
处理的页总数 (索引): 37995
失败的页总数 (索引): 0
处理的页总数 (其他): 861109
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 76929
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 347454063 (0.347454063)

把文件拷贝替换掉之前恢复的USERS02-USER05.DBF,然后尝试打开数据库

SQL> recover database ;
完成介质恢复。
SQL> alter database open ;
alter database open
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 3308
会话 ID: 14 序列号: 3

查看alert日志分析原因

Sun Jun 07 14:43:51 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 36464 Reading mem 0
  Mem# 0: H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO02.LOG
Completed: ALTER DATABASE RECOVER  database   
alter database open 
Beginning crash recovery of 1 threads
 parallel recovery started with 19 processes
Started redo scan
Completed redo scan
 read 2353 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 36464, block 15876
Recovery of Online Redo Log: Thread 1 Group 2 Seq 36464 Reading mem 0
  Mem# 0: H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO02.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 36464, block 20582, scn 347475303
 0 data blocks read, 0 data blocks written, 2353 redo k-bytes read
Sun Jun 07 14:43:57 2026
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_lgwr_2204.trc:
ORA-00314: ?? 3 (???? 1) ??? sequence# 36462 ? 32025 ???
ORA-00312: ???? 3 ?? 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_lgwr_2204.trc:
ORA-00314: ?? 3 (???? 1) ??? sequence# 36462 ? 32025 ???
ORA-00312: ???? 3 ?? 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_ora_3308.trc:
ORA-00314: 日志 1 (用于线程 ) 要求的 sequence#  与  不匹配
ORA-00312: 联机日志 3 线程 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
USER (ospid: 3308): terminating the instance due to error 314
Sun Jun 07 14:44:02 2026
Instance terminated by USER, pid = 3308

由于redo group 异常导致库无法正常open,但是由于已经recover database成功,因此大概率可以clear该redo 组

SQL> select group#,status from v$log;

    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         3 INACTIVE
         2 CURRENT

SQL> alter database clear logfile group 3;

数据库已更改。

SQL> alter database open;

数据库已更改。

数据库open成功,然后使用expdp导出数据,完成本次恢复任务.

1.5T MySQL数据库完美恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:1.5T MySQL数据库完美恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户MySQL数据库异常无法正常启动,需要提供恢复支持,当时提供的错误日志信息为:log sequence number xxxx is in the future
Q15


2026-06-03T13:35:02.368514Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2026-06-03T13:35:02.369669Z 0 [ERROR] InnoDB: Page [page id: space=0, page number=521127] log sequence number 15319315659882 is in the future! Current system log sequence number 6712970192343.
从头分析mysql的日志,发现最初情况为:

---TRANSACTION 8424429306, ACTIVE 259 sec truncating table
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 4513911, OS thread handle 21996, query id 4194188849 localhost 127.0.0.1 root System lock
TRUNCATE TABLE xxxx
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: complete io for buf page (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: complete io for buf page (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: [2, 0, 0, 3] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
908141629 OS file reads, 8774070813 OS file writes, 2977363738 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
InnoDB: ###### Diagnostic info printed to the standard error stream
2026-06-03T06:37:25.679003Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 17216 has waited at btr0sea.ic line 128 for 258  seconds the semaphore:
S-lock on RW-latch at 0000017F19122E18 created in file btr0sea.cc line 195
a writer (thread id 8340) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file g:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\include\btr0sea.ic line 90
2026-06-03T06:37:25.681739Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 28160 has waited at btr0sea.ic line 128 for 241  seconds the semaphore:
S-lock on RW-latch at 0000017F19123598 created in file btr0sea.cc line 195
a writer (thread id 13620) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file G:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\btr\btr0cur.cc line 3874
2026-06-03T06:37:25.684495Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 23052 has waited at btr0sea.ic line 128 for 253  seconds the semaphore:
S-lock on RW-latch at 0000017F19123598 created in file btr0sea.cc line 195
a writer (thread id 13620) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file G:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\btr\btr0cur.cc line 3874
2026-06-03T06:37:25.687586Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 28480 has waited at btr0sea.ic line 128 for 272  seconds the semaphore:
S-lock on RW-latch at 0000017F19122E18 created in file btr0sea.cc line 195
a writer (thread id 8340) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file g:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\include\btr0sea.ic line 90
2026-06-03T06:37:25.689857Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 2868 has waited at buf0flu.cc line 1209 for 262  seconds the semaphore:
SX-lock on RW-latch at 00000179B38E0DC0 created in file buf0buf.cc line 1460
a writer (thread id 1008) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: f0000000
Last time read locked in file ibuf0ibuf.cc line 4552
Last time write locked in file G:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\ibuf\ibuf0ibuf.cc line 406
…………………………
2026-06-03T06:37:56.919054Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 13620 has waited at btr0sea.ic line 90 for 303  seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0000017F19123598 created in file btr0sea.cc line 195
a writer (thread id 13620) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file G:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\btr\btr0cur.cc line 3874
2026-06-03T06:37:56.921090Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 24268 has waited at btr0sea.ic line 128 for 252  seconds the semaphore:
S-lock on RW-latch at 0000017F19122E18 created in file btr0sea.cc line 195
a writer (thread id 8340) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file g:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\include\btr0sea.ic line 90
2026-06-03T06:37:56.923175Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 15984 has waited at btr0sea.ic line 128 for 302  seconds the semaphore:
S-lock on RW-latch at 0000017F19122CD8 created in file btr0sea.cc line 195
a writer (thread id 5420) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffff
Last time read locked in file btr0sea.ic line 128
Last time write locked in file G:\ade\build\sb_0-34537258-1560180832.84\mysql-5.7.27\storage\innobase\btr\btr0cur.cc line 3874
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream
2026-06-03T07:04:50.616385Z 0 [Note] MySQL: Normal shutdown

2026-06-03T07:04:50.617251Z 0 [Note] Giving 97 client threads a chance to die gracefully
2026-06-03T07:54:31.982035Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use 
--explicit_defaults_for_timestamp server option (see documentation for more details).
2026-06-03T07:54:31.983047Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' 
sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2026-06-03T07:54:31.983057Z 0 [Warning] 'NO_AUTO_CREATE_USER' sql mode was not set.
2026-06-03T07:54:31.983098Z 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2026-06-03T07:54:31.985272Z 0 [Note] MySQL (mysqld 5.7.27) starting as process 832 ...
2026-06-03T07:54:32.054771Z 0 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2026-06-03T07:54:32.055435Z 0 [Note] InnoDB: Uses event mutexes
2026-06-03T07:54:32.055728Z 0 [Note] InnoDB: _mm_lfence() and _mm_sfence() are used for memory barrier
2026-06-03T07:54:32.056143Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2026-06-03T07:54:32.057581Z 0 [Note] InnoDB: Number of pools: 1
2026-06-03T07:54:32.058963Z 0 [Note] InnoDB: Not using CPU crc32 instructions
2026-06-03T07:54:32.063255Z 0 [Note] InnoDB: Initializing buffer pool, total size = 40G, instances = 8, chunk size = 128M
2026-06-03T07:54:34.619212Z 0 [Note] InnoDB: Completed initialization of buffer pool
2026-06-03T07:54:35.420695Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2026-06-03T07:54:36.121477Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 15319438590791
2026-06-03T07:54:36.449855Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319443833344
2026-06-03T07:54:36.847204Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319449076224
2026-06-03T07:54:37.263455Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319454319104
2026-06-03T07:54:37.544475Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319459561984
2026-06-03T07:54:37.678504Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319463095246
2026-06-03T07:54:37.681262Z 0 [Note] InnoDB: Database was not shutdown normally!
2026-06-03T07:54:37.681755Z 0 [Note] InnoDB: Starting crash recovery.
2026-06-03T07:54:38.088953Z 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2026-06-03T07:54:38.089913Z 0 [Note] InnoDB: Trx id counter is 8424438528
2026-06-03T07:54:38.090288Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 …… 91 92 93 94 95 96 97 98 99 
2026-06-03T07:54:42.043861Z 0 [Note] InnoDB: Apply batch completed
2026-06-03T07:54:42.295726Z 0 [Note] InnoDB: Rolling back trx with id 8424429306, 0 rows to undo
2026-06-03T07:54:42.298212Z 0 [Note] InnoDB: Rollback of trx with id 8424429306 completed
2026-06-03T07:55:19.432722Z 0 [Note] InnoDB: Completing truncate for table with id (8714) residing in file-per-table tablespace with id (6005)
2026-06-03T07:55:30.963664Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90593936, which exceeds the log group capacity 90593280.
2026-06-03T07:55:47.442228Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90602582, which exceeds the log group capacity 90593280.
2026-06-03T07:56:05.422243Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90609452, which exceeds the log group capacity 90593280.
2026-06-03T07:56:27.643550Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90621740, which exceeds the log group capacity 90593280.
2026-06-03T07:56:47.845714Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90627886, which exceeds the log group capacity 90593280.
2026-06-03T07:57:04.714691Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90639662, which exceeds the log group capacity 90593280.
2026-06-03T07:57:26.028889Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90645808, which exceeds the log group capacity 90593280.
2026-06-03T07:57:42.796901Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90655262, which exceeds the log group capacity 90593280.
2026-06-03T07:58:04.513110Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90670384, which exceeds the log group capacity 90593280.
2026-06-03T07:58:20.675070Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90689842, which exceeds the log group capacity 90593280.
2026-06-03T07:58:39.362158Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90699949, which exceeds the log group capacity 90593280.
2026-06-03T07:58:57.146181Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90707243, which exceeds the log group capacity 90593280.
2026-06-03T07:59:15.226281Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90731065, which exceeds the log group capacity 90593280.
2026-06-03T07:59:32.902269Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90751406, which exceeds the log group capacity 90593280.
2026-06-03T07:59:55.020833Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use 
--explicit_defaults_for_timestamp server option (see documentation for more details).
2026-06-03T07:59:55.022839Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' 
sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2026-06-03T07:59:55.022856Z 0 [Warning] 'NO_AUTO_CREATE_USER' sql mode was not set.
2026-06-03T07:59:55.022918Z 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2026-06-03T07:59:55.028372Z 0 [Note] MySQL (mysqld 5.7.27) starting as process 2532 ...
2026-06-03T07:59:55.078201Z 0 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2026-06-03T07:59:55.078998Z 0 [Note] InnoDB: Uses event mutexes
2026-06-03T07:59:55.079491Z 0 [Note] InnoDB: _mm_lfence() and _mm_sfence() are used for memory barrier
2026-06-03T07:59:55.080092Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2026-06-03T07:59:55.084872Z 0 [Note] InnoDB: Number of pools: 1
2026-06-03T07:59:55.090793Z 0 [Note] InnoDB: Not using CPU crc32 instructions
2026-06-03T07:59:55.096910Z 0 [Note] InnoDB: Initializing buffer pool, total size = 40G, instances = 8, chunk size = 128M
2026-06-03T07:59:57.966267Z 0 [Note] InnoDB: Completed initialization of buffer pool
2026-06-03T07:59:58.778551Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2026-06-03T07:59:59.348188Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 15319447826657
2026-06-03T07:59:59.942491Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319453069312
2026-06-03T08:00:00.233703Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319458312192
2026-06-03T08:00:00.470153Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319463555072
2026-06-03T08:00:01.158338Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319468797952
2026-06-03T08:00:01.852263Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319474040832
2026-06-03T08:00:02.555075Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319479283712
2026-06-03T08:00:03.289043Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319484526592
2026-06-03T08:00:04.031381Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319489769472
2026-06-03T08:00:04.754714Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319495012352
2026-06-03T08:00:05.533650Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319500255232
2026-06-03T08:00:06.253542Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319505498112
2026-06-03T08:00:06.883691Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319510740992
2026-06-03T08:00:07.603988Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319515983872
2026-06-03T08:00:08.268531Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319521226752
2026-06-03T08:00:08.967662Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319526469632
2026-06-03T08:00:09.323566Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15319529184727
2026-06-03T08:00:09.328240Z 0 [Note] InnoDB: Database was not shutdown normally!
2026-06-03T08:00:09.328844Z 0 [Note] InnoDB: Starting crash recovery.
2026-06-03T08:00:09.730171Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2026-06-03T08:00:09.731034Z 0 [Note] InnoDB: Trx id counter is 8424439040
2026-06-03T08:00:09.731402Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 …… 94 95 96 97 98 99 
2026-06-03T08:00:17.587182Z 0 [Note] InnoDB: Apply batch completed
2026-06-03T08:01:02.988241Z 0 [Note] InnoDB: Completing truncate for table with id (8714) residing in file-per-table tablespace with id (6005)
2026-06-03T08:01:07.965314Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90593580, which exceeds the log group capacity 90593280.
2026-06-03T08:01:25.945332Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90599727, which exceeds the log group capacity 90593280.
2026-06-03T08:01:43.117294Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90610481, which exceeds the log group capacity 90593280.
2026-06-03T08:02:05.338558Z 0 [ERROR] InnoDB: The age of the last checkpoint is 90622769, which exceeds the log group capacity 90593280.

从这里看,最初是truncate table xxxx,然后由于被阻塞了无法truncate成功,可以就关闭了mysql服务,然后启动库就没有成功,然后就是加上了innodb_force_recovery出现了上述截图的错误.尝试进行强制拉库,遭遇以下错误

2026-06-04T07:05:59.924315Z 0 [Note] MySQL (mysqld 5.7.27) starting as process 8764 ...
2026-06-04T07:05:59.944187Z 0 [Warning] option 'innodb-purge-threads': unsigned value 0 adjusted to 1
2026-06-04T07:05:59.947611Z 0 [Note] InnoDB: Started in read only mode
2026-06-04T07:05:59.948012Z 0 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2026-06-04T07:05:59.948485Z 0 [Note] InnoDB: Uses event mutexes
2026-06-04T07:05:59.948825Z 0 [Note] InnoDB: _mm_lfence() and _mm_sfence() are used for memory barrier
2026-06-04T07:05:59.949329Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2026-06-04T07:05:59.950087Z 0 [Note] InnoDB: Number of pools: 1
2026-06-04T07:05:59.950587Z 0 [Note] InnoDB: Not using CPU crc32 instructions
2026-06-04T07:05:59.950987Z 0 [Note] InnoDB: Disabling background log and ibuf IO write threads.
2026-06-04T07:05:59.952835Z 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
2026-06-04T07:05:59.980631Z 0 [Note] InnoDB: Completed initialization of buffer pool
2026-06-04T07:06:00.012856Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2026-06-04T07:06:00.013649Z 0 [Note] InnoDB: The user has set SRV_FORCE_NO_LOG_REDO on, skipping log redo
2026-06-04T07:06:00.019299Z 0 [Note] InnoDB: Completing truncate for table with id (8714) residing in file-per-table tablespace with id (6005)
07:06:00 UTC - mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=200
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 87429 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff67eaba97e    mysqld.exe!std::operator<<<std::char_traits<char> >()[ostream:791]
7ff67f1ffd49    mysqld.exe!os_file_create_subdirs_if_needed()[os0file.cc:2013]
7ff67f22e967    mysqld.exe!fil_ibd_create()[fil0fil.cc:3530]
7ff67f2afdd7    mysqld.exe!truncate_t::fixup_tables_in_non_system_tablespace()[row0trunc.cc:2274]
7ff67f1c563a    mysqld.exe!innobase_start_or_create_for_mysql()[srv0start.cc:2346]
7ff67f12ea40    mysqld.exe!innobase_init()[ha_innodb.cc:4077]
7ff67e9b783e    mysqld.exe!ha_initialize_handlerton()[handler.cc:840]
7ff67eab7228    mysqld.exe!plugin_initialize()[sql_plugin.cc:1229]
7ff67eab8790    mysqld.exe!plugin_register_builtin_and_init_core_se()[sql_plugin.cc:1589]
7ff67e972de3    mysqld.exe!init_server_components()[mysqld.cc:4080]
7ff67e977da5    mysqld.exe!win_main()[mysqld.cc:4773]
7ff67e975705    mysqld.exe!mysql_service()[mysqld.cc:5226]
7ff80e734da7    MSVCR120.dll!_beginthread()
7ff80e734e60    MSVCR120.dll!_endthread()
7ff84e4d84d4    KERNEL32.DLL!BaseThreadInitThunk()
7ff850cb1791    ntdll.dll!RtlUserThreadStart()
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

在启动过程中需要去完成truncate操作,但是由于强制拉库是只读状态导致无法完成,直接启动失败.如果非只读状态拉库,启动过程包InnoDB: Corruption of an index tree: table `innodb_change_buffer` index `CLUST_IND`, father ptr page no 111415, child page no 517749异常

2026-06-04T13:35:46.447160Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace 
but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html 
for information about forcing recovery.
2026-06-04T13:35:46.448525Z 0 [ERROR] InnoDB: Corruption of an index tree: table `innodb_change_buffer` index `CLUST_IND`, 
father ptr page no 111415, child page no 517749
PHYSICAL RECORD: n_fields 6; 1-byte offsets; info bits 0
 0: len 4; hex 00001775; asc    u;;
 1: len 1; hex 00; asc  ;;
 2: len 4; hex 00e78555; asc    U;;
 3: len 16; hex 000400010c0f0190002d860800088000; asc          -      ;;
 4: len 29; hex 323032362d30332d31345f3132325f315f323032363033313432303136; asc 2026-03-14_122_1_202603142016;;
 5: len 8; hex 800000001b26cad6; asc      &  ;;
2026-06-04T13:35:46.450738Z 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 211
PHYSICAL RECORD: n_fields 7; 1-byte offsets; info bits 0
 0: len 4; hex 00001775; asc    u;;
 1: len 1; hex 00; asc  ;;
 2: len 4; hex 00e78555; asc    U;;
 3: len 16; hex 000400010c0f0190002d860800088000; asc          -      ;;
 4: len 29; hex 323032362d30332d31345f3132325f315f323032363033313432303136; asc 2026-03-14_122_1_202603142016;;
 5: len 8; hex 800000001b26cad6; asc      &  ;;
 6: len 4; hex 0001b337; asc    7;;
2026-06-04T13:35:46.452647Z 0 [Note] InnoDB: n_owned: 0; heap_no: 147; next rec: 14554
2026-06-04T13:35:46.453038Z 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. 
If the crash happens at database startup. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html 
for information about forcing recovery. Then dump + drop + reimport.
2026-06-04 21:35:46 0x828  InnoDB: Assertion failure in thread 2088 in file ut0ut.cc line 910
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
13:35:46 UTC - mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=200
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 87429 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff7c7794312    mysqld.exe!my_sigabrt_handler()[my_thr_init.c:449]
7fffaf31ec9d    MSVCR120.dll!raise()
7fffaf324874    MSVCR120.dll!abort()
7ff7c78b07e4    mysqld.exe!ut_dbg_assertion_failed()[ut0dbg.cc:67]
7ff7c78b09c1    mysqld.exe!ib::fatal::~fatal()[ut0ut.cc:910]
7ff7c790a794    mysqld.exe!btr_page_get_father_node_ptr_func()[btr0btr.cc:799]
7ff7c790a28e    mysqld.exe!btr_page_get_father()[btr0btr.cc:854]
7ff7c7906cca    mysqld.exe!btr_compress()[btr0btr.cc:3577]
7ff7c7913a3e    mysqld.exe!btr_cur_compress_if_useful()[btr0cur.cc:5068]
7ff7c7917f7c    mysqld.exe!btr_cur_pessimistic_delete()[btr0cur.cc:5403]
7ff7c79583f9    mysqld.exe!ibuf_delete_rec()[ibuf0ibuf.cc:4385]
7ff7c795805e    mysqld.exe!ibuf_delete_for_discarded_space()[ibuf0ibuf.cc:4833]
7ff7c78d51a8    mysqld.exe!fil_recreate_tablespace()[fil0fil.cc:2265]
7ff7c794fe06    mysqld.exe!truncate_t::fixup_tables_in_non_system_tablespace()[row0trunc.cc:2297]
7ff7c786563a    mysqld.exe!innobase_start_or_create_for_mysql()[srv0start.cc:2346]
7ff7c77cea40    mysqld.exe!innobase_init()[ha_innodb.cc:4077]
7ff7c705783e    mysqld.exe!ha_initialize_handlerton()[handler.cc:840]
7ff7c7157228    mysqld.exe!plugin_initialize()[sql_plugin.cc:1229]
7ff7c7158790    mysqld.exe!plugin_register_builtin_and_init_core_se()[sql_plugin.cc:1589]
7ff7c7012de3    mysqld.exe!init_server_components()[mysqld.cc:4080]
7ff7c7017da5    mysqld.exe!win_main()[mysqld.cc:4773]
7ff7c7015705    mysqld.exe!mysql_service()[mysqld.cc:5226]
7fffaf2d4da7    MSVCR120.dll!_beginthread()
7fffaf2d4e60    MSVCR120.dll!_endthread()
7fffd0f784d4    KERNEL32.DLL!BaseThreadInitThunk()
7fffd3b91791    ntdll.dll!RtlUserThreadStart()
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

基于这种两种情况:
2026-06-04T13:35:46.448525Z 0 [ERROR] InnoDB: Corruption of an index tree: table `innodb_change_buffer` index `CLUST_IND`, father ptr page no 111415, child page no 517749和InnoDB: Completing truncate for table with id (8714) residing in file-per-table tablespace with id (6005)异常形成了相互死循环,无法直接强制拉库.
这个库有1.5T如果通过工具提取效率有点低
data


对于这样的情况,使用ibd的discard+import功能进行处理,参考相关文章:
frm和ibd文件数据库恢复
运气不错,这个客户的所有库通过这种方法导入ibd文件(部分temp结尾的临时表数据可以不用恢复,占据空间较大)之后,然后通过mysqldump顺利导出所有数据,完成本次恢复任务
sql

WARNING: detected duplicate paths to the same disk导致crs无法正常启动故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:WARNING: detected duplicate paths to the same disk导致crs无法正常启动故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近一周之内,两个客户由于错误修改asm_diskstring=’/dev/dm*’,’/dev/mapper/*’,导致后续集群无法启动成功,查看asm的alert日志发现类似错误(为了说明类似问题,实际中情况比现在复杂)

WARNING: detected duplicate paths to the same disk:
  '/dev/mapper/mpathi' and
  '/dev/dm-3'
ERROR: diskgroup CRS was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CRS" cannot be mounted
ORA-15040: diskgroup is incomplete

由于asm的vote磁盘组无法mount,从而使得crs无法正常启动,进而导致集群无法正常启动和工作.而在系统中mpathi和dm-3是同一块磁盘

[root@rac1 mapper]# ls -ltr mpathi
lrwxrwxrwx 1 root root       7 May 24 13:09 mpathi -> ../dm-3

GPnP profile中关于asm_diskstring配置

[grid@rac1 trace]$  gpnptool get -o-
……
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/dm*,/dev/mapper/mpath*" 
SPFile="+DATA/cimp-cluster/asmparameterfile/registry.253.889969697"/>
……

对于这样的情况,需要把profile中值修改了才行,通过重建asm spfile来实现

---pfile
large_pool_size= 12M
instance_type= 'asm'
remote_login_passwordfile= 'EXCLUSIVE'
asm_diskstring= '/dev/mapper/*'
asm_power_limit= 1
diagnostic_dest= '/opt/app/grid'

创建spfile(create spfile=’+CRS’ from pfile=’/tmp/pfile’)

NOTE: Attempting voting file refresh on diskgroup CRS
NOTE: Refresh completed on diskgroup CRS. No voting file found.
NOTE: Voting file relocation is required in diskgroup CRS
NOTE: Attempting voting file relocation on diskgroup CRS
NOTE: voting file deletion on grp 1 disk CRS_0000
NOTE: voting file deletion on grp 1 disk CRS_0001
NOTE: voting file deletion on grp 1 disk CRS_0002
NOTE: No voting file found on diskgroup CRS
Sun May 24 13:23:25 2026
NOTE: updated gpnp profile ASM SPFILE to 
NOTE: updated gpnp profile ASM diskstring: /dev/mapper/*
NOTE: updated gpnp profile ASM diskstring: /dev/mapper/*
NOTE: updated gpnp profile ASM SPFILE to +CRS/cluster/asmparameterfile/registry.253.1234099407

主要注意在create spfile的过程中voting file的信息被删除,使用kfed进行验证

[root@rac1 dbs]# kfed read /dev/asm_mpathi|grep vf
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
[root@rac1 dbs]# kfed read /dev/asm_mpatha|grep vf
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000
[root@rac1 dbs]# kfed read /dev/asm_mpathd|grep vf
kfdhdb.vfstart:                       0 ; 0x0ec: 0x00000000
kfdhdb.vfend:                         0 ; 0x0f0: 0x00000000

需要执行replace votedisk

[root@rac1 dbs]# crsctl replace votedisk +CRS
Successful addition of voting disk 99df870869c64f78bf944b40bfff8644.
Successful addition of voting disk e6db086a74d64f49bf0aff02944bcac5.
Successful addition of voting disk e88a412fe04b4fe6bf1ca067bca779b8.
Successful deletion of voting disk 615579e778684f54bf31c0ce83709f37.
Successful deletion of voting disk 9e48b89f95084fefbfb7648897508684.
Successful deletion of voting disk 274b913b3b874f7cbfb6b025028b4eaa.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced

这样操作完成之后,再次查看GPnP profile中关于asm_diskstring配置

[grid@rac1 trace]$  gpnptool get -o-
……
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/asm_*" 
SPFile="+DATA/cimp-cluster/asmparameterfile/registry.253.1234099407"/>
……

然后再次重启集群,一切恢复正常,解决了asm_diskstring=’/dev/dm*’,’/dev/mapper/*’误操作引起的WARNING: detected duplicate paths to the same disk:故障