硬件故障导致ORA-01242 ORA-01122等错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障导致ORA-01242 ORA-01122等错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户多个节点rac,早上反馈说有两个节点实例异常,需要分析原因,查看其中一个节点的数据库alert日志,发现是由于访问1399号文件异常报ORA-01242 ORA-01122等错误,导致实例crash

Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process

继续分析日志发现集群尝试拉起该实例,遭遇ORA-01186,ORA-01122无法启动成功

ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

这个错误是数据库文件访问异常导致,根据经验,出现这种问题一般是由于底层异常导致,查看系统messages日志,发现有硬件磁盘报错

Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed

客户进一步分析是由于昨天存储坏了一块盘,然后热备盘顶上了,但是不知道什么原因出现了文件访问异常,可能和当时的rebuild过程有关系.由于客户是rac环境,还有部分剩余节点运行正常,对于异常节点直接启动库成功
20240820-182825


节点写入数据报ORA-01187: cannot read from file because it failed verification tests错误
ora-01187

在所有节点通过执行ALTER SYSTEM CHECK DATAFILES,然后所有节点操作正常
check_datafile