在线mv方式迁移数据文件导致数据库无法正常启动

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:在线mv方式迁移数据文件导致数据库无法正常启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户在数据库没有关闭的情况下,直接操作系统层面mv方式把数据文件从一个分区迁移到另外一个分区,再创建ln -s(软连接)的方式实现数据文件不修改路径的方式数据文件迁移,结果数据库重启之后,库无法正常启动,报ORA-01172 ORA-01151错误

Fri Dec 29 09:49:19 2023
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Started redo scan
Completed redo scan
 read 11591 KB redo, 1566 data blocks need recovery
Started redo application at
 Thread 1: logseq 6320, block 479571
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Fri Dec 29 09:49:19 2023
Hex dump of (file 6, block 3598593) in trace file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p002_6696.trc
Fri Dec 29 09:49:19 2023
Fri Dec 29 09:49:19 2023
Hex dump of (file 5, block 27832) in trace file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_6708.trc
Hex dump of (file 6, block 3598208) in trace file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p010_6712.trc
Reading datafile '/oadate/xff/xff_01.dbf' for corruption at rdba: 0x01b6e901 (file 6, block 3598593)
Reading datafile '/oadate/xff/xff.dbf' for corruption at rdba: 0x01406cb8 (file 5, block 27832)
Reread (file 6, block 3598593) found same corrupt data (logically corrupt)
Reading datafile '/oadate/xff/xff_01.dbf' for corruption at rdba: 0x01b6e780 (file 6, block 3598208)
Reread (file 5, block 27832) found same corrupt data (logically corrupt)
    RECOVERY OF THREAD 1 STUCK AT BLOCK 3598593 OF FILE 6

Reread (file 6, block 3598208) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 3598208 OF FILE 6RECOVERY OF THREAD 1 STUCK AT BLOCK 27832 OF FILE 5

Fri Dec 29 09:49:32 2023
Slave exiting with ORA-1172 exception
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p010_6712.trc:
ORA-01172: recovery of thread 1 stuck at block 3598208 of file 6
ORA-01151: use media recovery to recover block, restore backup if needed
Fri Dec 29 09:49:32 2023
Fri Dec 29 09:49:32 2023
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_6708.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p002_6696.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_6708.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_p002_6696.trc:
ORA-10388: parallel query server interrupt (failure)
Fri Dec 29 09:49:32 2023
Aborting crash recovery due to slave death, attempting serial crash recovery
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 11591 KB redo, 1566 data blocks need recovery
Started redo application at
 Thread 1: logseq 6320, block 479571
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Hex dump of (file 6, block 3598593) in trace file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6690.trc
Reading datafile '/oadate/xff/xff_01.dbf' for corruption at rdba: 0x01b6e901 (file 6, block 3598593)
Reread (file 6, block 3598593) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 3598593 OF FILE 6
Aborting crash recovery due to error 1172
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6690.trc:
ORA-01172: recovery of thread 1 stuck at block 3598593 of file 6
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6690.trc:
ORA-01172: recovery of thread 1 stuck at block 3598593 of file 6
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN...

sqlplus恢复数据库报错

SQL> recover datafile 6;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [6], [3578240], [28744064],
[], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 6, block# 3578240, file
offset is 3543138304 bytes)
ORA-10564: tablespace xff
ORA-01110: data file 6: '/oadate/xff/xff_01.dbf'
ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'

alert日志报ORA-600 3020错误

Fri Dec 29 17:43:03 2023
ALTER DATABASE RECOVER  datafile 6  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Fri Dec 29 17:43:42 2023
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_30140.trc  (incident=462294):
ORA-00600: internal error code, arguments: [3020], [6], [3578240], [28744064], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 6, block# 3578240, file offset is 3543138304 bytes)
ORA-10564: tablespace XFF
ORA-01110: data file 6: '/oadate/xff/xff_01.dbf'
ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_462294/orcl_ora_30140_i462294.trc
Fri Dec 29 17:43:42 2023
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 6  ...

此类故障是由于在线拷贝数据文件,可能有不少最新写入的数据都有数据文件和redo不一致的风险,引起这里的ORA-600 3020最好不要通过allow N corruption的方式跳过,因为可能导致大量数据文件坏块,这样就不光丢失了redo数据,可能数据文件中好的block中的很多数据也丢失.对于这种情况,我们为了减少客户的数据丢失,选择了最少数据丢失的方法:通过bbed修改文件头,然后直接recover 数据文件,open库

Fri Dec 29 18:05:36 2023
ALTER DATABASE RECOVER  datafile 5  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 5  
ALTER DATABASE RECOVER  datafile 6  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Media Recovery Complete (orcl)
Completed: ALTER DATABASE RECOVER  datafile 6  
Fri Dec 29 18:07:02 2023
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 11 processes
Started redo scan
Completed redo scan
 read 11591 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 6320, block 479571
Recovery of Online Redo Log: Thread 1 Group 4 Seq 6320 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo04.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 6320, block 502754, scn 2657849964
 0 data blocks read, 0 data blocks written, 11591 redo k-bytes read
Thread 1 advanced to log sequence 6321 (thread open)
Thread 1 opened at log sequence 6321
  Current log# 5 seq# 6321 mem# 0: /data/oracle/oradata/orcl/redo05.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[2656] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:933676634 end:933676704 diff:70 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Dec 29 18:07:04 2023
QMNC started with pid=31, OS id=2687 
Completed: ALTER DATABASE OPEN

然后逻辑方式迁移数据到新库中,最大程度抢救客户数据

resetlogs失败故障恢复-ORA-01555

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:resetlogs失败故障恢复-ORA-01555

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库resetlogs报错

Tue Dec 19 15:21:23 2023
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1683789043
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Tue Dec 19 15:22:01 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database open resetlogs...
Tue Dec 19 16:16:26 2023
alter database datafile 83 offline
Completed: alter database datafile 83 offline
Tue Dec 19 16:19:13 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Archived Log entry 50 added for thread 1 sequence 3657135 ID 0x5d907698 dest 1:
Tue Dec 19 16:20:01 2023
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ORA-00312: 联机日志 2 线程 1: '/data/oradata/orcl/redo2.log'
ORA-27072: 文件 I/O 错误
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 4
Additional information: 8806400
Additional information: 4325376
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ARCH: All Archive destinations made inactive due to error 333
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/data/arch/1_3657136_874715183.dbf' (error 333) (orcl)
Committing creation of archivelog '/data/arch/1_3657136_874715183.dbf' (error 333)
Tue Dec 19 16:20:46 2023
Archived Log entry 51 added for thread 1 sequence 3657132 ID 0x5d907698 dest 1:
Tue Dec 19 16:21:28 2023
Archived Log entry 52 added for thread 1 sequence 3657133 ID 0x5d907698 dest 1:
Tue Dec 19 16:22:13 2023
Archived Log entry 53 added for thread 1 sequence 3657134 ID 0x5d907698 dest 1:
RESETLOGS after incomplete recovery UNTIL CHANGE 161052517347
Resetting resetlogs activation ID 1569748632 (0x5d907698)
Tue Dec 19 16:23:43 2023
Setting recovery target incarnation to 3
Tue Dec 19 16:23:43 2023
Assigning activation ID 1683789043 (0x645c94f3)
LGWR: STARTING ARCH PROCESSES
Tue Dec 19 16:23:43 2023
ARC0 started with pid=40, OS id=5391 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 2 (thread open)
Tue Dec 19 16:23:44 2023
ARC1 started with pid=41, OS id=5393 
Tue Dec 19 16:23:44 2023
ARC2 started with pid=42, OS id=5395 
ARC1: Archival started
Tue Dec 19 16:23:44 2023
ARC3 started with pid=43, OS id=5397 
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/oradata/orcl/redo2.log
Successful open of redo thread 1
Tue Dec 19 16:23:44 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Dec 19 16:23:44 2023
SMON: enabling cache recovery
Tue Dec 19 16:23:44 2023
NSA2 started with pid=44, OS id=5399 
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0025.7f7d42df):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Error 704 happened during db open, shutting down database
USER (ospid: 94696): terminating the instance due to error 704
Instance terminated by USER, pid = 94696
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (94696) as a result of ORA-1092

通过以上信息,可以的出来以下结论:
1. 客户的硬件或者文件系统可能有问题,通过系统日志进一步确认底层异常

Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bc ff c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 80 00 00 08 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568768
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568768
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 28 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 00 00 00 08 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182912
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182912

2. 数据库在强制拉库的时候,很可能是屏蔽了一致性,导致文件头scn过小
3. 在resetlogs之前,先offline了83号文件,这个将导致该文件的reseltogs scn和其他文件不一致,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)
20231229095533



这个库由于客户在resetlogs之前offline了数据文件,导致一些麻烦,先使用Oracle Recovery Tools修改resetlogs scn
20231229100250

然后重建ctl,修改scn,打开数据库
20231229102556

hcheck检测字典一切正常

HCheck Version 07MAY18 on 26-12月-2023 18:44:20
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: ORCL
                                   Catalog       Fixed           
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 12/26 18:44:21 

*** 2023-12-26 18:44:21.507
PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 12/26 18:44:22 
PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 

*** 2023-12-26 18:44:22.571
PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 12/26 18:44:23 

*** 2023-12-26 18:44:23.779
PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
---------------------------------------
26-12月-2023 18:44:23  Elapsed: 3 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

然后增加temp,导出数据数据,完成本次数据库救援

ORA-01113 ORA-01110错误不一定都要Oracle Recovery Tools解决

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01113 ORA-01110错误不一定都要Oracle Recovery Tools解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户联系我,说数据库故障经过他们一系列恢复之后,现在open库报ORA-01113 ORA-01110错误,咨询我Oracle Recovery Tools恢复工具是否可以解决该问题
ORA-01113-ORA-01110


alert日志报错

Sat Dec 16 09:10:45 2023
alter database open
Errors in file f:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_8948.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...

其实这个错误不是这个库不能打开的本质,一般遇到这种错误的客户,都是强制拉过库,在拉库的过程中失败,才是导致库不能open的本质原因,比如通过查看相关日志发现拉库的时候报ORA-01555 ORA-00704错

Thu Dec 14 19:05:45 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 13289960075
Resetting resetlogs activation ID 1596978603 (0x5f2ff5ab)
Thu Dec 14 19:05:45 2023
Setting recovery target incarnation to 2
Thu Dec 14 19:05:45 2023
Assigning activation ID 1683369006 (0x64562c2e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Dec 14 19:05:45 2023
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0003.1824b292):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file f:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_7736.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 4 (名称为 "_SYSSMU4_1451910634$") 过小
Errors in file f:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_7736.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 4 (名称为 "_SYSSMU4_1451910634$") 过小
Error 704 happened during db open, shutting down database
USER (ospid: 7736): terminating the instance due to error 704
Instance terminated by USER, pid = 7736
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (7736) as a result of ORA-1092
Thu Dec 14 19:05:51 2023
ORA-1092 : opitsk aborting process

比如还有客户咨询也是ORA-01113 ORA-01110错误,希望通过Oracle Recovery Tools工具来解决该问题,通过咨询客户确认他们其实是在前期恢复中报ORA-600 2662错误
ORA-600-2662


对于一般通过强制拉库启动过程中报ORA-600错误,后面恢复中报ORA-01113 ORA-01110错误无法正常open的库,一般不用Oracle Recovery Tools工具来解决,通过一些恢复技巧就可以解决该问题.如果无法自行解决,可以联系我们进行技术支持,最大限度抢救和数据,减少损失
电话/微信:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

ssd trim导致fdisk格式化磁盘之后无法恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ssd trim导致fdisk格式化磁盘之后无法恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

8个节点的rac,有一个节点有点问题,重装系统的时候,没有选择任何磁盘
20231221191215


结果发现该机器上的所有磁盘都被分区并且加入到centos的vg中了,导致另外7个节点的asm disk的所有磁盘全部异常,磁盘组无法正常mount,在以前正常的rac节点上pvs扫描发现如下结果
20231221191551

对原asm disk(现在被分区加入到vg中的磁盘)进行分析发现所有的ssd磁盘被的以前记录被置空,机械磁盘的数据正常,机械磁盘kfed看到的记录

[root@xxxrac5 oracle]#  kfed read /dev/mapper/hdisk60 aun=2 aus=4096k
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                    2048 ; 0x004: blk=2048
kfbh.block.obj:              2147483682 ; 0x008: disk=34
kfbh.check:                  2349679586 ; 0x00c: 0x8c0d43e2
kfbh.fcn.base:                   998219 ; 0x010: 0x000f3b4b
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                    916608 ; 0x000: 0x000dfc80
………………
kfdate[446].free.hi:                  0 ; 0xe1c: S=0 V=0 L=0 ASZM=0x0 S=0
kfdate[447].discriminator:            0 ; 0xe20: 0x00000000
kfdate[447].free.lo.next:             0 ; 0xe20: 0x0000
kfdate[447].free.lo.prev:             0 ; 0xe22: 0x0000
kfdate[447].free.hi:                  0 ; 0xe24: S=0 V=0 L=0 ASZM=0x0 S=0

分析ssd磁盘的记录情况,发现磁盘中记录全部被置空

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=2 aus=4096k
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
000800000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=11 aus=4096k
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
002C00000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=100
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006400000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=1000
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
03E800000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=10000
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
271000000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=100000
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
186A000000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[root@xxxrac5 oracle]# kfed read /dev/mapper/ssdisk30 aun=100000
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
186A000000 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

dd ssd磁盘100M数据压缩前后大小比较(压缩之后只有150kb左右,除了分区和lv信息之外其他全部是0)
100M
20231221193859


通过linux命令查看磁盘是否是为ssd(cat /sys/block/sdfr/queue/rotational为1表示机械磁盘,0表示ssd磁盘)

hdisk66 (360060e8007c550000030c55000000223) dm-66 HITACHI ,OPEN-V          
size=2.0T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 10:0:1:4  sdfr               130:208 active ready running
  |- 7:0:0:4   sds                65:32   active ready running
  |- 8:0:1:4   sdu                65:64   active ready running
  `- 9:0:1:4   sdx                65:112  active ready running
ssdisk58 (26e8d69b6480c854f6c9ce90091cfe58e) dm-22 Nimble  ,Server          
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 7:0:1:48  sdfu               131:0   active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 10:0:0:48 sdek               128:192 active ghost running
ssdisk43 (2e0e493e08cfd75996c9ce90091cfe58e) dm-8 Nimble  ,Server          
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 7:0:1:35  sdev               129:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 10:0:0:35 sddj               71:16   active ghost running
ssdisk60 (2696bc3246f81fb116c9ce90091cfe58e) dm-34 Nimble  ,Server          
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 7:0:1:59  sdgq               132:96  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 10:0:0:59 sdfg               130:32  active ghost running
ssdisk08 (2629f870548db01a76c9ce90091cfe58e) dm-35 Nimble  ,Server          
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 7:0:1:6   sdcq               69:224  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 10:0:0:6  sdz                65:144  active ghost running
[root@odsrac5 oracle]# cat /sys/block/sdcq/queue/rotational
0
[root@odsrac5 oracle]# cat /sys/block/sdcu/queue/rotational
0
[root@odsrac5 oracle]# cat /sys/block/sdfr/queue/rotational
1

出现该问题是由于ssd盘的trim操作导致,相关知识补充:TRIM(SATA), Deallocate(NVMe), UNMAP(SCSI)指的是同一类指令,都是为了减少不必要的数据搬移。
原因:
在文件系统中,删除文件并没有真正的删除物理的数据,只是清空了记录表。而此时,对SSD来说,它并不知道文件已经被删除了,只有下次覆写的时候,SSD才能发现之前被删除的文件对应的page是无效的,从而启动GC。然而,如果在此之前发生了GC等数据搬移动作,无效的page仍然会被当做是有效的。重新磁盘分区,重建文件系统也可能触发类似操作
作用:
Trim 只是一个指令,它让操作系统通知 SSD 主控某个页的数据已经‘无效’后,任务就已完成,并没有更多的操作。TRIM 的先进性在于它可以让固态硬盘在进行垃圾回收的时候跳过移动无用数据的过程,从而不再用重新写入这些无用的数据,达到节省时间的目的。

kettle导致MySQL数据丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:kettle导致MySQL数据丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户通过kettle 插入数据,由于配置不当导致原数据丢失,希望能够恢复之前数据(mysql数据库)
20231217125043


通过分析(相关文件的时间),判断kettle应该是在插入数据之前触发了truncate操作导致数据丢失,而且还插入了部分数据
2023121712561220231217125738

通过数据块层面扫描分析,找出来需要恢复表对应的page文件
20231217125854

解析这些page文件恢复出来客户需要数据
20231217130029

遇到此类误操作,最重要的是保护现场,尽可能减少对数据文件所在分区的写入操作,可以实现最大限度数据恢复.

ORA-600 3020/ORA-600 2662故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 3020/ORA-600 2662故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库重启之后报ORA-600 3020错误

Tue Dec 12 00:37:44 2023
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log /data/oracle/flash_recovery_area/ORCL/archivelog/2023_11_30/o1_mf_1_3867_lpj3krsn_.arc
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log /data/oracle/flash_recovery_area/ORCL/archivelog/2023_12_02/o1_mf_1_3868_lppkmyhf_.arc
Tue Dec 12 00:37:45 2023
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr01_57992.trc  (incident=31482):
ORA-00600: internal error code, arguments: [3020], [1], [87806], [4282110], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 87806, file offset is 719306752 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '/data/oracle/oradata/orcl/system01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 5843
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_31482/orcl_pr01_57992_i31482.trc
Tue Dec 12 00:37:45 2023
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr08_58007.trc  (incident=31538):
ORA-00600: internal error code, arguments: [3020], [1], [61031], [4255335], [], [], [], [], []
ORA-10567:Redo is inconsistent with data block (file# 1, block# 61031, file offset is 499965952 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '/data/oracle/oradata/orcl/system01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 39
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_31538/orcl_pr08_58007_i31538.trc

客户不太了解上述故障,直接使用隐含参数,强制拉库报ORA-600 2662错误

Tue Dec 12 00:45:11 2023
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_58731.trc  (incident=32668):
ORA-00600: internal error code, arguments: [2662], [0], [120784398], [0], [121441601], [12583040]
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_32668/orcl_ora_58731_i32668.trc
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_58731.trc:
ORA-00600: internal error code, arguments: [2662], [0], [120784398], [0], [121441601], [12583040]
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_58731.trc:
ORA-00600: internal error code, arguments: [2662], [0], [120784398], [0], [121441601], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 58731): terminating the instance due to error 600
Instance terminated by USER, pid = 58731

接手该故障之后,数据库已经被resetlogs,只能基于该现场进一步恢复,修改数据库scn,open数据库成功,导出数据,完成本次恢复.
20231213102213


MySQL数据库文件丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:MySQL数据库文件丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户服务器重启,mysql相关数据文件丢失
20231206095845


通过底层工具进行分析,无法正确恢复数据库名字,一个个单个ibd文件(而且很多本身是错误的)
20231210224010

对于这种情况,通过mysql block扫描恢复出来page文件
20231210224106

恢复出来客户需要数据
20231210224248
20231207215458

这个客户出现该故障的原因大概率是由于文件系统损坏导致.最终可以比较好的恢复,主要是故障之后第一时间保护了现场,没进行任何的写入覆盖,如果覆盖了神仙也没有办法.如果此类问题无法自行解决或者恢复效果不好,可以联系我们进行技术支持,最大限度抢救和数据,减少损失
电话/微信:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个19c库由于某种原因redo损坏强制打开库报ORA-600 kcbzib_kcrsds_1错误

SQL> startup mount pfile='?/database/pfile.txt';
ORACLE instance started.

Total System Global Area  859830696 bytes
Fixed Size                  9034152 bytes
Variable Size             566231040 bytes
Database Buffers          276824064 bytes
Redo Buffers                7741440 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open resetlogs ;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 7236
Session ID: 252 Serial number: 14807

alert日志报错

2023-12-07T21:27:40.176923+08:00
alter database open resetlogs upgrade
2023-12-07T21:27:40.223341+08:00
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 70591877 time 
.... (PID:7236): Clearing online redo logfile 1 C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO01.LOG
.... (PID:7236): Clearing online redo logfile 2 C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO02.LOG
.... (PID:7236): Clearing online redo logfile 3 C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO03.LOG
Clearing online log 1 of thread 1 sequence number 0
Clearing online log 2 of thread 1 sequence number 0
Clearing online log 3 of thread 1 sequence number 0
.... (PID:7236): Clearing online redo logfile 1 complete
.... (PID:7236): Clearing online redo logfile 2 complete
.... (PID:7236): Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 1658522235 (0x62db0a7b)
Online log C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO03.LOG: Thread 1 Group 3 was previously cleared
2023-12-07T21:27:40.896383+08:00
Setting recovery target incarnation to 3
2023-12-07T21:27:40.927396+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2023-12-07T21:27:40.943147+08:00
Assigning activation ID 1682724028 (0x644c54bc)
2023-12-07T21:27:40.943147+08:00
TT00 (PID:7180): Gap Manager starting
2023-12-07T21:27:40.958429+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: C:\USERS\ADMINISTRATOR\DESKTOP\OCRL\REDO01.LOG
Successful open of redo thread 1
2023-12-07T21:27:40.958429+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2023-12-07T21:27:40.990460+08:00
TT03 (PID:1292): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
Errors in file C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7236.trc  (incident=12218):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_12218\orcl_ora_7236_i12218.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2023-12-07T21:27:41.021109+08:00
2023-12-07T21:27:41.848773+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2023-12-07T21:27:42.082978+08:00
Undo initialization recovery: err:600 start: 3673703 end: 3674796 diff: 1093 ms (1.1 seconds)
2023-12-07T21:27:42.082978+08:00
Errors in file C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7236.trc:
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2023-12-07T21:27:42.082978+08:00
Errors in file C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7236.trc:
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7236.trc  (incident=12219):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: C:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_12219\orcl_ora_7236_i12219.trc
2023-12-07T21:27:42.881204+08:00
opiodr aborting process unknown ospid (7236) as a result of ORA-603
2023-12-07T21:27:42.881204+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
USER (ospid: (prelim)): terminating the instance due to ORA error 
2023-12-07T21:27:46.052894+08:00
Instance terminated by USER(prelim), pid = 7236

通过patch scn工具快速修改数据库scn
patch_scn


实现数据库顺利打开
20231207213822

数据库打patch建议设置nls_language为英文

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库打patch建议设置nls_language为英文

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

昨晚给客户11.2.0.4打比较新的psu和ojvm patch(主要为了修复安全扫描漏洞)
20231203185616


结果今天早上客户那边反馈应用有报ORA-29548错
ORA-29548

该错误比较明显应该是和java有关系,在本次变更中最大可能就是ojvm有关系,查看ojvm的postinstall.sql执行日志,发现如下问题

 34  -- Check the validity of JAVAVM and let the registry be updated accordingly.
 35  
 36      initjvmaux.validate_javavm;
 37  
 38  -- Add a row in registry$history to indicate this script was run.
 39  
 40      EXECUTE IMMEDIATE 'insert into registry$history
 41                         (action_time, action, namespace, version, id, comments)
 42                         values(SYSTIMESTAMP, ''jvmpsu.sql'', ''SERVER'',
 43                                ''11.2.0.4.221018OJVMPSU'', 0, ''RAN jvmpsu.sql'')';
 44  
 45    END IF;
 46  
 47    EXECUTE IMMEDIATE 'alter system set java_jit_enabled = ' || :jitstate;
 48  
 49  END;
 50  /
BEGIN
*
第 1 行出现错误:
ORA-01843: 无效的月份
ORA-06512: 在 line 8

查询组件有效性

SQL> select comp_name,version,status from dba_registry order by 1;

COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
JServer JAVA Virtual Machine
11.2.0.4.0                     INVALID

OLAP Analytic Workspace
11.2.0.4.0                     VALID

OLAP Catalog
11.2.0.4.0                     VALID


COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
OWB
11.2.0.1.0                     VALID

Oracle Application Express
3.2.1.00.10                    VALID

Oracle Database Catalog Views
11.2.0.4.0                     VALID


COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
Oracle Database Java Packages
11.2.0.4.0                     VALID

Oracle Database Packages and Types
11.2.0.4.0                     VALID

Oracle Enterprise Manager
11.2.0.4.0                     VALID


COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
Oracle Expression Filter
11.2.0.4.0                     VALID

Oracle Multimedia
11.2.0.4.0                     VALID

Oracle OLAP API
11.2.0.4.0                     VALID


COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
Oracle Rules Manager
11.2.0.4.0                     VALID

Oracle Text
11.2.0.4.0                     VALID

Oracle Workspace Manager
11.2.0.4.0                     VALID


COMP_NAME
--------------------------------------------------------------------------------
VERSION                        STATUS
------------------------------ ----------------------
Oracle XDK
11.2.0.4.0                     VALID

Oracle XML Database
11.2.0.4.0                     VALID

Spatial
11.2.0.4.0                     VALID


已选择18行。

确认JServer JAVA Virtual Machine组件无效,很可能是ojvm的postinstall.sql执行失败有关系,而引起这个失败的原因是由于“ORA-01843: 无效的月份”,根据经验,引起这个问题的可能是由于win汉语环境中日期/时间的显示格式导致

SQL> select sysdate from dual;

SYSDATE
--------------
03-12月-23

设置为英文格式显示

SQL> alter session set nls_language='American';

Session altered.

SQL> select sysdate from dual;

SYSDATE
------------
03-DEC-23

重新执行postdeinstall和postinstall脚本之后,组件状态恢复正常,软件功能也恢复正常

SQL> select comp_name,version,status from dba_registry order by 1;

JServer JAVA Virtual Machine
11.2.0.4.0                     VALID

OLAP Analytic Workspace
11.2.0.4.0                     VALID

OLAP Catalog
11.2.0.4.0                     VALID

OWB
11.2.0.1.0                     VALID

Oracle Application Express
3.2.1.00.10                    VALID

Oracle Database Catalog Views
11.2.0.4.0                     VALID

Oracle Database Java Packages
11.2.0.4.0                     VALID

Oracle Database Packages and Types
11.2.0.4.0                     VALID

Oracle Enterprise Manager
11.2.0.4.0                     VALID

Oracle Expression Filter
11.2.0.4.0                     VALID

Oracle Multimedia
11.2.0.4.0                     VALID

Oracle OLAP API
11.2.0.4.0                     VALID

Oracle Rules Manager
11.2.0.4.0                     VALID

Oracle Text
11.2.0.4.0                     VALID

Oracle Workspace Manager
11.2.0.4.0                     VALID

Oracle XDK
11.2.0.4.0                     VALID

Oracle XML Database
11.2.0.4.0                     VALID

Spatial
11.2.0.4.0                     VALID


18 rows selected.

自此提醒由于oracle 环境语言默认显示的问题导致某些patch不能正常打成功,建议在执行数据库patch或者升级之时,把数据库语言环境调整为英文nls_language=’American’,以避免本次出现的不必要的麻烦(以前由于大部分客户没有是使用jvm这个功能组件因此没有暴露该问题)

DBV-00107: Unknown header format 故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:DBV-00107: Unknown header format 故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户linux平台被勒索病毒加密,其中有oracle数据库.客户联系黑客进行解密【勒索解密oracle失败】,但是数据库无法正常启动,dbv检查数据库文件报错

[oracle@hisdb ~]$ dbv file=system01.dbf

DBVERIFY: Release 11.2.0.1.0 - Production on 星期一 11月 27 21:49:17 2023

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


DBV-00107: 未知标头格式 (31) (287942924)

对应的英文为:DBV-00107: Unknown header format (31) (287942924),检查数据文件信息发现提示为 FILE NOT FOUND,使用脚本为:Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检测结果
20231127220917


通过分区确认是文件头损坏
20231127220354

修复正确的文件头
20231127220457

再次dbv检查数据文件

[oracle@hisdb ~]$ dbv file=system01.dbf

DBVERIFY: Release 11.2.0.1.0 - Production on 星期一 11月 27 22:05:41 2023

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = /u01/app/oracle/oradata/system01.dbf
页 12800 标记为损坏
Corrupt block relative dba: 0x00403200 (file 1, block 12800)
Bad header found during dbv:
Data in bad block:
 type: 88 format: 1 rdba: 0x33877808
 last change scn: 0x257a.7b3a44e3 seq: 0xe8 flg: 0xe6
 spare1: 0x4e spare2: 0x73 spare3: 0x0
 consistency value in tail: 0x65251001
 check value in block header: 0xc3b4
 computed block checksum: 0x4ca7



DBVERIFY - 验证完成

检查的页总数: 13440
处理的页总数 (数据): 3297
失败的页总数 (数据): 0
处理的页总数 (索引): 2097
失败的页总数 (索引): 0
处理的页总数 (其他): 1441
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 6604
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 1667927064 (12.1667927064)

修复其他文件头,并dbv检查,发现均在12800位置损坏.尝试recover database恢复数据库,报ORA-00742 ORA-00312之类错误.【由于redo损坏,报出来数据文件更多坏块】

Sat Nov 25 17:03:39 2023
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Sat Nov 25 17:03:40 2023
Recovery of Online Redo Log: Thread 1 Group 7 Seq 27220 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/redo07.log
Sat Nov 25 17:03:41 2023
Hex dump of (file 3, block 7) in trace file /u01/app/oracle/diag/rdbms/his/his/trace/his_pr0l_52669.trc
Corrupt block relative dba: 0x00c00007 (file 3, block 7)
Bad header found during media recovery
Data in bad block:
 type: 124 format: 7 rdba: 0x1698b845
 last change scn: 0x4fa1.3eaa638f seq: 0x6 flg: 0x24
 spare1: 0x26 spare2: 0x42 spare3: 0x0
 consistency value in tail: 0xa39e1e01
 check value in block header: 0x2ca4
 computed block checksum: 0x3b25
Reading datafile '/u01/app/oracle/oradata/undotbs01.dbf' for corruption at rdba: 0x00c00007 (file 3, block 7)
Reread (file 3, block 7) found same corrupt data (no logical check)
Sat Nov 25 17:03:41 2023
Hex dump of (file 46, block 3) in trace file /u01/app/oracle/diag/rdbms/his/his/trace/his_pr0w_52691.trc
Corrupt block relative dba: 0x0b800003 (file 46, block 3)
Bad header found during media recovery
Data in bad block:
 type: 7 format: 7 rdba: 0x77922022
 last change scn: 0xdff3.c40df5b6 seq: 0x6f flg: 0xe5
 spare1: 0xcd spare2: 0x6d spare3: 0x83d7
 consistency value in tail: 0x63c63d2c
 check value in block header: 0xf662
 computed block checksum: 0xec49
Data in bad block:
 type: 135 format: 4 rdba: 0x45ad2864
 last change scn: 0x9d7e.34949c73 seq: 0x32 flg: 0x3e
 spare1: 0x89 spare2: 0x0 spare3: 0x9f9f
 consistency value in tail: 0xa5807800
 check value in block header: 0xb2c9
 computed block checksum: 0x3aea
Reread (file 5, block 11259) found same corrupt data (no logical check)
 type: 214 format: 1 rdba: 0x0228dbe9
Bad header found during media recovery
 last change scn: 0xed57.ca4f7559 seq: 0x9b flg: 0x4a
Data in bad block:
 spare1: 0x97 spare2: 0x77 spare3: 0x2bab
 type: 33 format: 6 rdba: 0x018d584a
 consistency value in tail: 0x359f90d6
 last change scn: 0xaeb8.2fa361eb seq: 0x60 flg: 0x92
 check value in block header: 0x6b26
 spare1: 0xea spare2: 0xe spare3: 0xb405 block checksum disabled
Reread (file 3, block 4) found same corrupt data (no logical check)
Corrupt block relative dba: 0x0b800e61 (file 46, block 3681)
Bad header found during media recovery
Data in bad block:
 type: 131 format: 6 rdba: 0xc7edd0fc
 last change scn: 0xd319.d0e54941 seq: 0x6f flg: 0x6d
 spare1: 0xe7 spare2: 0x82 spare3: 0x439f
 consistency value in tail: 0x18dc47b6
 check value in block header: 0xe9c8
 computed block checksum: 0x204d
Reread (file 46, block 3681) found same corrupt data (no logical check)
Hex dump of (file 1, block 2017) in trace file /u01/app/oracle/diag/rdbms/his/his/trace/his_pr10_52699.trc
Corrupt block relative dba: 0x004007e1 (file 1, block 2017)
Bad header found during media recovery
Data in bad block:
 type: 159 format: 2 rdba: 0x52c5b2b0
 last change scn: 0x2ed8.e0bc5af9 seq: 0x62 flg: 0xe9
 spare1: 0x81 spare2: 0x1e spare3: 0xda98
 consistency value in tail: 0xc5753dd3
 check value in block header: 0x2bba
 block checksum disabled
Reading datafile '/u01/app/oracle/oradata/system01.dbf' for corruption at rdba: 0x004007e1 (file 1, block 2017)
Reread (file 1, block 2017) found same corrupt data (no logical check)
Media Recovery failed with error 742
Errors in file /u01/app/oracle/diag/rdbms/his/his/trace/his_pr00_52622.trc:
ORA-00283: recovery session canceled due to errors
ORA-00742: Log read detects lost write in thread %d sequence %d block %d
ORA-00312: online log 7 thread 1: '/u01/app/oracle/oradata/redo07.log'
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

尝试强制打开数据库报ORA-600 krsi_al_hdr_update.15,参考:Oracle断电故障处理中有类似报错

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [krsi_al_hdr_update.15],
[4294967295], [], [], [], [], [], [], [], [], [], []

由于redo问题无法resetlogs成功,解决异常redo,再次尝试open库,由于undo坏块无法open成功,报ORA-01092 ORA-01578等错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01578: ORACLE data block corrupted (file # 3, block # 1848)
ORA-01110: data file 3: '/u01/app/oracle/oradata/undotbs01.dbf'
Process ID: 55655
Session ID: 2623 Serial number: 5

解决undo异常,数据库open成功.导出客户需要数据,完成此次恢复工作