Handling ORA-07445: exception encountered: core dump [kdxlin()+4088]

Contact: phone/WeChat (+86 17813235971), QQ (107644445)

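Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]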

The database was shut down with shutdown abort, and the next startup failed with the following errors:

Tue Sep 19 21:52:56 2023
NOTE: dependency between database orcl and diskgroup resource ora.DATA.dg is established
Tue Sep 19 21:52:57 2023
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 (myinst: 1) 
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Sep 19 21:52:57 2023
Tue Sep 19 21:52:57 2023
 LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Sep 19 21:52:57 2023
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Sep 19 21:52:57 2023
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Tue Sep 19 21:53:05 2023
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_28917.trc  (incident=492333):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_492333/orcl1_ora_28917_i492333.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:34652:2} */...

After the control file was recreated, an attempt to recover the database failed with ORA-600 [3020], ORA-07445 [kdxlin], and related errors:

SQL> recover database;
ORA-00600: internal error code, arguments: [3020], [41], [3142531],
[175108995], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 41, block# 3142531, file
offset is 4268777472 bytes)
ORA-10564: tablespace XIFENFEI
ORA-01110: data file 41: '+DATA/orcl/datafile/xifenfei07.dbf'
ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'
Wed Sep 20 00:15:00 2023
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 64 slaves
Wed Sep 20 00:15:02 2023
Recovery of Online Redo Log: Thread 2 Group 6 Seq 67008 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_6.268.942097791
Recovery of Online Redo Log: Thread 1 Group 2 Seq 81767 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_2.262.942097651
Recovery of Online Redo Log: Thread 1 Group 5 Seq 81768 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_5.263.942097651
Wed Sep 20 00:15:08 2023
Hex dump of (file 41, block 3142531) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_pr1m_45463.trc
Reading datafile '+DATA/orcl/datafile/ts_his3bz07.dbf' for corruption at rdba: 0x0a6ff383 (file 41, block 3142531)
Reread (file 41, block 3142531) found different corrupt data (logically corrupt)
Hex dump of (file 41, block 3142531) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_pr1m_45463.trc
Wed Sep 20 00:15:08 2023
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0xC] [PC:0x95FB582, kdxlin()+4088][flags: 0x0,count:1]
Wed Sep 20 00:15:08 2023
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0xC] [PC:0x95FB582, kdxlin()+4088][flags: 0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_pr10_45419.trc  (incident=564584):
ORA-07445: exception encountered:core dump [kdxlin()+4088][SIGSEGV][ADDR:0xC][PC:0x95FB582][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_564640/orcl1_pr17_45433_i564640.trc
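
The rdba in the alert log (0x0a6ff383) and the third argument of ORA-600 [3020] (175108995) are the same number, and it encodes the file and block that recovery is complaining about. A minimal Python sketch of the decoding follows, assuming the usual smallfile layout in which the top 10 bits are the relative file number and the low 22 bits are the block number. The function name and the 8192-byte block size are assumptions for illustration; the log does not state the block size.

```python
from typing import Tuple

def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a relative data block address into (relative file#, block#).

    Assumes a smallfile tablespace: 10 bits of file#, 22 bits of block#.
    """
    file_no = rdba >> 22
    block_no = rdba & 0x3FFFFF
    return file_no, block_no

if __name__ == "__main__":
    # Hex form from the alert log, decimal form from ORA-600 [3020].
    print(decode_rdba(0x0A6FF383))
    print(decode_rdba(175108995))

    # Assumed block size of 8192 bytes (not stated in the log).
    block_size = 8192
    _, block_no = decode_rdba(0x0A6FF383)
    # Byte offset of the block, and the same value truncated to 32 bits.
    print(block_no * block_size, (block_no * block_size) % 2**32)
```

Both calls give file 41, block 3142531, matching the ORA-10567 message. Under the 8192-byte assumption, the offset truncated to 32 bits equals the 4268777472 bytes printed in ORA-10567, which suggests the reported file offset has wrapped rather than being the true byte position.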

Recovering an arbitrarily chosen datafile hit the same ORA-07445 kdxlin exception:

SQL> recover datafile 34;
ORA-00283: recovery session canceled due to errors
ORA-10562: Error occurred while applying redo to data block (file# 34, block#
1999809)
ORA-10564: tablespace XIFENFEI
ORA-01110: data file 34: '+DATA/orcl/datafile/xifeifenfei06'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 97961
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kdxlin()+4088] [SIGSEGV]
[ADDR:0xC] [PC:0x95FB582] [Address not mapped to object] []

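This happens because the redo and the datafile blocks are inconsistent, so the redo cannot be applied normally. After the abnormal blocks were handled manually, the database opened successfully. It then hit undo rollback segment errors; once those were worked around, the database opened and has been running stably.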

Fixing ORA-01578 with bbed

The application reported ORA-01578 (corrupt block) and could not be used normally. The alert log showed the following:

Reading datafile '/data/u01/ZLDOCXML01.DBF' for corruption at rdba: 0x02efdc97 (file 11, block 3136663)
Reread (file 11, block 3136663) found same corrupt data
Wed Sep 13 19:02:04 2023
Corrupt Block Found
         TSN = 10, TSNAME = ZLDOCXML
         RFN = 11, BLK = 3136663, RDBA = 49274007
         OBJN = 73646, OBJD = 73646, OBJECT = SYS_LOB0000073645C00029$$, SUBOBJECT =
         SEGMENT OWNER = ZLDOC, SEGMENT TYPE = Lob Segment
DDE: Problem Key 'ORA 1578' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes

Checking the datafile with dbv showed that this was the only corrupt block:

[oracle@zlemr ~]$ dbv file=/data/u01/ZLDOCXML01.DBF

DBVERIFY: Release 11.2.0.1.0 - Production on Wed Sep 13 17:51:03 2023

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - Verification starting : FILE = /data/u01/ZLDOCXML01.DBF
Page 3136663 is influx - most likely media corrupt
Corrupt block relative dba: 0x02efdc97 (file 11, block 3136663)
Fractured block found during dbv: 
Data in bad block:
 type: 40 format: 2 rdba: 0x02efdc97
 last change scn: 0x0002.1065d622 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0x48cf
 computed block checksum: 0xfe21


DBVERIFY - Verification complete

Total Pages Examined         : 3289600
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 21037
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 2051270
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 1068900
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 148392
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 278716397 (2.278716397)
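
dbv calls the block fractured because the tail (0x00000001) does not agree with the header. The tail a consistent block should carry can be derived from the header fields dbv printed. This is a minimal Python sketch, assuming the commonly described tail layout (low 16 bits of the SCN base, then the block type, then the sequence byte) and a little-endian platform. The function name is hypothetical.

```python
def expected_tail(block_type: int, scn_base: int, seq: int) -> int:
    """Tail check value a consistent block would carry.

    Assumed layout: (low 16 bits of SCN base) | block type | seq.
    """
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)

if __name__ == "__main__":
    # Values from the dbv output: type 40, last change scn 0x0002.1065d622, seq 0x2.
    tail = expected_tail(40, 0x1065D622, 0x2)
    print(hex(tail))
    # Byte order as stored on disk on a little-endian platform (assumption).
    print(tail.to_bytes(4, "little").hex())
```

With the values from this block the tail works out to 0xd6222802, which is stored on disk as the bytes 02 28 22 d6.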

The block was then repaired with bbed:

[oracle@zlemr ~]$ bbed 
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Wed Sep 13 19:05:53 2023

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set filename '/data/u01/ZLDOCXML01.DBF'
        FILENAME        /data/u01/ZLDOCXML01.DBF

BBED> set block 3136663
        BLOCK#          3136663

BBED> verify
DBVERIFY - Verification starting
FILE = /data/u01/ZLDOCXML01.DBF
BLOCK = 3136663

Block 3136663 is corrupt
Corrupt block relative dba: 0x02efdc97 (file 0, block 3136663)
Fractured block found during verification
Data in bad block:
 type: 40 format: 2 rdba: 0x02efdc97
 last change scn: 0x0002.1065d622 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0x48cf
 computed block checksum: 0xfe21


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED


BBED> map
 File: /data/u01/ZLDOCXML01.DBF (0)
 Block: 3136663                                Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (40)


BBED> d
 File: /data/u01/ZLDOCXML01.DBF (0)
 Block: 3136663           Offsets:    0 to  511           Dba:0x00000000
------------------------------------------------------------------------
 28a20000 97dcef02 22d66510 02000204 cf480000 ae1f0100 00000001 00000280 
 93840000 00000000 00000000 00000000 80dcef02 00000000 000d000a 005b4e3b 
 6cbb533b 5e0867e5 623f8bb0 5f55005d 000d000a 4e3b6cbb 533b5e08 67e5623f 
 8bb05f55 00320030 00320033 002d0030 0039002d 00310033 00200030 0039003a 
 00310036 000d000a 4eca65e5 968f9ec4 4e1d51e4 4e3b6cbb 533b5e08 67e5623f 
 002c60a3 80058bc9 65e08179 75dbff0c 65e053d1 70ed3001 754f5bd2 ff0c65e0 
 59346655 30015934 75db3001 773c82b1 ff0c65e0 9f3b585e 30016d41 6d953001 
 54bd75db 300154b3 55fd3001 54b375f0 ff0c65e0 60765fc3 30015455 5410ff0c 
 65e080f8 95f73001 5fc360b8 3001547c 543856f0 96beff0c 65e05c3f 98913001 
 5c3f6025 30015c3f 75dbff0c 65e08179 6cfb7b49 4e0d9002 ff0c7cbe 795e3001 
 98df6b32 30017761 772053ef ff0c5927 5c0f4fbf 6b635e38 300267e5 4f53ff1a 
 751f547d 5f816b63 5e38ff0c 54bd65e0 51458840 ff0c6241 68434f53 65e080bf 
 5927ff0c 53cc4fa7 4e73623f 4e0d80c0 ff0c4e73 6c415206 6ccc591a ff0c672a 
 89e653ca 786c7ed3 ff0c5fc3 80ba672a 89c1660e 663e5f02 5e38ff0c 81798f6f 
 ff0c8179 90e8538b 75dbff0c 65e053cd 8df375db ff0c80a0 9e2397f3 6b635e38 
 30028f85 52a968c0 67e5ff1a 767d5e26 5e3889c4 ff1a767d 7ec680de 0020002b 

 <32 bytes per line>


BBED> set offset 8188
        OFFSET          8188

BBED> d
 File: /data/u01/ZLDOCXML01.DBF (0)
 Block: 3136663           Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 01000000 

 <32 bytes per line>

BBED> set count 32
        COUNT           32

BBED> set mode edit
        MODE            Edit

BBED> d
 File: /data/u01/ZLDOCXML01.DBF (0)
 Block: 3136663           Offsets:   14 to   45           Dba:0x00000000
------------------------------------------------------------------------
 0204cf48 0000ae1f 01000000 00010000 02809384 00000000 00000000 00000000 

 <32 bytes per line>

BBED> set offset 8188
        OFFSET          8188

BBED> m /x 022822d6
 File: /data/u01/ZLDOCXML01.DBF (0)
 Block: 3136663           Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 022822d6 

 <32 bytes per line>

BBED> sum apply
Check value for File 0, Block 3136663:
current = 0x48cf, required = 0x48cf

BBED> verify
DBVERIFY - Verification starting
FILE = /data/u01/ZLDOCXML01.DBF
BLOCK = 3136663


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED
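
After the tail was corrected, `sum apply` reported current = required = 0x48cf, so the stored checksum was already valid for the repaired block. One way to see why is sketched below. It assumes the block checksum is the XOR of all 16-bit words in the block, which is how it is commonly described but is not stated in this post. The helper name is hypothetical.

```python
import struct

def xor16(data: bytes) -> int:
    """XOR of all little-endian 16-bit words in data (length must be even)."""
    result = 0
    for (word,) in struct.iter_unpack("<H", data):
        result ^= word
    return result

if __name__ == "__main__":
    old_tail = bytes.fromhex("01000000")  # tail bytes before the repair
    new_tail = bytes.fromhex("022822d6")  # tail bytes written with bbed
    # Changing only these four bytes changes the whole-block XOR by this amount.
    delta = xor16(old_tail) ^ xor16(new_tail)
    print(hex(delta))
```

The delta is 0xfe21, the same value dbv printed as "computed block checksum" for the bad block. Under the XOR assumption, that means the wrong tail was the only thing making the checksum fail, which is consistent with `sum apply` having nothing to change.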

dbv now reports the file as clean:

[oracle@zlemr ~]$ dbv file=/data/u01/ZLDOCXML01.DBF

DBVERIFY: Release 11.2.0.1.0 - Production on Wed Sep 13 19:17:21 2023

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /data/u01/ZLDOCXML01.DBF


DBVERIFY - Verification complete

Total Pages Examined         : 3289600
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 21037
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 2051586
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 1069031
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 147946
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 278849105 (2.278849105)

Application testing passed and the data was fully recovered.

Recovering from an ASM disk being added to another disk group

A friend expanded an ASM disk group on one RAC cluster in an AIX environment.
[Screenshot: add_disk]


Immediately afterwards, a disk group on a different RAC cluster was dismounted:

Wed Aug 23 12:44:02 2023
NOTE: SMON starting instance recovery for group DATA domain 2 (mounted)
NOTE: F1X0 found on disk 0 au 2 fcn 0.128808679
NOTE: SMON skipping disk 7 - no header
NOTE: cache initiating offline of disk 7 group DATA
NOTE: process _smon_+asm1 (1770932) initiating offline of disk 7.3422955792 (DATA_0007) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 7/0xcc062910, mask = 0x6a, op = clear
Wed Aug 23 12:44:02 2023
GMON updating disk modes for group 2 at 7 for pid 17, osid 1770932
ERROR: Disk 7 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Wed Aug 23 12:44:02 2023
NOTE: cache dismounting (not clean) group 2/0x7FE6D808 (DATA) 
WARNING: Offline for disk DATA_0007 in mode 0x7f failed.
Wed Aug 23 12:44:02 2023
NOTE: halting all I/Os to diskgroup 2 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 2 due to diskgroup dismount
NOTE: SMON skipping lock domain (2) validation because diskgroup being dismounted
Abort recovery for domain 2
Wed Aug 23 12:44:02 2023
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7fe6d808 (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_2360526.trc:
ORA-15130: diskgroup "DATA" is being dismounted

Trying to mount the disk group again failed with ORA-15042 and ORA-15038:

SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=2 incarn=0x79e6d861
NOTE: cache began mount (first) of group DATA number=2 incarn=0x79e6d861
NOTE: Assigning number (2,0) to disk (/dev/rhdisk31)
NOTE: Assigning number (2,3) to disk (/dev/rhdisk33)
NOTE: Assigning number (2,4) to disk (/dev/rhdisk34)
NOTE: Assigning number (2,5) to disk (/dev/rhdisk35)
NOTE: Assigning number (2,6) to disk (/dev/rhdisk36)
NOTE: Assigning number (2,9) to disk (/dev/rhdisk39)
NOTE: Assigning number (2,1) to disk (/dev/rhdisk8)
NOTE: Assigning number (2,2) to disk (/dev/rhdisk9)
Wed Aug 23 12:58:46 2023
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 11 for pid 27, osid 3736034
NOTE: Assigning number (2,7) to disk ()
NOTE: Assigning number (2,8) to disk ()
GMON querying group 2 at 12 for pid 27, osid 3736034
NOTE: cache dismounting (clean) group 2/0x79E6D861 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3736034, image: oracle@hbbz01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x79E6D861 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x79e6d861
NOTE: cache deleting context for group DATA 2/0x79e6d861
GMON dismounting group 2 at 13 for pid 27, osid 3736034
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "2" 
ORA-15042: ASM disk "7" is missing from group number "2" 
ORA-15038: disk '/dev/rhdisk37' mismatch on 'Time Stamp' with target disk group [2129689239] [2062898314]
ERROR: alter diskgroup data mount

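We suspected that rhdisk37, which belongs to the failing disk group, had been added to the ASM of the other RAC cluster (in other words, both ASM instances were using the same disk). This was checked at the AIX operating-system level: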

--- On the machine where the ASM disk group was expanded
# lscfg -vpl hdisk15
  hdisk15          U78C5.001.DQD076A-P2-C4-T1-W200C00A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block

--- On the machine where the disk group was dismounted
# lscfg -vpl hdisk37      
  hdisk37          U5802.001.9K87776-P1-C1-T1-W200500A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block
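
The two listings show different device names and different location codes but the same Serial Number, and that field is what identifies the shared LUN. A minimal Python sketch of that comparison follows, assuming the `lscfg -vpl` output of each host has been saved as text. The function names are hypothetical and the regular expression only targets the "Serial Number" line format shown above.

```python
import re
from typing import Optional

# Matches lines such as: "Serial Number...............80DYz]L/OpCA"
SERIAL_RE = re.compile(r"Serial Number\.+(\S+)")

def serial_number(lscfg_output: str) -> Optional[str]:
    """Return the LUN serial number from saved lscfg -vpl output, if present."""
    match = SERIAL_RE.search(lscfg_output)
    return match.group(1) if match else None

def same_lun(output_a: str, output_b: str) -> bool:
    """True only when both outputs carry a serial number and they are equal."""
    a, b = serial_number(output_a), serial_number(output_b)
    return a is not None and a == b

if __name__ == "__main__":
    host_a = "  hdisk15  ...\n        Serial Number...............80DYz]L/OpCA\n"
    host_b = "  hdisk37  ...\n        Serial Number...............80DYz]L/OpCA\n"
    print(serial_number(host_a), same_lun(host_a, host_b))
```

Running this kind of check across all candidate disks before an add-disk operation would flag a LUN that is already presented to another cluster.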

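The lscfg output confirms that the two RAC clusters were using the same disk, which is what broke one of the disk groups. On the cluster where the disk had just been added, we checked how much of it had been overwritten: because of the rebalance operation, about 380 GB had already been written to the newly added disk, which means this disk has lost at least about 380 GB of its data from the old disk group.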

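In this situation the dismounted disk group uses external redundancy, so it cannot simply be mounted again. The only option is the approach used in similar earlier cases:
Recovering a completely corrupted ASM disk header
Recovering an ASM disk that was added to a VG
Recovering an ASM disk overwritten by dd
Recovering an ASM disk that was partially wiped
Another case: recovering an ASM disk mistakenly added to a VG with the LV then extended
Database recovery after fdisk partitioning damaged an ASM disk
Another case: recovering an ASM disk formatted as an ext3 file system
Oracle ASM disk format recovery: formatted as an ext4 file system
Oracle ASM disk format recovery: formatted as an NTFS file system
Recovering from ORA-15063: ASM discovered an insufficient number of disks for diskgroup
That is, the data in the blocks that were not overwritten is recovered by working at the low level.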

DUL was then used to extract the data, which completed the recovery of the core data for this incident.