ORA-600 2032故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 2032故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库,异常断电之后,数据库运行不稳定(经常性的重启),通过分析发现

Wed Jun 29 01:04:39 2022
Completed: alter database open
Wed Jun 29 01:04:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_3284.trc:
ORA-12012: error on auto execute of job 1
ORA-01578: ORACLE data block corrupted (file # 2, block # 552)
ORA-01110: data file 2: 'E:\ORACLE\PRODUCT\10.2.0\ORADATA_2\ORCL\UNDOTBS01.DBF'
…………
Wed Jun 29 01:13:28 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:15:34 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5488.trc:
ORA-00600: internal error code, arguments: [6002], [6], [48], [5], [0], [], [], []
Wed Jun 29 01:20:54 2022
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:55 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]

Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Jun 29 01:20:57 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_6956.trc:
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
………………
Wed Jun 29 01:21:41 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_q001_2124.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_3376.trc:
ORA-00474: SMON process terminated with error

Wed Jun 29 01:21:42 2022
Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl_reco_2412.trc:
ORA-00474: SMON process terminated with error

Instance terminated by PMON, pid = 7160

对ora-600 2032进行分析

*** 2022-06-29 01:13:26.907
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2032], [8389160], [8389160], [8192], [2], [255], [0], [767]
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, 
tim_scn_map) values (0, :1, :2, :3, :4, :5, :6, :7)
check trace file e:\oracle\product\10.2.0\db_2\rdbms\trace\orcl_ora_0.trc for preloading .sym file messages
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+663           CALL???  ksedst+55            003C878B8 000000000 00FF57178
                                                   000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 00ED07680 010453DA8
                                                   003CACC80
kgeriv+184           CALL???  ksfdmp+19            7FF2378C000 7FF5E2F81D8
                                                   7FF5EBB9300 7FF5EBB92F0
kgesiv+102           CALL???  kgeriv+184           000000002 00ED07040 000000002
                                                   00FF59110
ksesic7+125          CALL???  kgesiv+102           7FFFFFFF00800228 100000002
                                                   200000001 00FF59328
kcopcv+1014          CALL???  ksesic7+125          0000007F0 000000000 000800228
                                                   000000000
kcbchg1_main+3115    CALL???  kcopcv+1014          00001E778 7FF00000001
                                                   00001E980 00001E650
kcbchg1+238          CALL???  kcbchg1_main+3115    00ED07040 00ED07040 000040013
                                                   000000000
ktuchg+1331          CALL???  kcbchg1+238          7FF00000000 000000003
                                                   00FF597C8 00FF59780
ktbchg2+341          CALL???  ktuchg+1331          000000000 000000401 000000046
                                                   000BB5D21
kdtchg+916           CALL???  ktbchg2+341          000000001 000000000 000000240
                                                   000000000
kdtwrp+2582          CALL???  kdtchg+916           01077A268 01044404C 010444054
                                                   010441710
kdtInsRow+705        CALL???  kdtwrp+2582          01077A268 7FF5E2F8510
                                                   003CB3358 0009C20C4
insrowFastPath+125   CALL???  kdtInsRow+705        00ED07040 7FF55A86E30
                                                   7FF58E87918 00001F318
insdrvFastPath+478   CALL???  insrowFastPath+125   000000067 000000000 000000000
                                                   000000000
inscovexe+434        CALL???  insdrvFastPath+478   01077A268 000000000 010440070
                                                   00521E0C9
insExecStmtExecIniE  CALL???  inscovexe+434        7FF55A88FD8 7FF55A86E30
ngine+99                                           00FF5BAD8 0009FA3EC
insexe+453           CALL???  insExecStmtExecIniE  8B2D554D9623 000000006
                              ngine+99             000000006 0000000C0
opiexe+4991          CALL???  insexe+453           7FF55A885A0 00FF5BAD8
                                                   000000102 000000000
opiall0+1931         CALL???  opiexe+4991          7FF00000049 000000003
                                                   00FF5C160 000000020
opikpr+660           CALL???  opiall0+1931         000000065 000000022 00FF5C638
                                                   000000000
opiodr+1136          CALL???  opikpr+660           000000065 000000017 01044B798
                                                   07EEF1FCF
rpidrus+230          CALL???  opiodr+1136          000000065 000000017 01044B798
                                                   80005900000000
rpidru+112           CALL???  rpidrus+230          00FF5D3F0 000000003 01078D4A0
                                                   7FF5B6F1A80
rpiswu2+517          CALL???  rpidru+112           7FF5DA52BB8 000000000
                                                   000000000 000000008
kprball+1446         CALL???  rpiswu2+517          7FF5E408820 000000000
                                                   00FF5DAD0 000000002
ktf_scn_time+4951    CALL???  kprball+1446         01044B798 8B2D00000140
                                                   000000000 000000005
ktmmon+4107          CALL???  ktf_scn_time+4951    000000000 000000001 07FFFFFFF
                                                   00521A3C2
ktmSmonMain+26       CALL???  ktmmon+4107          0049CBAC0 000000004
                                                   8B2D554D9623 00001E768
ksbrdp+903           CALL???  ktmSmonMain+26       003CC2A18 0049CBADC 000000008
                                                   000000004
opirip+700           CALL???  ksbrdp+903           726F77740000001E 003C8B000
                                                   00FF5FA30 000000000
opidrv+860           CALL???  opirip+700           000000032 000000004 00FF5FD50
                                                   000000000
sou2o+52             CALL???  opidrv+860           000000032 000000004 00FF5FD50
                                                   000000003
opimai_real+272      CALL???  sou2o+52             000000000 000000000 000000000
                                                   000000000
opimai+96            CALL???  opimai_real+272      000000000 000000000 000000000
                                                   000000000
BackgroundThreadSta  CALL???  opimai+96            00FF5FEA8 000000001 000000000
rt+633                                             000000000
000000007738F56D     CALL???  BackgroundThreadSta  0069E4590 000000000 000000000
                              rt+633               000000000
0000000077703021     CALL???  000000007738F56D     000000000 000000000 000000000
                                                   000000000
 
--------------------- Binary Stack Dump ---------------------

通过该trace和alert日志信息可以确认是由于smon_scn_time的操作需要使用到undo,但是对应的undo block异常,从而使得该操作失败,进而引起数据库smon进程异常从而引起ORA-00474,数据库自动crash.处理问题比较简单:
1. 对异常undo进行处理,创建新undo,删除老undo
2. 对于smon_scn_time异常数据进行处理

Oracle Recovery Tools实战批量坏块修复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle Recovery Tools实战批量坏块修复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库无法正常启动ORA-600 6711错误

SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898],
[0], [], [], [], [], [], [], []
Process ID: 22708
Session ID: 978 Serial number: 56675

alert日志报错

2022-06-26T12:34:41.855326+08:00
alter database open
2022-06-26T12:34:41.984974+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
Undo initialization finished serial:0 start:313418906 end:313418906 diff:0 ms (0.0 seconds)
Database Characterset is ZHS16GBK
No Resource Manager plan active
2022-06-26T12:34:43.302315+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38629):
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38629\ora19c_ora_22708_i38629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-06-26T12:34:44.232115+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38630):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38630\ora19c_ora_22708_i38630.trc
2022-06-26T12:34:45.266678+08:00
opiodr aborting process unknown ospid (22708) as a result of ORA-603
2022-06-26T12:34:45.274688+08:00
ORA-603 : opitsk aborting process
License high water mark = 1
USER (ospid: (prelim)): terminating the instance due to ORA error 

通过分析trace文件进行分析,确认是由于histgrm$表异常导致,通过一些特殊处理,绕过该表相关sql,open数据库,并且尝试导出数据

SQL> startup mount pfile='d:/pfile.txt';
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open;

Database altered.

使用expdp导出数据报ORA-01578错
expdp-ora-1578


通过分析是由于system有坏块导致,dbv检查文件
dbv-huikuai

通过Oracle Recovery Tools工具批量坏块修复功能修复
20220626123245
20220626123343

通过工具修复大量主要坏块被修复,还有一些内部逻辑错误(后续工具继续完善),再次尝试逻辑导出数据,无任何报错,数据比较完美恢复
20220626160209

ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈三个磁盘组无法正常mount,报错类似ORA-15032 ORA-15017 ORA-15063

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

初步判断是asm disk异常导致(比如asm disk不能被扫描到,或者丢失,或者磁盘头损坏等),分析客户的asm disk的udev文件配置

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     

从udev的配置中可以看出来,客户以前是对3个磁盘进行分析,然后使用udev映射别名给asm使用的.通过对其中一个磁盘进行分析
20220621220634
20220621220728


通过上述winhex查看,可以确认该分区的磁盘头信息异常[该信息属于磁盘刚分区的时候信息,而不是asm disk的信息],和kfed看到的结果一致[磁盘头位置肯定损坏,其他位置目前未知]

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

分析其他位置的block情况,初步看基本上ok[运气还不错]

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过检索备份出来的部分磁盘文件,找出来ORCLDISK信息部分(asm disk header)
20220621221843


然后利用这个部分对损坏的磁盘头进行修复,并且dd回生产环境中,并尝试mount磁盘组,数据库open成功
20220621181430
20220621222356


至此这个数据库运气不错,没有过多损坏,算完美恢复,可以进行了逻辑导出和rman备份,全部正常.为了后续安全,建议对其进行迁移

修改oracle scn小工具(patch scn)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:修改oracle scn小工具(patch scn)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在一些情况下(特别是一些数据库非常规恢复场景中),需要修改oracle scn绕过一些错误,让数据库open成功,在以前的版本中我们可以通过event,隐含参数,oradebug等方法进行修改,在一些较新的版本中这些方法都被oracle屏蔽,无法实现oracle scn进行调整,针对这种情况,开发了一个Patch_SCN小程序,实现对oracle数据库的scn进行调整

SQL> select dbms_flashback.get_system_change_number a from dual;

               A
----------------
    107367806959

通过工具查询scn信息,由于oracle的scn是动态的,因此和get_system_change_number 查询值有细微出入
20220614130358


修改scn
20220614130543

通过查询确认scn修改成功
20220614130553

该功能的通用版已经包含到oracle racovery tools工具中(注册版可用)
oracle-recovery-tools

软件下载:OraRecovery下载
使用说明:使用说明

ORA-01110 ORA-17070 OSD-04006 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01110 ORA-17070 OSD-04006 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友找到我说应用访问数据库和导出数据都报ORA-01110 ORA-17070 OSD-04006之类错误,数据库可以正常open,但是业务访问关键数据和导出报错
20220611175122
20220611175156


对于这个错误,根据以往恢复经验,初步判断可能硬件异常(比如坏道,硬件故障)或者文件系统异常引起,让客户尝试拷贝该文件,确认该文件也无法拷贝
20220611174852

对于这种情况,如果放弃该文件,恢复其他文件数据,那样数据丢失比例太大,直接通过特定恢复工具对其损坏文件进行拷贝,最大限度强求当前文件数据,发现一些扇区损坏跳过继续拷贝
20220611174928

通过坏块检查工具进行检查确认该文件76个block损坏(对于32G的数据文件损坏1M数据,比较好效果)
20220611174909

对坏块进行处理,然后使用expdp导出数据,最大限度抢救数据
20220611175039

fdisk分区导致asm disk破坏数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:fdisk分区导致asm disk破坏数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

尝试mount data磁盘组

SQL> alter diskgroup DATADG mount 
NOTE: cache registered group DATADG number=1 incarn=0xbc43fafd
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xbc43fafd
NOTE: Assigning number (1,0) to disk (/dev/raw/raw2)
Thu Jun 02 10:14:33 2022
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 27 for pid 27, osid 3853
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 28 for pid 27, osid 3853
NOTE: cache dismounting (clean) group 1/0xBC43FAFD (DATADG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3853, image: oracle@node1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xBC43FAFD (DATADG) 
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xbc43fafd
NOTE: cache deleting context for group DATADG 1/0xbc43fafd
GMON dismounting group 1 at 29 for pid 27, osid 3853
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1" 
ERROR: alter diskgroup DATADG mount
Thu Jun 02 10:14:33 2022
ASM Health Checker found 1 new failures

报错信息比较明显 datadg的disk number 为1的磁盘丢失了。通过fdisk确认磁盘情况

Disk /dev/sdb: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006c2be

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sda: 53.7 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00061443

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           2        2049     2097152   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2            2050       10241     8388608   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3           10242       12289     2097152   83  Linux
Partition 3 does not end on cylinder boundary.
/dev/sda4           12290       51200    39844864    5  Extended
Partition 4 does not end on cylinder boundary.
/dev/sda5           12291       14338     2097152   83  Linux
/dev/sda6           14340       50178    36699136   83  Linux
/dev/sda7           50180       51200     1045504   83  Linux

Disk /dev/sdc: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1b3fba6b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1045     8393931   83  Linux
/dev/sdc2            1046       26108   201318547+  83  Linux

Disk /dev/sdd: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4c63ecad

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       65270   524281243+  83  Linux

Disk /dev/sde: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/sdf: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

根据客户反馈,异常的应该是一个500G的磁盘,而其中sdb为分区,通过kfed命令分析,确认sdc1为ocr磁盘,sdc2为datadg的一块磁盘,另外一块磁盘应该在sdd,sde,sdf三者之中,通过kfed分析sde,sdf均不可能是asm disk(一块是文件系统,一块是彻底没有使用的空盘),如果datadg的磁盘没有丢失,那应该就是sdd这块磁盘,通过dd 磁盘100M空间,然后通过kfed进行分析确认

E:\TEMP\xff>kfed read sdd.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006648400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0066485B0 00000000 00000000 4C63ECAD 01000000  [..........cL....]
0066485C0 FE830001 003FFFFF CB370000 00003E7F  [......?...7..>..]
0066485D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0066485F0 00000000 00000000 00000000 AA550000  [..............U.]
006648600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

E:\TEMP\xff>kfed read sdd1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006768400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0067685B0 00000000 00000000 70D364B4 FE000000  [.........d.p....]
0067685C0 FE83FFFF D13FFFFF BB7603EB 00003A93  [......?...v..:..]
0067685D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0067685F0 00000000 00000000 00000000 AA550000  [..............U.]
006768600 02038201 00000008 80000001 826037C1  [.............7`.]
006768EA0 00000079 00800105 0000007A 00800105  [y.......z.......]
006768EB0 0000007C 00800105 0000007D 00800105  [|.......}.......]
0067693C0 0000015C 00800105 0000015D 00800105  [\.......].......]
0067693D0 0000015F 00800105 00000160 00800105  [_.......`.......]
0067693E0 00000161 00800105 00000163 00800105  [a.......c.......]
0067693F0 00000164 00800105 00000166 00800105  [d.......f.......]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

E:\TEMP\xff>kfed read sdd.dd blkn=1|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                  2197087544 ; 0x00c: 0x82f4e538
kfbh.fcn.base:                   616391 ; 0x010: 0x000967c7
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                         254 ; 0x006: 0x00fe
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000

通过上述信息分析,基本上可以确认sdd磁盘以前是asm disk,但是被fdisk进行了分区,基于这种情况,通过对磁盘组进行修复

E:\TEMP\xff>kfed read sdd.ok
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                   424926402 ; 0x00c: 0x1953dcc2
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        1 ; 0x024: 0x0001
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0001 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0001 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33074858 ; 0x0a8: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.crestmp.lo:           2375520256 ; 0x0ac: USEC=0x0 MSEC=0x1e4 SECS=0x19 MINS=0x23
kfdhdb.mntstmp.hi:             33074858 ; 0x0b0: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.mntstmp.lo:           2375522304 ; 0x0b4: USEC=0x0 MSEC=0x1e6 SECS=0x19 MINS=0x23
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt:                         6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             33072461 ; 0x0e4: HOUR=0xd DAYS=0xa MNTH=0x9 YEAR=0x7e2
kfdhdb.grpstmp.lo:           3452534784 ; 0x0e8: USEC=0x0 MSEC=0x260 SECS=0x1c MINS=0x33

磁盘组mount成功,数据库open成功,实现数据0丢失
20220611171941
20220611172005


使用rman对数据库进行备份,并且重建磁盘组实现数据0丢失

ORA-600 kcvent_internal_02故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kcvent_internal_02故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-00600: internal error code, arguments: [kcvent_internal_02]错,无法正常open

Reconfiguration complete
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 22775 KB redo, 5055 data blocks need recovery
Started redo application at
 Thread 2: logseq 166395, block 88
Recovery of Online Redo Log: Thread 2 Group 3 Seq 166395 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_3.283.1036687245
  Mem# 1: +FLASH/orcl/onlinelog/group_3.264.1036687257
Recovery of Online Redo Log: Thread 2 Group 4 Seq 166396 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_4.284.1036687257
  Mem# 1: +FLASH/orcl/onlinelog/group_4.265.1036687257
Completed redo application of 15.97MB
Completed instance recovery at
 Thread 2: logseq 166396, block 15854, scn 27533037896
 5055 data blocks read, 5055 data blocks written, 22775 redo k-bytes read
Thread 2 advanced to log sequence 166397 (thread recovery)
Redo thread 2 internally disabled at seq 166397 
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_35652472.trc  (incident=195549):
ORA-00600: internal error code, arguments: [kcvent_internal_02], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_195549/orcl1_ora_35652472_i195549.trc

对应的trace文件信息

Dump continued from file: /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_35652472.trc
ORA-00600: internal error code, arguments: [kcvent_internal_02], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 195549 (ORA 600 [kcvent_internal_02]) ========

*** 2022-06-06 22:17:48.743
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=5fmpzya54p4hf) -----
ALTER DATABASE OPEN /* db agent *//* {1:38339:2} */

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       0000000109B1E77C     000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
ksedst1()+112        call     skdstdst()           16F60DC8B26FAB02 ?
                                                   4846284100000000 ?
                                                   FFFFFFFFFFE46D0 ?
                                                   283C6E7C6A9A6 ? 10A6B923C ?
                                                   000000000 ? 110737880 ?
                                                   2050033FFFE46D8 ?
ksedst()+40          call     ksedst1()            000000000 ? 00000000A ?
                                                   07FFFFFFF ? 700000000003670 ?
                                                   000000000 ? 000000000 ?
                                                   000002004 ? 000000001 ?
dbkedDefDump()+1516  call     ksedst()             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 300000003 ?
ksedmp()+72          call     dbkedDefDump()       310737880 ? 110000D40 ?
                                                   FFFFFFFFFFE4EE0 ? 1106AB740 ?
                                                   100124BB8 ? 000000000 ?
                                                   700011D7387FF08 ? 1106AB740 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
                                                   000000002 ? 10AF01CA8 ?
                                                   10A041C38 ? 000000000 ?
                                                   11073C760 ? 110737880 ?
dbgexPhaseII()+1904  call     ksfdmp()             000000000 ? 00000000A ?
                                                   000000002 ? 000000000 ?
                                                   000000002 ? 10A041C30 ?
                                                   000000000 ? 001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       110737880 ? 11073A970 ?
+1556                                              00002FBDD ? 200000000 ?
                                                   FFFFFFFFFFE5DF8 ? 00000006C ?
                                                   200000000 ? 1000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  110737880 ? 11073C760 ?
()+72                                              100000703 ? 000004000 ?
                                                   000000000 ? FFFFFFFFFFE9608 ?
                                                   000000001 ? 11073E4A8 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  FFFFFFFFFFE92B0 ?
2044                          ()                   700011D61558BB8 ? 102878B5C ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFE9608 ? 000000000 ?
                                                   000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   07FFFFFFF ? 700000000003670 ?
68                                                 25800000001 ? 109E4A618 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFEA0B0 ? 1109C0040 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 069186EAB ?
kgerinv_internal()+  call     kgeadse()            000000002 ? 000000002 ?
48                                                 000000001 ? FFFFFFFFFFEAB58 ?
                                                   10A4E02F0 ? 000000002 ?
                                                   FFFFFFFFFFE9FE0 ? 000000000 ?
kgerinv()+48         call     kgerinv_internal()   200000002 ? 000000002 ?
                                                   FFFFFFFFFFEA060 ? 000000000 ?
                                                   102860EB0 ? FFFFFFFFFFEA458 ?
                                                   10285CE74 ? FFFFFFFFFFEA358 ?
kgeasnmierr()+72     call     kgerinv()            38400000001 ? 000000000 ?
                                                   10A4E0D20 ? 497F0A29CAE0 ?
                                                   000000001 ? FFFFFFFFFFEA1C0 ?
                                                   10A4E0D20 ? 110000D78 ?
kcvent_internal()+1  call     kgeasnmierr()        FFFFFFFFFFEA1C0 ? 200000002 ?
532                                                1F0410001F041 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000004 ?
kctenb_internal()+2  call     kcvent_internal()    FFFFFFFFFFEB378 ? 200000002 ?
772                                                FFFFFFFFFFEB448 ?
                                                   FFFFFFFFFFEB2E8 ?
                                                   41F6C57900000000 ?
                                                   000000000 ? FFFFFFFFFFEB330 ?
                                                   1106AB740 ?
kcfopd()+1508        call     kctenb_internal()    07FFFFFFF ? 000000000 ?
                                                   000000018 ? FFFFFFFFFFEC380 ?
                                                   000000000 ? 110A39050 ?
                                                   FFFFFFFFFFEC390 ? 000000000 ?
adbdrv()+8028        call     kcfopd()             081F0AD00 ? 00000000F ?
                                                   0FFFED4C0 ? 000000000 ?
                                                   FFFFFFFFFFED548 ? 100000000 ?
                                                   000000000 ? 1000100000000 ?
opiexe()+16048       call     adbdrv()             2300000023 ? 100000001 ?
                                                   000000000 ? FFFFFFFFFFF6960 ?
                                                   000000000 ? FFFFFFFFFFF6B60 ?
                                                   FFFFFFFFFFF6A98 ? 200000002 ?
opiosq0()+3984       call     opiexe()             700011E117B3B20 ? 000000000 ?
                                                   FFFFFFFFFFF7ED8 ? 110000D78 ?
                                                   000000001 ? 1109FA438 ?
                                                   FFFFFFFFFFF7E70 ?
                                                   2216414400000001 ?
kpooprx()+316        call     opiosq0()            300000000 ? 000000000 ?
                                                   000000000 ? A4000000000000 ?
                                                   000000000 ? FFFFFFFFFFF87F0 ?
                                                   28104221FFFF86F0 ?
                                                   1109FAB08 ?
kpoal8()+872         call     kpooprx()            1000CE68C ? 000000001 ?
                                                   FFFFFFFFFFFAD14 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   109EB6D00 ? 000000000 ?
opiodr()+908         call     kpoal8()             100000000 ? 9001000A0091108 ?
                                                   000000FFF ? 07FFFFFF8 ?
                                                   FFFFFFFFFFF8F10 ? 000000018 ?
                                                   000000000 ? 000072FFF ?
ttcpip()+1028        call     opiodr()             5EFFFFA480 ? 1C00200048 ?
                                                   FFFFFFFFFFFA9F8 ? 000530058 ?
                                                   1108BEE30 ? 000000028 ?
                                                   FFFFFFFFFFFA3A0 ? 1108BEC70 ?
opitsk()+1612        call     ttcpip()             110135440 ? 000002078 ?
                                                   000000000 ? 110000D78 ?
                                                   110005210 ? 000000000 ?
                                                   FFFFFFFFFFFAA20 ?
                                                   2222208009EF13C0 ?
opiino()+940         call     opitsk()             110024C58 ? 000000000 ?
                                                   11079B550 ? 1107A0850 ?
                                                   110737880 ? FFFFFFFFFFFCAE0 ?
                                                   FFFFFFFFFFFEB3C ? 000000101 ?
opiodr()+908         call     opiino()             3C006C787C ?
                                                   BFF0000000000000 ?
                                                   FFFFFFFFFFFEF60 ?
                                                   FFFFFFFFFFFD5E9 ?
                                                   FFFFFFFFFFFD630 ? 1106AB740 ?
                                                   FFFFFFFFFFFD650 ?
                                                   9FFFFFFF000E608 ?
opidrv()+1132        call     opiodr()             3C0AFBC600 ? 410134340 ?
                                                   FFFFFFFFFFFEF60 ? 07530312F ?
                                                   108820CE4 ? 1106AB740 ?
                                                   7264626D732F6F72 ?
                                                   1106AB740 ?
sou2o()+136          call     opidrv()             3C0882A9D0 ? 41170031F ?
                                                   FFFFFFFFFFFEF60 ?
                                                   110017002A0000 ? 0E0DDF00D ?
                                                   1106AB740 ?
                                                   BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
opimai_real()+560    call     sou2o()              FFFFFFFFFFFEFD0 ?
                                                   BADC0FFEE0DDF00D ?
                                                   90000000008BE3C ?
                                                   BADC0FFEE0DDF00D ?
                                                   000000002 ? 9001000A0091108 ?
                                                   A0000000A000000 ? 10B671248 ?
ssthrdmain()+276     call     opimai_real()        10B6B1D74 ? 9001000A0095260 ?
                                                   FFFFFFFFFFFF0B0 ? 10B6B1598 ?
                                                   FFFFFFFFFFFF0D0 ?
                                                   FFFFFFFFFFFF428 ?
                                                   900000000100968 ?
                                                   9001000A0091108 ?
main()+204           call     ssthrdmain()         240000000 ? FFFFFFFFFFFF418 ?
                                                   8FFFFFFF0000090 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
__start()+112        call     main()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
 

--------------------- Binary Stack Dump ---------------------

该错误在mos,互联网上没有任何信息,不过在alert日志中发现类似信息

Mon Jun 06 23:03:58 2022
Error: Controlfile sequence number in file header is different from the one in memory
       Please check that the correct mount options are used if controlfile is located on NFS

初步判断可能和这个错误有关系,解决相关问题后,尝试open库

SQL> recover database;

ORA-00279: change 27533037896 generated at 06/06/2022 22:17:46 needed for
thread 2
ORA-00289: suggestion :
+FLASH/orcl/archivelog/2022_06_06/thread_2_seq_166396.6532.1106691471
ORA-00280: change 27533037896 for thread 2 is in sequence #166396


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Log applied.
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01216: thread 2 is expected to be disabled after CREATE CONTROLFILE

SQL> !oerr ora 01216
01216, 00000, "thread %s is expected to be disabled after CREATE CONTROLFILE"
// *Cause:  A thread that was given during CREATE CONTROLFILE is enabled, but
//          the datafiles indicate that it should be disabled.  This is
//          probably because the logs supplied to the CREATE CONTROLFILE
//          are old (from before the disabling of the thread).
// *Action: This thread is not required to run the database.  The CREATE
//          CONTROLFILE statement can be reissued without the problem thread,
//          and, if desired, the thread can be recreated after the database
//          is open.

ORA-01216这个错误比较也比较少见,但是感觉和thread有关系,大概的意思是thread 被disable了

SQL> select thread#,STATUS FROM V$THREAD;

   THREAD# STATUS
---------- ------------------
         1 CLOSED
         2 CLOSED

通过人工强制把thread个open,然后数据库启动成功

SQL> select thread#,status from v$thread;

   THREAD# STATUS
---------- ------------------
         1 OPEN
         2 CLOSED

SQL> alter database open;

Database altered.

然后启动thread 2,open 第二个节点

--需要open节点
QL> startup
ORACLE instance started.

Total System Global Area 1.2961E+11 bytes
Fixed Size                  2262400 bytes
Variable Size            3.3018E+10 bytes
Database Buffers         9.6368E+10 bytes
Redo Buffers              221818880 bytes
ORA-01618: redo thread 2 is not enabled - cannot mount


--已经open节点
SQL> ALTER DATABASE ENABLE THREAD 2;

Database altered.

--需要open节点
SQL> ALTER DATABASE MOUNT;

Database altered.

SQL> ALTER DATABASE OPEN;

Database altered.
xifenfei1:/home/grid$crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.FLASH.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.OCR.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.asm
               ONLINE  ONLINE       xifenfei1                  Started             
               ONLINE  ONLINE       xifenfei2                  Started             
ora.gsd
               OFFLINE OFFLINE      xifenfei1                                      
               OFFLINE OFFLINE      xifenfei2                                      
ora.net1.network
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.ons
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.registry.acfs
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei1                                      
ora.cvu
      1        OFFLINE OFFLINE                                                   
ora.xifenfei1.vip
      1        ONLINE  ONLINE       xifenfei1                                      
ora.xifenfei2.vip
      1        ONLINE  ONLINE       xifenfei2                                      
ora.oc4j
      1        ONLINE  ONLINE       xifenfei2                                      
ora.orcl.db
      1        ONLINE  ONLINE       xifenfei1                  Open                
      2        ONLINE  ONLINE       xifenfei2                  Open                
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei1                                      

PostgreSQL恢复系列:pg_filedump基本使用

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL恢复系列:pg_filedump基本使用

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

当PostgreSQL遇到重大故障,使用各种方法都无法直接启动数据库,可以考虑使用类似oracle dul工具,直接离线方式读取文件进行恢复.这个工具为pg_filedump
pg_filedump安装

[root@xifenfei ~]# yum install pg_filedump_14.x86_64
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package pg_filedump_14.x86_64 0:14.1-1.rhel7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================
 Package                            Arch      Version                 Repository                                 Size
======================================================================================================================
Installing:                                 
 pg_filedump_14                     x86_64    14.1-1.rhel7            pgdg14                                     43 k
                                            
Transaction Summary                         
======================================================================================================================
Install  1 Package

Total download size: 43 k
Installed size: 81 k
Is this ok [y/d/N]: y
Downloading packages:
pg_filedump_14-14.1-1.rhel7.x86_64.rpm                                              |  43 kB  00:00:02     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pg_filedump_14-14.1-1.rhel7.x86_64                                                      1/1 
  Verifying  : pg_filedump_14-14.1-1.rhel7.x86_64                                                      1/1 

Installed:
  pg_filedump_14.x86_64 0:14.1-1.rhel7                                                                     

Complete!
-bash-4.2$ pg_filedump 

Version 14.1 (for PostgreSQL 8.x .. 14.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2022, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
  -a  Display absolute addresses when formatting (Block header
      information is always block relative)
  -b  Display binary block images within a range (Option will turn
      off all formatting options)
  -d  Display formatted block content dump (Option will turn off
      all other formatting options)
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name numeric oid real serial smallint smallserial text
        time timestamp timestamptz timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple
  -f  Display formatted block content dump along with interpretation
  -h  Display this information
  -i  Display interpreted item details
  -k  Verify block checksums
  -o  Do not dump old values.
  -R  Display specific block ranges within the file (Blocks are
      indexed from 0)
        [startblock]: block to start at
        [endblock]: block to end at
      A startblock without an endblock will format the single block
  -s  Force segment size to [segsize]
  -t  Dump TOAST files
  -v  Ouput additional information about TOAST relations
  -n  Force segment number to [segnumber]
  -S  Force block size to [blocksize]
  -x  Force interpreted formatting of block items as index items
  -y  Force interpreted formatting of block items as heap items

The following options are valid for control files:
  -c  Interpret the file listed as a control file
  -f  Display formatted content dump along with interpretation
  -S  Force block size to [blocksize]
Additional functions:
  -m  Interpret file as pg_filenode.map file and print contents (all
      other options will be ignored)

Report bugs to <pgsql-bugs@postgresql.org>

创建测试表

-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# create table pg_xifenfei(id int,name varchar(100));
CREATE TABLE
postgres=# insert into pg_xifenfei values(1,'www.xifenfei.com');
INSERT 0 1
postgres=# insert into pg_xifenfei values(2,'xienfei_pg_recovery');
INSERT 0 1
postgres=# select * from pg_xifenfei;
 id |        name         
----+---------------------
  1 | www.xifenfei.com
  2 | xienfei_pg_recovery
(2 rows)

postgres=# 

pg_filedump恢复数据

-bash-4.2$ pg_filedump /var/lib/pgsql/14/data/base/14487/16384

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility
*
* File: /var/lib/pgsql/14/data/base/14487/16384
* Options used: None
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower      32 (0x0020)
 Block: Size 8192  Version    4            Upper    8096 (0x1fa0)
 LSN:  logid      0 recoff 0x16299cf0      Special  8192 (0x2000)
 Items:    2                      Free Space: 8064
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 32

<Data> -----
 Item   1 -- Length:   45  Offset: 8144 (0x1fd0)  Flags: NORMAL
 Item   2 -- Length:   48  Offset: 8096 (0x1fa0)  Flags: NORMAL


*** End of File Encountered. Last Block Read: 0 ***

-bash-4.2$ pg_filedump -D int,charn /var/lib/pgsql/14/data/base/14487/16384|grep COPY
COPY: 1 www.xifenfei.com
COPY: 2 xienfei_pg_recovery
-bash-4.2$ pg_filedump -D int,charn /var/lib/pgsql/14/data/base/14487/16384|grep COPY
> |awk '{$1=null;print $0}'>/tmp/pg_xifenfei_rec
-bash-4.2$ sed -i 's/^[ ]*//g' /tmp/pg_xifenfei_rec

导入数据验证

postgres=# truncate table pg_xifenfei;
TRUNCATE TABLE
postgres=# select * from pg_xifenfei;
 id | name 
----+------
(0 rows)
postgres=# copy pg_xifenfei from '/tmp/pg_xifenfei_rec'(DELIMITER ' ');
COPY 2
postgres=# select * from pg_xifenfei;
 id |        name         
----+---------------------
  1 | www.xifenfei.com
  2 | xienfei_pg_recovery
(2 rows)

通过上述简单测试证明,在PG数据库出现极端情况下,可以使用该方法进行最后的数据恢复,减少因为数据丢失带来的损失.

Oracle断电故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle断电故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电导致数据库异常恢复文件报ORA-00283 ORA-00742 ORA-00312

 
D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on 星期二 5月 31 00:38:42 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 %d 序列 %d 块 %d 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1:
'D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG'

这个错误比较明显是由于异常断电引起的写丢失导致.而且这种故障在没有备份的情况下,没有什么好处理方法,只能屏蔽一致性强制拉库,尝试强制拉库报错如下

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
数据库装载完毕。
SQL> recover database until cancel;
ORA-00279: 更改 18755939194213 (在  生成) 对于线程 1 是必需的


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_3_HJ32KJD5_.LOG
ORA-00600: internal error code, arguments: [3020], [2], [78824], [8467432], [],
[], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 78824, file
offset is 645726208 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: 'D:\ORADATA\ORCL\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 80834


ORA-01112: 未启动介质恢复


SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [krsi_al_hdr_update.15], [4294967295], [], [],[], [], [], [], [], [], [], []

ORA-600 krsi_al_hdr_update.15错误,主要是由于redo异常导致无法resetlogs成功,具体参考:Alter Database Open Resetlogs returns error ORA-00600: [krsi_al_hdr_update.15], (Doc ID 2026541.1)描述,处理这个问题之后,再次resetlogs库,报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [4366], [4112122046],
[4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [4366], [4112122045],
[4366], [4112228996], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [4366], [4112122040],
[4366], [4112228996], [12583040], [], [], [], [], [], []
进程 ID: 4644
会话 ID: 1701 序列号: 3

这个问题比较简单,通过修改scn即可绕过去,之后数据库open报ORA-600 4194等错误

SQL> alter database open ;
alter database open 
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [4194], [
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322982):
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322982\orcl_smon_5112_i322982.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3340.trc  (incident=323030):
ORA-00600: 内部错误代码, 参数: [4194], [
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_323030\orcl_ora_3340_i323030.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:04 2022
Sweep [inc][322982]: completed
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (10, 33).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-00600: internal error code, arguments: [4137], [10.33.3070116], [0], [0], [], [], [], [], [], [], [], []
Checker run found 1 new persistent data failures
Tue May 31 09:05:05 2022
Sweep [inc][323030]: completed
Sweep [inc2][322982]: completed
Tue May 31 09:05:14 2022
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc  (incident=322983):
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [0], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_322983\orcl_smon_5112_i322983.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 31 09:05:14 2022
ORA-600 signalled during: alter database open...
Block recovery stopped at EOT rba 2.61.16
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Block recovery from logseq 2, block 60 to scn 18755939643393
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ONLINELOG\O1_MF_2_K9BSVC11_.LOG
Block recovery completed at rba 2.61.16, scn 4366.4112429058
Dumping diagnostic data in directory=[cdmp_2022053],requested by(instance=1,osid=5112(SMON)),summary=[incident=322983].
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_smon_5112.trc:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-00600: internal error code, arguments: [4193], [10.33.3070116], [3], [], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库正常open成功

SQL> shutdown immediate;
ORA-00600: 内部错误代码, 参数: [4193], [


SQL> shutdown abort;
ORACLE 例程已经关闭。
SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 2.0310E+10 bytes
Fixed Size                  2290000 bytes
Variable Size            3690991280 bytes
Database Buffers         1.6576E+10 bytes
Redo Buffers               40837120 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

hcheck检测有一些字典不一致,建议客户逻辑导出,然后导入到新库中

HCheck Version 07MAY18 on 31-5月 -2022 09:12:22
----------------------------------------------
Catalog Version 11.2.0.4.0 (1102000400)
db_name: ORCL

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Resul
t
------------------------------ ... ---------- -- ---------- -------------- -----
-
.- LobNotInObj                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingOIDOnObjCol          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- SourceNotInObj              ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- OversizedFiles              ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- PoorDefaultStorage          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- PoorStorage                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- TabPartCountMismatch        ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- OrphanedTabComPart          ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingSum$                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- MissingDir$                 ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- DuplicateDataobj            ... 1102000400 <=  *All Rel* 05/31 09:12:22 PASS
.- ObjSynMissing               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjSeqMissing               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedUndo                ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndex               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexPartition      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndexSubPartition   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTable               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTablePartition      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedTableSubPartition   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- MissingPartCol              ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedSeg$                ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- OrphanedIndPartObj#         ... 1102000400 <=  *All Rel* 05/31 09:12:23 FAIL

HCKE-0024: Orphaned Index Partition Obj# (no OBJ$) (Doc ID 1360935.1)
ORPHAN INDPART$: OBJ#=149167 BO#=6378 - no OBJ$ row
ORPHAN INDPART$: OBJ#=149168 BO#=6378 - no OBJ$ row

.- DuplicateBlockUse           ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- FetUet                      ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- Uet0Check                   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- SeglessUET                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadInd$                     ... 1102000400 <=  *All Rel* 05/31 09:12:23 FAIL

HCKE-0030: OBJ$ INDEX entry has no IND$ or INDPART$/INDSUBPART$ entry (Doc ID 13
60528.1)
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148278 SYS Name=WRH$_FILESTATXS
_PK PARTITION=WRH$_FILEST_1572571104_16462
OBJ$ INDEX PARTITION has no INDPART$ entry: Obj#=148920 SYS Name=WRH$_FILESTATXS
_PK PARTITION=WRH$_FILEST_1572571104_16678

.- BadTab$                     ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadIcolDepCnt               ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjIndDobj                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- TrgAfterUpgrade             ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjType0                    ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadOwner                    ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- StmtAuditOnCommit           ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadPublicObjects            ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadSegFreelist              ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadDepends                  ... 1102000400 <=  *All Rel* 05/31 09:12:23 WARN

HCKW-0016: Dependency$ p_timestamp mismatch for VALID objects (Doc ID 1361045.1)

[E] - P_OBJ#=6376 D_OBJ#=6765

.- CheckDual                   ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- ObjectNames                 ... 1102000400 <=  *All Rel* 05/31 09:12:23 PASS
.- BadCboHiLo                  ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- ChkIotTs                    ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- NoSegmentIndex              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- BadNextObject               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- DroppedROTS                 ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- FilBlkZero                  ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- DbmsSchemaCopy              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- OrphanedObjError            ... 1102000400 >  1102000000 05/31 09:12:24 PASS
.- ObjNotLob                   ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- MaxControlfSeq              ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- SegNotInDeferredStg         ... 1102000400 >  1102000000 05/31 09:12:24 PASS
.- SystemNotRfile1             ... 1102000400 >   902000000 05/31 09:12:24 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- OrphanTrigger               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
.- ObjNotTrigger               ... 1102000400 <=  *All Rel* 05/31 09:12:24 PASS
---------------------------------------
31-5月 -2022 09:12:24  Elapsed: 2 secs
---------------------------------------
Found 4 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL 过程已成功完成。

PostgreSQL恢复系列:pg_control异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL恢复系列:pg_control异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在PG中pg_control文件类似oracle数据库的control文件(控制文件),在Oracle中如果该文件丢失/损坏,可以通过alter database create controlfile命令进行创建,对于PG数据库来说也可以通过pg_resetwal命令来实现创建,由于pg_control文件损坏,需要人工指定一些参数完成pg_resetwal相关操作
pg_resetwal 使用说明

-bash-4.2$ pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.

Usage:
  pg_resetwal [OPTION]... DATADIR

Options:
  -c, --commit-timestamp-ids=XID,XID
                                   set oldest and newest transactions bearing
                                   commit timestamp (zero means no change)
 [-D, --pgdata=]DATADIR            data directory
  -e, --epoch=XIDEPOCH             set next transaction ID epoch
  -f, --force                      force update to be done
  -l, --next-wal-file=WALFILE      set minimum starting location for new WAL
  -m, --multixact-ids=MXID,MXID    set next and oldest multitransaction ID
  -n, --dry-run                    no update, just show what would be done
  -o, --next-oid=OID               set next OID
  -O, --multixact-offset=OFFSET    set next multitransaction offset
  -u, --oldest-transaction-id=XID  set oldest transaction ID
  -V, --version                    output version information, then exit
  -x, --next-transaction-id=XID    set next transaction ID
      --wal-segsize=SIZE           size of WAL segments, in megabytes
  -?, --help                       show this help, then exit

Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>

确认现在业务表记录情况

-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=# select count(1) from ac_event;
 count  
--------
 246266
(1 row)

模拟pg_control文件异常

-bash-4.2$ ps -ef|grep postgres
postgres  37178      1  0 09:58 ?        00:00:00 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data
postgres  37179  37178  0 09:58 ?        00:00:00 postgres: logger 
postgres  37181  37178  0 09:58 ?        00:00:00 postgres: checkpointer 
postgres  37182  37178  0 09:58 ?        00:00:00 postgres: background writer 
postgres  37183  37178  0 09:58 ?        00:00:00 postgres: walwriter 
postgres  37184  37178  0 09:58 ?        00:00:00 postgres: autovacuum launcher 
postgres  37185  37178  0 09:58 ?        00:00:00 postgres: stats collector 
postgres  37186  37178  0 09:58 ?        00:00:00 postgres: logical replication launcher 
root      41368  41314  0 11:06 pts/1    00:00:00 su - postgres
postgres  41369  41368  0 11:06 pts/1    00:00:00 -bash
postgres  45071  41369  0 12:07 pts/1    00:00:00 ps -ef
postgres  45072  41369  0 12:07 pts/1    00:00:00 grep --color=auto postgres
-bash-4.2$ kill -9 37178
-bash-4.2$ ps -ef|grep postgres
root      41368  41314  0 11:06 pts/1    00:00:00 su - postgres
postgres  41369  41368  0 11:06 pts/1    00:00:00 -bash
postgres  45095  41369  0 12:08 pts/1    00:00:00 ps -ef
postgres  45096  41369  0 12:08 pts/1    00:00:00 grep --color=auto postgres
-bash-4.2$ pwd
/var/lib/pgsql/14/data/global
-bash-4.2$ ls -l pg_control 
-rw-------. 1 postgres postgres 8192 May 30 12:04 pg_control
-bash-4.2$ rm -rf pg_control 
-bash-4.2$ ls -l pg_control 
ls: cannot access pg_control: No such file or directory

PG启动失败

-bash-4.2$ pg_ctl start 
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....postgres: could not find the database system
Expected to find it in the directory "/var/lib/pgsql/14/data",
but could not open file "/var/lib/pgsql/14/data/global/pg_control": No such file or directory
 stopped waiting
pg_ctl: could not start server
Examine the log output.

创建空pg_control文件启动依旧失败

-bash-4.2$ touch /var/lib/pgsql/14/data/global/pg_control
-bash-4.2$ pg_ctl start 
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-05-30 12:09:43.953 CST [45215] PANIC:  
   could not read file "global/pg_control": read 0 of 296
 stopped waiting
pg_ctl: could not start server
Examine the log output.

设置next-wal-file
-l, –next-wal-file=WALFILE,这个参数设置下一个新的WAL文件的最小值,这个值可以从$PGDATA/pg_wal目录下去看最后一个WAL 文件,这个文件的id+1即可

-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_wal
-bash-4.2$ ls -l
total 16384
-rw-------. 1 postgres postgres 16777216 May 30 12:04 000000010000000000000014
drwx------. 2 postgres postgres        6 May 24 02:20 archive_status
-bash-4.2$ 

这个文件+1,-l 000000010000000000000015
设置next-transaction
-x, –next-transaction-id=XID,这个参数设置pg_control中的下一个XID的值,这个值可以从pg_xact目录下的文件中查询

-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_xact
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 30 12:03 0000

最后一个是0000,那么下一个XID就是0001,然后乘以 1048576 (0×100000),实际上后面直接加5个0就行了。注意,这个值是16进制的。-x 0×000100000
multixact-ids设置
-m, –multixact-ids=MXID1,MXID2,这个参数包含两个部分,MXID1和MXID2,都可以从$PGDATA/pg_multixact/offsets目录下获得。MXID1的值,首先找到最大值,+1,再乘以 65536 (0×10000,相当于后面加4个0)作为这个参数的前半部分。找到最小的值,后面加4个0,作为MXID2的值

-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/offsets
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 29 22:06 0000
-bash-4.2$ 

-m 0×00010000, 0×00000000(由于oldest multitransaction ID不能为0,因此后续这个值需要适当调整)
multixact-offset设置
-O, –multixact-offset=OFFSET,这个参数可以从$PGDATA/pg_multixact/members目录下获得。找到最大值,+1,乘以 52352 (0xCC80)

-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/members
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 24 02:20 0000

-O 0xCC80
尝试执行pg_resetwal

-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00010000,0x00000000 -O 0xCC80 $PGDATA
pg_resetwal: error: oldest multitransaction ID (-m) must not be 0

multixact-ids值不对,进行调整后处理
postmaster.pid文件需要清理
由于PG库异常关闭,需要人工清理掉该文件

-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: error: lock file "postmaster.pid" exists
-bash-4.2$ rm -rf postmaster.pid 

pg_resetwal结果预览

-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Guessed pg_control values:

pg_control version number:            1300
Catalog version number:               202107181
Database system identifier:           7103392535324046312
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0:3
Latest checkpoint's NextOID:          12000
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 0
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value
Data page checksum version:           0


Values to be changed:

First log segment after reset:        000000010000000000000015
NextMultiXactId:                      131072
OldestMultiXid:                       65536
OldestMulti's DB:                     0
NextMultiOffset:                      52352
NextXID:                              1048576
OldestXID:                            3
OldestXID's DB:                       0

If these values seem acceptable, use -f to force reset.

pg_resetwal进行创建pg_control并启动PG

-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 -f $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset
-bash-4.2$ pg_ctl start
waiting for server to start....2022-05-30 13:33:28.266 CST [51437] LOG:  
redirecting log output to logging collector process
2022-05-30 13:33:28.266 CST [51437] HINT:  Future log output will appear in directory "log".
 done
server started

验证数据

-bash-4.2$ psql
psql (14.3)
Type "help" for help.

postgres=#  select count(1) from ac_event;
 count  
--------
 245275
(1 row)

这种方法恢复之后,建议理解dump数据,然后导入到新库中