gpk-update-icon进程占用CPU资源100%

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:gpk-update-icon进程占用CPU资源100%

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在一个客户的机器上发现gpk-update-icon进程长期占用cpu 100%
20230517141726


gpk-update-icon进程在GUI模式下会自动通知rpm软件包更新,是由gnome-packagekit的bug造成的。
gpk-update-icon使用递归主循环,递归循环从dbus回调调用。因此,它处于调度操作的中间,并且在操作完成之前dbus无法进一步调度。
临时解决方法

killall gpk-update-icon

20230517141819


掉掉相关进程之后临时恢复正常,如果要防止后续再发生该问题,可以把系统启动到非图形化界面

[root@HIS_DG ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.8 (Santiago)
[root@HIS_DG ~]# runlevel
N 5
[root@HIS_DG ~]# who -r
         run-level 5  2021-11-13 12:04
[root@HIS_DG ~]# init 3
[root@HIS_DG ~]# who -r
         run-level 3  2023-05-17 14:12                   last=5
[root@HIS_DG ~]# runlevel
5 3

或者卸载相关包

#yum remove gnome-packagekit 或 rpm -e gpk-update-icon

难见的oracle 9i恢复—2023年

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:难见的oracle 9i恢复—2023年

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

时过境迁,以前恢复大量oracle 8/9版本的库,现在一套oracle 9i的库都比较稀奇了.今天恢复客户一套9.2.0.6的aix环境rac库,通过分析确认主要问题:
1. 重建控制文件,resetlogs库遗漏数据文件
missing_dbf


2. 数据库启动主要报错ORA-600 2663和ORA-600 kclchkblk_4

Tue Nov  8 09:10:05 2022
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Tablespace 'TEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #84 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00084' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Tue Nov  8 09:10:05 2022
SMON: enabling tx recovery
Tue Nov  8 09:10:05 2022
Database Characterset is ZHS16GBK
Tue Nov  8 09:10:05 2022
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_536662.trc:
ORA-00600: internal error code, arguments: [2663], [3301], [2638369768], [3301], [2640322622], [], [], []
Tue Nov  8 09:10:06 2022
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_647352.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3301], [18446744072061740072],[3301],[18446744072052954088]
Tue Nov  8 09:10:06 2022
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_536662.trc:
ORA-00600: internal error code, arguments: [2663], [3301], [2638369768], [3301], [2640322622], [], [], []
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 536662
ORA-1092 signalled during: alter database open...

根据客户文件名称的规则,推算出来84号文件实际的文件名(因为使用的是lv[aix的hacmp管理的lv的裸设备方式]),通过dbv确认文件无坏块

DBVERIFY: Release 9.2.0.6.0 - Production on Sat May 13 16:44:09 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/ra_txn_ind12.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 256000
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 299
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 13
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 255688
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 11177081099136 (2602.1576194944)

bbed验证文件该文件是否是84号文件

$ bbed blocksize=8192 filename='/dev/ra_txn_ind12.dbf'   
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Mon May 15 09:45:44 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> map
 File: /dev/ra_txn_ind12.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 608 bytes                    @0       

 ub4 tailchk                                @8188    


BBED> p kcvfh
struct kcvfh, 608 bytes                     @0       
   struct kcvfhbfh, 20 bytes                @0       
      ub1 type_kcbh                         @0        0x0b
      ub1 frmt_kcbh                         @1        0x02
      ub1 spare1_kcbh                       @2        0x00
      ub1 spare2_kcbh                       @3        0x00
      ub4 rdba_kcbh                         @4        0x15000001
      ub4 bas_kcbh                          @8        0x00000000
      ub2 wrp_kcbh                          @12       0x0000
      ub1 seq_kcbh                          @14       0x01
      ub1 flg_kcbh                          @15       0x04 (KCBHFCKV)
      ub2 chkval_kcbh                       @16       0x1b4a
      ub2 spare3_kcbh                       @18       0x0000
   struct kcvfhhdr, 76 bytes                @20      
      ub4 kccfhswv                          @20       0x09200000
      ub4 kccfhcvn                          @24       0x08000000
      ub4 kccfhdbi                          @28       0x05d15ccf
      ……
      ub4 kccfhcsq                          @40       0x00525a20
      ub4 kccfhfsz                          @44       0x0003e800
      s_blkz kccfhbsz                       @48       0x00
      ub2 kccfhfno                          @52       0x0054
      ub2 kccfhtyp                          @54       0x0003
   ……
   ub4 kcvfhrfn                             @528      0x00000054  ---确认是84号文件
  ……

通过bbed修改文件相关信息,然后尝试rename文件,但是recover datafile 84报错

Mon May 15 09:49:44 2023
alter database rename file '/u01/prod/proddb/9.2.0/dbs/MISSING00084' to '/dev/ra_txn_ind12.dbf'
Mon May 15 09:49:44 2023
Completed: alter database rename file '/u01/prod/proddb/9.2.0
Mon May 15 09:51:15 2023
ALTER DATABASE RECOVER  datafile 84  
Media Recovery Start
Mon May 15 09:51:15 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_467190.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []

通过处理之后,数据库recover 正常,但是open报ORA-600 4193错误

Mon May 15 09:57:53 2023
ALTER DATABASE RECOVER  DATABASE  
Media Recovery Start
Mon May 15 09:57:53 2023
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0 errs 0: /dev/rlog01a.dbf
  Mem# 1 errs 0: /dev/rlog01b.dbf
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  DATABASE  
Mon May 15 09:59:24 2023
alter database open
Mon May 15 09:59:24 2023
Beginning crash recovery of 1 threads
Mon May 15 09:59:24 2023
Started redo scan
Mon May 15 09:59:24 2023
Completed redo scan
 75 redo blocks read, 0 data blocks need recovery
Mon May 15 09:59:24 2023
Started recovery at
 Thread 1: logseq 4, block 2, scn 3301.2638369687
Mon May 15 09:59:24 2023
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4 Reading mem 0
  Mem# 0 errs 0: /dev/rlog01a.dbf
  Mem# 1 errs 0: /dev/rlog01b.dbf
Mon May 15 09:59:24 2023
Completed redo application
Mon May 15 09:59:24 2023
Ended recovery at
 Thread 1: logseq 4, block 77, scn 3301.2638389765
 0 data blocks read, 0 data blocks written, 75 redo blocks read
Crash recovery completed successfully
Mon May 15 09:59:25 2023
Thread 1 advanced to log sequence 5
Thread 1 opened at log sequence 5
  Current log# 2 seq# 5 mem# 0: /dev/rlog02a.dbf
  Current log# 2 seq# 5 mem# 1: /dev/rlog02b.dbf
Successful open of redo thread 1
Mon May 15 09:59:25 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 15 09:59:25 2023
SMON: enabling cache recovery
Mon May 15 09:59:25 2023
ARC0: Media recovery disabled
Mon May 15 09:59:25 2023
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Tablespace 'TEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Dictionary check complete
Mon May 15 09:59:25 2023
SMON: enabling tx recovery
Mon May 15 09:59:25 2023
Database Characterset is ZHS16GBK
Mon May 15 09:59:25 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413872.trc:
ORA-00600: internal error code, arguments: [4193], [781], [6399], [], [], [], [], []
Mon May 15 09:59:25 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_844004.trc:
ORA-00600: internal error code, arguments: [4193], [56042], [1895], [], [], [], [], []
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 153
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 2893
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
Mon May 15 09:59:26 2023
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Doing block recovery for fno: 12 blk: 3009
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
Doing block recovery for fno: 12 blk: 89
Mon May 15 09:59:26 2023
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5 Reading mem 0
  Mem# 0 errs 0: /dev/rlog02a.dbf
  Mem# 1 errs 0: /dev/rlog02b.dbf
Mon May 15 09:59:26 2023
Errors in file /u01/prod/proddb/9.2.0/admin/udump/prod1_ora_844004.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [56042], [1895], [], [], [], [], []
Error 607 happened during db open, shutting down database
USER: terminating instance due to error 607
Instance terminated by USER, pid = 844004
ORA-1092 signalled during: alter database open...

绕过该错误之后,数据库启动报ORA-600 2662错误

$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.6.0 - Production on Mon May 15 10:04:44 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1102023336 bytes
Fixed Size                   744104 bytes
Variable Size             922746880 bytes
Database Buffers          167772160 bytes
Redo Buffers               10760192 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Mon May 15 10:05:03 2023
SMON: enabling cache recovery
Mon May 15 10:05:03 2023
ARC0: Media recovery disabled
Mon May 15 10:05:03 2023
SMON: enabling tx recovery
Mon May 15 10:05:03 2023
Database Characterset is ZHS16GBK
Mon May 15 10:05:03 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409995], [3301], [2644132966], [4195678]
Mon May 15 10:05:04 2023
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon May 15 10:05:04 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409998], [3301], [2644132966], [4195678]
Mon May 15 10:05:04 2023
Errors in file /u01/prod/proddb/9.2.0/admin/bdump/prod1_smon_413880.trc:
ORA-00600: internal error code, arguments: [2662], [3301], [2638409998], [3301], [2644132966], [4195678]
SMON: terminating instance due to error 600
Instance terminated by SMON, pid = 413880

解决该错误之后,数据库open正常

$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.6.0 - Production on Mon May 15 10:10:30 2023

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1102023336 bytes
Fixed Size                   744104 bytes
Variable Size             922746880 bytes
Database Buffers          167772160 bytes
Redo Buffers               10760192 bytes
Database mounted.
SQL> alter database open;

Database altered.

逻辑方式导出数据,本次恢复任务基本完成.
以前有过的类似恢复案例(类似较多选择典型几个):
ORA-600 2663
ORA-600 2663 故障恢复
ORA-600 2662
ora-600 2662和ora-600 kclchkblk_4恢复
redo异常 ORA-600 kclchkblk_4 故障恢复
ORA-600 4193 错误说明和解决
ORA-00600 [2662]和ORA-00600 [4194]恢复

udev_start导致vip漂移(参见情况:rac在线加盘操作引起)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:udev_start导致vip漂移(参见情况:rac在线加盘操作引起)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户对asm进行扩容,执行udev_start命令之后,所有的vip全部漂移,业务全部中断
20230513203654


优先恢复业务,把所有vip漂移回来

[grid@rac3 ~]$  srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.

vip恢复正常,业务也恢复正常
20230513203712


出现该问题的原因是由于udev_start命令引起网卡瞬间中断,从而使得vip发生漂移
20230513212316

查看ifcfg配置文件
20230513212440

引起该问题的原因是udev对网卡进行了操作,从而引起该问题,处理建议在对应的ifcfg文件中加上 HOTPLUG=”no” (pulbic,private和其他需要关注的网络)
参考:Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
20230513213713

又一例ORA-600 kcbzpbuf_1恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:又一例ORA-600 kcbzpbuf_1恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库突然报ORA-600 kdddgb1和ORA-600 kcl_snd_cur_2错误,并且导致实例crash

Tue May 09 22:29:40 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_338012.trc  (incident=962050):
ORA-00600: internal error code, arguments: [kdddgb1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_962050/orcl1_ora_338012_i962050.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for transfer
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x10
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0x0
 block checksum disabled
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc  (incident=960186):
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_960186/orcl1_lms3_217928_i960186.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Sweep [inc][962050]: completed
Sweep [inc][960186]: completed
Sweep [inc2][962050]: completed
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc:
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
LMS3 (ospid: 217928): terminating the instance due to error 484
System state dump requested by (instance=1, osid=217928 (LMS3)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_217897_20230509222949.trc
Tue May 09 22:29:52 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:53 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:54 2023
Instance terminated by LMS3, pid = 217928

另外一个正在运行的实例做instance recovery,然后节点报ORA-600 kcbzpbuf_1,节点也crash,再次启动一直该错误无法正常启动.

Wed May 10 08:17:07 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x34
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0xf894
 computed block checksum: 0x0
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc  (incident=2240402):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2240402/orcl1_dbw9_134621_i2240402.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
DBW9 (ospid: 134621): terminating the instance due to error 471
Wed May 10 08:17:08 2023
System state dump requested by (instance=1, osid=134621 (DBW9)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_134555_20230510081708.trc
Instance terminated by DBW9, pid = 134621

尝试直接recover datafile 75失败,报ORA-03113

SQL> recover datafile 75;
ORA-03113: end-of-file on communication channel
Process ID: 281304
Session ID: 14161 Serial number: 1503

dbv检查file 75,发现15个block逻辑坏块

[oracle@oradb21 ~]$ dbv userid=xxx/xxx file=+datadg/orcl/datafile/xifenfei01.377.1130539753

DBVERIFY: Release 11.2.0.4.0 - Production on Wed May 10 08:29:44 2023

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = +datadg/orcl/datafile/xifenfei01.377.1130539753
Block Checking: DBA = 314866909, Block Type = KTB-managed data block
data header at 0x7f852b573064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 294109 failed with check code 6101
Block Checking: DBA = 314866928, Block Type = KTB-managed data block
data header at 0x7f852b599064
kdbchk: row locked by non-existent transaction
        table=0   slot=18
        lockid=101   ktbbhitc=2
Page 294128 failed with check code 6101
Block Checking: DBA = 315415269, Block Type = KTB-managed data block
data header at 0x7f852b583064
kdbchk: the amount of space used is not equal to block size
        used=7470 fsc=0 avsp=625 dtl=8088
Page 842469 failed with check code 6110
Block Checking: DBA = 315415302, Block Type = KTB-managed data block
data header at 0x7f852b3c3064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 842502 failed with check code 6101
Block Checking: DBA = 315415350, Block Type = KTB-managed data block
data header at 0x7f852b423064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842550 failed with check code 6101
Block Checking: DBA = 315415351, Block Type = KTB-managed data block
data header at 0x7f852b425064
kdbchk: row locked by non-existent transaction
        table=0   slot=10
        lockid=101   ktbbhitc=2
Page 842551 failed with check code 6101
Block Checking: DBA = 315415397, Block Type = KTB-managed data block
data header at 0x7f852b481064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842597 failed with check code 6101
Block Checking: DBA = 315415414, Block Type = KTB-managed data block
data header at 0x7f852b4a3064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842614 failed with check code 6101
Block Checking: DBA = 315665300, Block Type = KTB-managed data block
data header at 0x7f852b2dd0ac
kdbchk: the amount of space used is not equal to block size
        used=7191 fsc=0 avsp=832 dtl=8016
Page 1092500 failed with check code 6110
Block Checking: DBA = 315665302, Block Type = KTB-managed data block
data header at 0x7f852b2e10ac
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=5
Page 1092502 failed with check code 6101
Block Checking: DBA = 315665316, Block Type = KTB-managed data block
data header at 0x7f852b2fd0ac
kdbchk: the amount of space used is not equal to block size
        used=7140 fsc=0 avsp=883 dtl=8016
Page 1092516 failed with check code 6110
Block Checking: DBA = 315665491, Block Type = KTB-managed data block
data header at 0x7f852f4170c4
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=6
Page 1092691 failed with check code 6101
Block Checking: DBA = 315727518, Block Type = KTB-managed data block
data header at 0x7f852b4f50c4
kdbchk: row locked by non-existent transaction
        table=0   slot=8
        lockid=101   ktbbhitc=6
Page 1154718 failed with check code 6101
Block Checking: DBA = 315727614, Block Type = KTB-managed data block
data header at 0x7f852b5b50ac
kdbchk: row locked by non-existent transaction
        table=0   slot=15
        lockid=101   ktbbhitc=5
Page 1154814 failed with check code 6101
Block Checking: DBA = 315727646, Block Type = KTB-managed data block
data header at 0x7f852b3f30ac
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=5
Page 1154846 failed with check code 6101


DBVERIFY - Verification complete

Total Pages Examined         : 1835008
Total Pages Processed (Data) : 250749
Total Pages Failing   (Data) : 15
Total Pages Processed (Index): 74532
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1244181
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 265546
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2720428335 (9.2720428335)

通过对坏块一些处理,数据库open成功,以前有过类似恢复ORA-600 kcbzpbuf_1故障恢复

SQL> alter database open;

Database altered.

alert日志报事务异常

ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (697, 6) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Archived Log entry 9299 added for thread 1 sequence 4781 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:03 2023
NOTE: dependency between database orcl and diskgroup resource ora.ARCHDG.dg is established
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Wed May 10 08:24:04 2023
Starting background process EMNC
Wed May 10 08:24:04 2023
EMNC started with pid=49, OS id=305303 
Archived Log entry 9300 added for thread 2 sequence 4530 ID 0x5f4a1865 dest 1:
ARC2: Archiving disabled thread 2 sequence 4531
Archived Log entry 9301 added for thread 2 sequence 4531 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:13 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560578):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560578/orcl1_p000_305307_i2560578.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560579):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560579/orcl1_p000_305307_i2560579.trc
Wed May 10 08:24:15 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560427):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560427/orcl1_smon_301450_i2560427.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560432):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (717, 20) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'

处理异常事务,并且定位异常对象表

SQL> select owner,object_name,object_type from dba_objects where object_id=170692;

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
XFF
T_XIFENFEI
TABLE

rman检测逻辑坏块所属对象也是这个表(15个坏块均为该表),对该表数据进行重建抛弃损坏数据,完成本次恢复

ORA-01172 ORA-01151 故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01172 ORA-01151 故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

节点2报Error: Controlfile sequence number in file header is different from the one in memory,导致实例异常

Tue May 09 23:03:24 2023
Thread 2 cannot allocate new log, sequence 16728
Checkpoint not complete
  Current log# 3 seq# 16727 mem# 0: +DATA/xff/onlinelog/group_3.265.941900045
  Current log# 3 seq# 16727 mem# 1: +FRA/xff/onlinelog/group_3.259.941900045
Thread 2 advanced to log sequence 16728 (LGWR switch)
  Current log# 4 seq# 16728 mem# 0: +DATA/xff/onlinelog/group_4.266.941900045
  Current log# 4 seq# 16728 mem# 1: +FRA/xff/onlinelog/group_4.260.941900045
Tue May 09 23:03:31 2023
LNS: Standby redo logfile selected for thread 2 sequence 16728 for destination LOG_ARCHIVE_DEST_2
Tue May 09 23:03:32 2023
Archived Log entry 431615 added for thread 2 sequence 16727 ID 0x5ffc99b5 dest 1:
Tue May 09 23:05:30 2023
Error: Controlfile sequence number in file header is different from the one in memory
       Please check that the correct mount options are used if controlfile is located on NFS
USER (ospid: 30162): terminating the instance
Tue May 09 23:05:30 2023
System state dump requested by (instance=2, osid=30162), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6650.trc
Instance terminated by USER, pid = 30162

在节点1 进行实例重组之后,节点1 实例异常

Tue May 09 23:04:54 2023
Thread 1 cannot allocate new log, sequence 2060
Checkpoint not complete
  Current log# 1 seq# 2059 mem# 0: +DATA/xff/onlinelog/group_1.261.941899887
  Current log# 1 seq# 2059 mem# 1: +FRA/xff/onlinelog/group_1.257.941899887
Thread 1 advanced to log sequence 2060 (LGWR switch)
  Current log# 2 seq# 2060 mem# 0: +DATA/xff/onlinelog/group_2.262.941899889
  Current log# 2 seq# 2060 mem# 1: +FRA/xff/onlinelog/group_2.258.941899889
Tue May 09 23:04:58 2023
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
Tue May 09 23:05:31 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 (myinst: 1) 
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue May 09 23:05:31 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue May 09 23:05:31 2023
 LMS 0: 3 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Tue May 09 23:05:32 2023
Instance recovery: looking for dead threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Tue May 09 23:06:00 2023
ARC1 (ospid: 26512): terminating the instance
Tue May 09 23:06:00 2023
System state dump requested by (instance=1, osid=26512 (ARC1)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_diag_26311.trc
Tue May 09 23:06:01 2023
ORA-1092 : opitsk aborting process
Instance terminated by ARC1, pid = 26512

实例重启报错

Recovery of Online Redo Log: Thread 1 Group 1 Seq 2059 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_1.261.941899887
  Mem# 1: +FRA/dbm/onlinelog/group_1.257.941899887
Recovery of Online Redo Log: Thread 2 Group 3 Seq 16727 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_3.265.941900045
  Mem# 1: +FRA/dbm/onlinelog/group_3.259.941900045
Recovery of Online Redo Log: Thread 2 Group 4 Seq 16728 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_4.266.941900045
  Mem# 1: +FRA/dbm/onlinelog/group_4.260.941900045
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc
Reading datafile '+DATA/dbm/datafile/system.256.941899799' for corruption at rdba: 0x00419179 (file 1, block 102777)
Reread (file 1, block 102777) found different corrupt data (logically corrupt)
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc
RECOVERY OF THREAD 2 STUCK AT BLOCK 102777 OF FILE 1
Abort recovery for domain 0
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc:
ORA-01172: recovery of thread 2 stuck at block 102777 of file 1
ORA-01151: use media recovery to recover block, restore backup if needed
Abort recovery for domain 0
Errors in file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc:
ORA-01172: recovery of thread 2 stuck at block 102777 of file 1
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:890:17} */...

人工recover操作失败报ORA-600 3020错误

SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [1], [102777], [4297081],[], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 102777, file
offset is 841949184 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '+DATA/dbm/datafile/system.256.941899799'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 469884

---alert日志
Tue May 09 23:28:44 2023
ALTER DATABASE RECOVER  datafile 1  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 2 Group 3 Seq 16727 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_3.265.941900045
  Mem# 1: +FRA/xff/onlinelog/group_3.259.941900045
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 1  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2055.20899.1136415701
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2056.20837.1136415753
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2057.20911.1136415803
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2058.21898.1136415853
Recovery of Online Redo Log: Thread 2 Group 4 Seq 16728 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_4.266.941900045
  Mem# 1: +FRA/xff/onlinelog/group_4.260.941900045
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2059 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_1.261.941899887
  Mem# 1: +FRA/xff/onlinelog/group_1.257.941899887
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc
Reading datafile '+DATA/xff/datafile/system.256.941899799' for corruption at rdba: 0x00419179 (file 1, block 102777)
Reread (file 1, block 102777) found different corrupt data (logically corrupt)
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc
Tue May 09 23:28:59 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc  (incident=6868615):
ORA-00600: internal error code, arguments: [3020], [1], [102777], [4297081], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 102777, file offset is 841949184 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '+DATA/xff/datafile/system.256.941899799'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 469884
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_6868615/xff1_ora_16246_i6868615.trc
Tue May 09 23:29:00 2023
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...

根据上述报错信息可以确认报错的是一个index,而且非系统核心对象,可以通过allow 1 corruption方式进行恢复,并且open库成功

SQL> recover  datafile 1 allow 1 corruption;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> select owner,object_name,object_type from dba_objects where object_id=469884;

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
SYSTEM
PK_XFF_SERVERS
INDEX

SQL> alter index system.PK_XFF_SERVERS rebuild online;

Index altered.

数据库完美恢复,数据0丢失,业务可以直接正常使用

存储双活同步导致数据库异常恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储双活同步导致数据库异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户双活存储异常之后,单个存储运行,故障存储修复之后,双活同步,出现多套系统异常,上一篇:Control file mount id mismatch!故障处理,这套是win的rac无法正常启动,ocr磁盘组异常(报ORA-600 kfrValAcd30无法正常mount)

C:\Users\Administrator>crsctl start cluster -all
CRS-2672: 尝试启动 'ora.crf' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.crf' (在 'xff1' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff1' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff1' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff1\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff1' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff1' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff1' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff1' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff1' 上)
CRS-4705: 无法在节点 xff1 上启动集群件。
CRS-4705: 无法在节点 xff2 上启动集群件。
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

因为是ocr磁盘组操作比较简单,直接重建该磁盘组,还原ocr等即可

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              300M
NTFS                             \Device\Harddisk0\Partition4           599472M
NTFS                             \Device\Harddisk0\Partition5          1000000M
ORCLDISKDATA0                    \Device\Harddisk1\Partition1          1048587M
ORCLDISKDATA1                    \Device\Harddisk2\Partition1          1048587M
ORCLDISKDATA2                    \Device\Harddisk3\Partition1          1048587M
ORCLDISKDATA3                    \Device\Harddisk4\Partition1          1048587M
ORCLDISKDATA4                    \Device\Harddisk6\Partition1           460797M

C:\Users\Administrator>crsctl start crs -excl -nocrs
CRS-4123: Oracle 高可用性服务已启动。
CRS-2672: 尝试启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gipcd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gipcd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.ctssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.ctssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.ctssd' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.ctssd' (在 'xff2' 上)
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 13:52:07 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup nomount pfile='f:/pfile_asm.txt';
ASM 实例已启动

Total System Global Area 1140850688 bytes
Fixed Size                  3054680 bytes
Variable Size            1112630184 bytes
ASM Cache                  25165824 bytes

SQL>  create diskgroup OCR_VOTE  external redundancy disk '\\.\ORCLDISKDATA4' force  attribute 'COMPATIBLE.ASM' = '12.1.0';

Diskgroup created.

F:\>ocrconfig -restore backup00.ocr

F:\>crsctl replace votedisk +OCR_VOTE
已成功添加表决磁盘 e2b8fdbd05ae4f9fbf3531630853dbbc。
已成功将表决磁盘组替换为 +OCR_VOTE。
CRS-4266: 已成功替换表决文件

F:\>crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   e2b8fdbd05ae4f9fbf3531630853dbbc (\\.\ORCLDISKDATA4) [OCR_VOTE]
找到了 1 个表决磁盘。

F:\>ocrcheck
Oracle 集群注册表的状态如下:
         版本                  :          4
         总空间 (KB)     :     409568
         已用空间 (KB)      :       1348
         可用空间 (KB):     408220
         ID                       :  820087446
         设备/文件名         :  +OCR_VOTE
                                    设备/文件完整性检查成功

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

         集群注册表完整性检查成功

         逻辑损坏检查成功

mount其他磁盘组成功

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL>


SQL> alter diskgroup data mount;

Diskgroup altered.

尝试恢复数据库失败

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 14:09:39 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area 2.0992E+11 bytes
Fixed Size                  7797816 bytes
Variable Size            1.3798E+11 bytes
Database Buffers         7.1672E+10 bytes
Redo Buffers              260636672 bytes
数据库装载完毕。

SQL> recover database;
ORA-10562: Error occurred while applying redo to data block (file# 13, block#1033775)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 13: '+DATA/XFF/users07.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: 内部错误代码, 参数: [kdolkr-2], [2], [1], [44], [], [], [], [], [],[], [], []


SQL> recover datafile 2;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'


SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'

SQL> recover datafile 10;
ORA-00283: ??????????
ORA-10562: Error occurred while applying redo to data block (file# 10, block#
2899468)
ORA-10564: tablespace USERS
ORA-01110: ???? 10: '+DATA/XFF/users04.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: ??????, ??: [ktbair2: illegal  inheritance], [], [], [], [], [], [],[], [], [], [], []

除了ORA-00742,还有其他一些日志应用错误,比如:ORA-600 ktbair2: illegal inheritance,ORA-600 kdolkr-2等,无法正常应用日志,尝试强制打开库,报ORA-600 2662错误.

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [8], [678024613], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [8], [678024612], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [8], [678024610], [8],
[678508930], [12583040], [], [], [], [], [], []
进程 ID: 4628
会话 ID: 996 序列号: 48547

通过自研的Patch_SCN工具快速解决该问题
20230507195805


open数据库成功,实现最大限度抢救客户数据.

Control file mount id mismatch!故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Control file mount id mismatch!故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过沟通确认客户由于存储双活异常,业务运行在主存储上,另外一套存储修复之后,进行存储双活同步,结果在这个过程中由于遭遇Control file mount id mismatch! 导致数据库crash了

2023-05-03T20:21:07.446873+08:00
Archived Log entry 491897 added for T-1.S-246903 ID 0x97d92f0b LAD:1
2023-05-03T20:47:53.902701+08:00
Error: 2141
Control file mount id mismatch!
fhmid: 2592441863, SGA mid: 2624617448
Requesting DIAG on each RAC instance to dump the control file header block
2023-05-03T20:47:55.906490+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_rms0_20989.trc:
2023-05-03T20:47:56.521500+08:00
RMS0 (ospid: 20989): terminating the instance
2023-05-03T20:47:56.610656+08:00
System state dump requested by (instance=1, osid=20989 (RMS0)), summary=[abnormal instance termination].
System State dumped to trace file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_diag_20912_20230503204756.trc
2023-05-03T20:47:58.480397+08:00
License high water mark = 395
2023-05-03T20:48:02.600203+08:00
Instance terminated by RMS0, pid = 20989
2023-05-03T20:48:02.601563+08:00
Warning: 2 processes are still attach to shmid 393226:
 (size: 28672 bytes, creator pid: 19941, last attach/detach pid: 20912)
2023-05-03T20:48:03.481726+08:00
USER (ospid: 967): terminating the instance
2023-05-03T20:48:03.483351+08:00
Instance terminated by USER, pid = 967

节点自动重启报错ORA-600 kccsbck_first

2023-05-03T20:48:34.870435+08:00
NOTE: ASMB mounting group 2 (FRA)
NOTE: ASM background process initiating disk discovery for grp 2 (reqid:0)
NOTE: Assigning number (2,1) to disk (/dev/asm_data0g)
NOTE: Assigning number (2,0) to disk (/dev/asm_data0f)
SUCCESS: mounted group 2 (FRA)
NOTE: grp 2 disk 1: FRA_0001 path:/dev/asm_data0g
NOTE: grp 2 disk 0: FRA_0000 path:/dev/asm_data0f
2023-05-03T20:48:34.919965+08:00
NOTE: dependency between database xff and diskgroup resource ora.FRA.dg is established
2023-05-03T20:48:38.983416+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_2436.trc  (incident=1333249):
ORA-00600: ??????, ??: [kccsbck_first], [1], [2624617448], [], [], [], [], [], [], [], [], []
Incident details in: /opt/rac/oracle/diag/rdbms/xff/xff1/incident/incdir_1333249/xff1_ora_2436_i1333249.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:8:116} */...

再次重启数据库报错ORA-00742 ORA-00312

2023-05-04T08:18:59.635790+08:00
Aborting crash recovery due to error 742
2023-05-04T08:18:59.635897+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: ??????? 2 ?? 244996 ? 8262 ??????????
ORA-00312: ???? 7 ?? 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: ???? 7 ?? 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
Abort recovery for domain 0, flags 4
2023-05-04T08:18:59.647994+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: ??????? 2 ?? 244996 ? 8262 ??????????
ORA-00312: ???? 7 ?? 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: ???? 7 ?? 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
ORA-742 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:37368:2} */...
2023-05-04T08:19:00.820708+08:00
License high water mark = 33
2023-05-04T08:19:00.820936+08:00
USER (ospid: 82788): terminating the instance
2023-05-04T08:19:01.827132+08:00
Instance terminated by USER, pid = 82788

明显数据库在启动的时候做实例恢复,发现redo写丢失,从而引起数据库无法正常open,对于此类故障,处理比较多
ORA-00742 ORA-00312 故障恢复-1
ORA-00742 ORA-00312故障恢复-2
ORA-00742: 日志读取在线程 %d 序列 %d 块 %d 中检测到写入丢失情况

Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

中联的his系统在alert日志中经常会看到如下的日志告警

[oracle@oracle1 trace]$ tail -f alert_orcl.log 
Tue May 02 22:06:46 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.
Tue May 02 22:06:50 2023
Maximum of 148 enabled roles exceeded for user ZLHIS. Not loading all the roles.

查询ZLHIS用户当前的role情况

SQL>  select Grantee, count(*) "Role Number" from
  2   (
  3   select distinct connect_by_root grantee Grantee, granted_role
  4   from dba_role_privs 
 connect by prior granted_role=grantee
  5    6   ) where GRANTEE='ZLHIS'
 group by Grantee  7  
  8  /

GRANTEE                        Role Number
------------------------------ -----------
ZLHIS                                  149

虽然max_enabled_roles参数为150

SQL> show parameter MAX_ENABLED_ROLES;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
max_enabled_roles                    integer     150

但是用户支持的默认最大enable role为148个,对于该问题可以把一些角色的权限进行合并,然后再授权给ZLHIS,或者删除掉一些不需要的角色授权.如果一定需要这些角色,而且使其在用户登录的时候enable,可以以下两种常见方法解决

alter user <username> default roles <list of roles>;   
--可以是all或者角色列表

或者在会话中启用role

set roles all;
or
execute dbms_session.set_role('ALL');

参考:What to Check When Dealing With Ora-28031: Maximum Of 148 Enabled Roles Exceeded? (Doc ID 778785.1)

echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈数据库无法登录,系统ssh也无法登录,但是可以ping通,通过sqlplus sys/pwd@tns as sysdba方式登录成功,直接对数据库进行shutdown abort操作,然后系统可以正常ssh登录.通过分析发现一些io问题

系统messages日志报错
20230430084031


默认情况下, Linux会最多使用40%[根据系统配置决定]的可用内存作为文件系统缓存。当超过这个阈值后,文件系统会把将缓存中的内存全部写入磁盘, 导致后续的IO请求都是同步的。将缓存写入磁盘时,有一个默认120秒的超时时间。 出现上面的问题的原因是IO子系统的处理速度不够快,不能在120秒将缓存中的数据全部写入磁盘。IO系统响应缓慢,导致越来越多的请求堆积,最终系统内存全部被占用,导致系统失去响应。

检查系统io情况
20230430084643

磁盘在io请求很小的情况下busy 100%,属于不正常情况,让客户安排人检查硬盘情况
20230430084802

发现raid 5中有一块磁盘异常从而引起性能下降,客户安排人员换盘之后,系统恢复正常.

调整系统参数缓解
对于linux系统文件系统缓存可以进行调整参数vm.dirty_background_ratio和vm.dirty_ratio为适当值,比如

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

ORA-600 kzrini:!uprofile处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kzrini:!uprofile处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

朋友反馈,oracle expdp导出数据hang住,然后重启数据库,启动成功,但是执行导出的时候报ORA-600 kzrini:!uprofile错误
kzrini uprofile


因为有过大量类似故障的恢复case,所以第一反应这种故障很可能就是tab$被清空的故障,参见:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703,对于这类故障,我们是国内第一个发现并且进行预警(在blog,qq/微信群,oracle技术大会等),但是还有很多朋友中招.今天这个发现故障之后没有重启,比较顺利,进行山删除插入即可,参考以前的处理:ORA-600 16703直接把orachk备份表插入到tab$恢复