Database SOS – 第27页 – Professional Database Recovery Support@+8617813235971

ORA-27154 ORA-27300 ORA-27301 ORA-27302故障处理

标题：ORA-27154 ORA-27300 ORA-27301 ORA-27302故障处理

根据经验对系统的内核参数做了一些调整,结果导致数据库启动失败提示报ORA-27154 ORA-27300 ORA-27301 ORA-27302错误

ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates

根据官方描述:Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28 (Doc ID 949468.1),出现该问题原因可能是由于kernel.sem参数配置不合适当时,该库的processes配置为:20000,kernel.sem参数配置为:kernel.sem = 250 32000 100 128,参数说明:

kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
SEMMSL - max semaphores per array
SEMMNS - max semaphores system wide
SEMOPM - max ops per semop call
SEMMNI - max number of arrays

理论上每个这样的配置最大值SEMMSL*SEMMNI=32000大于process的20000的设置,可是实际上控制每个信号集的信号数量没有达到250,而是只有156,通过ipcs命令可以看

[oracle@xifenfei ~]$ ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 32768      oracle     640        33554432   30                      
0x00000000 65537      oracle     640        4261412864 30                      
0xc2d167d0 98306      oracle     640        2097152    30                      
0x00000072 131075     root       444        1          1                       

------ Semaphore Arrays --------
key        semid      owner      perms      nsems   
0x450e15bd 0 	      root       666        1
0x0000cace 32769      root       666        1
0x358b172c 327683     oracle     660        104
0x9053d038 11075588   oracle     660        156
0x9053d039 11108357   oracle     660        156
0x9053d03a 11141126   oracle     660        156
0x9053d03b 11173895   oracle     660        156

从而使得SEMMSL*SEMMNI小于processes值,进而数据库启动报ORA-27154 ORA-27300 ORA-27301 ORA-27302,修改kernel.sem = 250 64000 128 256,数据库启动成功

Patch SCN工具for Linux

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：Patch SCN工具for Linux

前几年开发了图形化的win平台的patch scn工具:一键修改Oracle SCN工具升级(patch scn)

最近基于linux平台开发了命令行方式的Patch scn工具,大概使用如下:
1. 数据库必须处于mount状态才能修改scn

[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   328 247884300
ERROR:Oracle Database must be in Mount state in order to modify SCN

2. Patch_SCN参数提示

[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   
Use Parameters: PID  SCN(10)  [ADDR(16,don't 0x)]

3.修改SCN 具体操作

1)启动数据库到mount状态
SQL> startup mount;
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.

2)查询数据库当前scn
SQL> select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
           2478843

3)新会话中修改数据库scn
[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   328 247884300
Please press ENTER to continue...
Modify the Oracle SCN value to:EC66A0C:247884300

4)启动数据库并查询scn
SQL> ALTER DATABASE OPEN;

Database altered.

SQL>  select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
         247884301 （比修改scn稍微大由于数据库已经启动会自动增加scn值）

这个小工具直接通过内存地址修改scn,绕过Oracle在一些版本中的oradebug的限制:oradebug poke ORA-32521/ORA-32519故障解决,软件还处于测试阶段,后续稳定后将对外提供软件形式服务.

awr创建snapshot等待library cache: mutex X

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：awr创建snapshot等待library cache: mutex X

客户一个11.2.0.4的库,在准备收集awr的时候发现没有snap id

人工创建snapshot发现hang住了

查询该会话等待事件为:library cache: mutex X,查看以前mmon的子进程m000/1的trace信息

Trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_m000_6241.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      HIS
Release:        5.4.17-2102.201.3.el7uek.x86_64
Version:        #2 SMP Fri Apr 23 09:05:55 PDT 2021
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 5714
Unix process pid: 6241, image: oracle@HIS (M000)


*** 2024-06-19 11:44:39.483
*** SESSION ID:(8709.38013) 2024-06-19 11:44:39.483
*** CLIENT ID:() 2024-06-19 11:44:39.483
*** SERVICE NAME:(SYS$BACKGROUND) 2024-06-19 11:44:39.483
*** MODULE NAME:(MMON_SLAVE) 2024-06-19 11:44:39.483
*** ACTION NAME:(Auto-Flush Slave Action) 2024-06-19 11:44:39.483

DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 900 seconds
Time limit violation detected at:
ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()
+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-
dbkePostKGE_kgsf()+71<-kgeselv()+276<-kgesecl0()+139<-kgxWait()+1412<-kgxExclusive()+447<-
kglGetMutex()+140<-kglGetHandleReference()+69<-kglic0()+319<-kksIterCursorStat()+330<-kewrrtsq_rank_topsql()
+240<-kewrbtsq_build_topsql()+128<-kewrftsq_flush_topsql()+679<-kewrft_flush_table()+397<-
kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfos_flush_onesnap()+170
<-kewrfsc_flush_snapshot_c()+623<-kewrafs_auto_flush_slave()+769<-kebm_slave_main()+586<-ksvrdp()+1766
<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+245
Current Wait Stack:
 0: waiting for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1189 seq_num=1190 snap_id=1
    wait times: snap=3 min 0 sec, exc=3 min 0 sec, total=3 min 0 sec
    wait times: max=infinite, heur=15 min 3 sec
    wait counts: calls=16376 os=16376
    in_wait=1 iflags=0x15b2
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 3898, ser: 47299
  Dumping final blocker:
    inst: 1, sid: 3898, ser: 47299
Wait State:
  fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
    elapsed time of 0.000016 sec since current wait
 0: waited for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1188 seq_num=1189 snap_id=1
    wait times: snap=12 min 2 sec, exc=12 min 2 sec, total=12 min 2 sec
    wait times: max=infinite
    wait counts: calls=65535 os=65535
    occurred after 0.327543 sec of elapsed time
 1: waited for 'db file sequential read'
    file#=0x2, block#=0x1a5b, blocks=0x1
    wait_id=1187 seq_num=1188 snap_id=1
    wait times: snap=0.000420 sec, exc=0.000420 sec, total=0.000420 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000251 sec of elapsed time
 2: waited for 'db file sequential read'
    file#=0x1, block#=0x82e6, blocks=0x1
    wait_id=1186 seq_num=1187 snap_id=1
    wait times: snap=0.000429 sec, exc=0.000429 sec, total=0.000429 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.001085 sec of elapsed time
 3: waited for 'db file sequential read'
    file#=0x2, block#=0x11344, blocks=0x1
    wait_id=1185 seq_num=1186 snap_id=1
    wait times: snap=0.000356 sec, exc=0.000356 sec, total=0.000356 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000008 sec of elapsed time
 4: waited for 'db file sequential read'
    file#=0x2, block#=0x19eb, blocks=0x1
    wait_id=1184 seq_num=1185 snap_id=1
    wait times: snap=0.000397 sec, exc=0.000397 sec, total=0.000397 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000044 sec of elapsed time
 5: waited for 'db file sequential read'
    file#=0x2, block#=0xb1659, blocks=0x1
    wait_id=1183 seq_num=1184 snap_id=1
    wait times: snap=0.000003 sec, exc=0.000003 sec, total=0.000003 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000010 sec of elapsed time
 6: waited for 'db file sequential read'
    file#=0x2, block#=0xb1658, blocks=0x1
    wait_id=1182 seq_num=1183 snap_id=1
    wait times: snap=0.000453 sec, exc=0.000453 sec, total=0.000453 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000009 sec of elapsed time
 7: waited for 'db file sequential read'
    file#=0x2, block#=0x19e1, blocks=0x1
    wait_id=1181 seq_num=1182 snap_id=1
    wait times: snap=0.000388 sec, exc=0.000388 sec, total=0.000388 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000017 sec of elapsed time
 8: waited for 'db file sequential read'
    file#=0x2, block#=0x19e2, blocks=0x1
    wait_id=1180 seq_num=1181 snap_id=1
    wait times: snap=0.000415 sec, exc=0.000415 sec, total=0.000415 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.004826 sec of elapsed time
 9: waited for 'db file sequential read'
    file#=0x2, block#=0x2ffc0c, blocks=0x1
    wait_id=1179 seq_num=1180 snap_id=1
    wait times: snap=0.000404 sec, exc=0.000404 sec, total=0.000404 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000007 sec of elapsed time
Sampled Session History of session 8709 serial 38013
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples  in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).

The history is displayed in reverse chronological order.

sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [118 samples,                                            11:42:39 - 11:44:39]
    waited for 'library cache: mutex X', seq_num: 1190
      p1: 'idn'=0x644e2de0
      p2: 'value'=0xf3a00000000
      p3: 'where'=0x7c
      time_waited: >= 120 sec (still in wait)
  [3 samples,                                              11:42:39 - 11:42:38]
    idle wait at each sample
---------------------------------------------------
Sampled Session History Summary:
  longest_non_idle_wait: 'library cache: mutex X'
  [118 samples, 11:42:39 - 11:44:39]
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 8 csec) -----
----- END DDE Actions Dump (total 8 csec) -----
KGX cleanup...
KGX Atomic Operation Log 0x1de44da670
 Mutex 0x1d113cf7c8(8709, 0) idn 2de0 oper EXCL(6)
 Library Cache uid 8709 efd 7 whr 49 slp 0
 oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
KGX cleanup...
KGX Atomic Operation Log 0x1de44da6c8
 Mutex 0x1de9468550(3898, 0) idn 644e2de0 oper GET_EXCL(5)
 Library Cache uid 8709 efd 7 whr 124 slp 16376
 oper=0 pt1=0x1de9468410 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
*** KEWRAFM1: Error=12751 encountered by kewrfteh
*** KEWRAFS: Error=12751 encountered by Auto Flush Slave.
KEBM: MMON slave action policy violation. kewrmafsa_; viol=1; err=12751

一般来说类似这样的系统自动任务被阻塞很可能是由于那种bug导致,找到相关mos文档: library cache: mutex x waits during AWR Flush High Cursor Scan (Doc ID 2382741.1),确认为:Bug 19294556 AWR Flush Waiting For Cursor Scan, Library Cache : Mutex X，目前没有好的workaround,而且在11.2.0.4基础版上没有对应的patch

exadata换flash卡的一些操作

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：exadata换flash卡的一些操作

客户有一套oracle exadata x3-2的1/4配置（采用高容量磁盘）的机器，反馈由于flash卡异常导致性能很慢,通过临时关闭异常卡所在机器业务恢复正常

相关版本信息

[root@oa0cel03 ~]# imageinfo

Kernel version: 2.6.32-400.11.1.el5uek #1 SMP Thu Nov 22 03:29:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_130109
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_130109-1

Active image version: 11.2.3.2.1.130109
Active image activated: 2013-06-27 02:24:19 -0700
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.2.1.130109

Inactive image version: 11.2.3.2.0.120713
Inactive image activated: 2012-10-14 06:46:16 -0700
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

[root@oa0cel03 ~]# cellcli
CellCLI: Release 11.2.3.2.1 - Production on Thu Jun 20 18:28:37 CST 2024

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 3,617

CellCLI> list cell detail
	 name:              	 oa0cel03
	 bbuTempThreshold:  	 60
	 bbuChargeThreshold:	 800
	 bmcType:           	 IPMI
	 cellVersion:       	 OSS_11.2.3.2.1_LINUX.X64_130109
	 cpuCount:          	 24
	 diagHistoryDays:   	 7
	 fanCount:          	 8/8
	 fanStatus:         	 normal
	 flashCacheMode:    	 WriteBack
	 id:                	 1238FM507A
	 interconnectCount: 	 3
	 interconnect1:     	 bondib0
	 iormBoost:         	 0.0
	 ipaddress1:        	 192.168.10.5/22
	 kernelVersion:     	 2.6.32-400.11.1.el5uek
	 locatorLEDStatus:  	 off
	 makeModel:         	 Oracle Corporation SUN FIRE X4270 M3 SAS
	 metricHistoryDays: 	 7
	 notificationMethod:	 snmp
	 notificationPolicy:	 critical,warning,clear
	 offloadEfficiency: 	 3,616.5
	 powerCount:        	 2/2
	 powerStatus:       	 normal
	 releaseVersion:    	 11.2.3.2.1
	 releaseTrackingBug:	 14522699
	 snmpSubscriber:    	 host=oa0db02.qhsrmyy.com,port=3872,community=cell
	                    	 host=oa0db01.qhsrmyy.com,port=3872,community=cell
	 status:            	 online
	 temperatureReading:	 28.0
	 temperatureStatus: 	 normal
	 upTime:            	 0 days, 3:49
	 cellsrvStatus:     	 running
	 msStatus:          	 running
	 rsStatus:          	 running

客户第一次换盘之后，依旧有性能问题，先把griddisk给inactive

[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 38,820 MB
[root@oa0cel03 ~]# cellcli -e "alter flashcache all flush"
Flash cache on FD_00_oa0cel03 successfully altered
Flash cache on FD_01_oa0cel03 successfully altered
Flash cache on FD_02_oa0cel03 successfully altered
Flash cache on FD_03_oa0cel03 successfully altered
Flash cache on FD_04_oa0cel03 successfully altered
Flash cache on FD_05_oa0cel03 successfully altered
Flash cache on FD_06_oa0cel03 successfully altered
Flash cache on FD_07_oa0cel03 successfully altered
Flash cache on FD_09_exastlx01 successfully altered
Flash cache on FD_10_exastlx01 successfully altered
Flash cache on FD_11_oa0cel03 skipped because FD_11_oa0cel03 is degraded
Flash cache on FD_12_oa0cel03 successfully altered
Flash cache on FD_13_oa0cel03 successfully altered
Flash cache on FD_14_oa0cel03 successfully altered
Flash cache on FD_15_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 0.000 MB
[root@oa0cel03 ~]# cellcli -e "alter griddisk all inactive"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk
	 DATA_oa0_CD_00_oa0cel03	 inactive
	 DATA_oa0_CD_01_oa0cel03	 inactive
	 DATA_oa0_CD_02_oa0cel03	 inactive
	 DATA_oa0_CD_03_oa0cel03	 inactive
	 DATA_oa0_CD_04_oa0cel03	 inactive
	 DATA_oa0_CD_05_oa0cel03	 inactive
	 DATA_oa0_CD_06_oa0cel03	 inactive
	 DATA_oa0_CD_07_oa0cel03	 inactive
	 DATA_oa0_CD_08_oa0cel03	 inactive
	 DATA_oa0_CD_09_oa0cel03	 inactive
	 DATA_oa0_CD_10_oa0cel03	 inactive
	 DATA_oa0_CD_11_oa0cel03	 inactive
	 DBFS_DG_CD_02_oa0cel03  	 inactive
	 DBFS_DG_CD_03_oa0cel03  	 inactive
	 DBFS_DG_CD_04_oa0cel03  	 inactive
	 DBFS_DG_CD_05_oa0cel03  	 inactive
	 DBFS_DG_CD_06_oa0cel03  	 inactive
	 DBFS_DG_CD_07_oa0cel03  	 inactive
	 DBFS_DG_CD_08_oa0cel03  	 inactive
	 DBFS_DG_CD_09_oa0cel03  	 inactive
	 DBFS_DG_CD_10_oa0cel03  	 inactive
	 DBFS_DG_CD_11_oa0cel03  	 inactive
	 RECO_oa0_CD_00_oa0cel03	 inactive
	 RECO_oa0_CD_01_oa0cel03	 inactive
	 RECO_oa0_CD_02_oa0cel03	 inactive
	 RECO_oa0_CD_03_oa0cel03	 inactive
	 RECO_oa0_CD_04_oa0cel03	 inactive
	 RECO_oa0_CD_05_oa0cel03	 inactive
	 RECO_oa0_CD_06_oa0cel03	 inactive
	 RECO_oa0_CD_07_oa0cel03	 inactive
	 RECO_oa0_CD_08_oa0cel03	 inactive
	 RECO_oa0_CD_09_oa0cel03	 inactive
	 RECO_oa0_CD_10_oa0cel03	 inactive
	 RECO_oa0_CD_11_oa0cel03	 inactive
[root@oa0cel03 ~]#  cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome 
	 DATA_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_11_oa0cel03	 OFFLINE	 Yes
	 DBFS_DG_CD_02_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_03_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_04_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_05_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_06_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_07_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_08_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_09_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_10_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_11_oa0cel03  	 OFFLINE	 Yes
	 RECO_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_11_oa0cel03	 OFFLINE	 Yes

客户继续换卡尝试，最终确认4号卡槽损坏，放弃这个槽位重建flashcache

[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_10_exastlx01 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

这个里面FD_10_exastlx01名字是以前老的卡上面留下来的,太影响视觉感官了，删除重建

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_10_exastlx01
CellDisk FD_10_exastlx01 successfully dropped
[root@oa0cel03 ~]# cellcli -e create celldisk all flashdisk
CellDisk FD_11_oa0cel03 successfully created
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

删除flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e drop flashlog
Flash log oa0cel03_FLASHLOG successfully dropped
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# cellcli -e drop flashcache
Flash cache oa0cel03_FLASHCACHE successfully dropped

尝试重建flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created, but the following cell disks were degraded because their 
statuses are not normal: FD_00_oa0cel03, FD_01_oa0cel03, FD_02_oa0cel03, FD_03_oa0cel03

由于有一些celldisk实际硬盘不存在，无法直接创建成功，需要删除对应的celldisk

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03

CELL-04519: Cannot complete the drop of cell disk: FD_00_oa0cel03. Received error: 
CELL-04516: LUN Object cannot be obtained for cell disk: FD_00_oa0cel03 
Cell disks not dropped: FD_00_oa0cel03 

--强制删除
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03  force

CellDisk FD_00_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_01_oa0cel03  force
CellDisk FD_01_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_02_oa0cel03  force
CellDisk FD_02_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_03_oa0cel03  force
CellDisk FD_03_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

创建flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created
[root@oa0cel03 ~]# cellcli -e list flashlog detail
	 name:              	 oa0cel03_FLASHLOG
	 cellDisk:               …………
	 creationTime:      	 2024-06-21T18:20:51+08:00
	 degradedCelldisks: 	 
	 effectiveSize:     	 384M
	 efficiency:        	 100.0
	 id:                	 f3ab3882-fa03-4f49-b0ca-879ef3f2ac05
	 size:              	 384M
	 status:            	 normal
[root@oa0cel03 ~]# cellcli -e create flashcache all
Flash cache oa0cel03_FLASHCACHE successfully created
[root@oa0cel03 ~]# cellcli -e list flashcache detail
	 name:              	 oa0cel03_FLASHCACHE
	 cellDisk:          	 …………
	 creationTime:      	 2024-06-21T18:21:24+08:00
	 degradedCelldisks: 	 
	 effectiveCacheSize:	 1116.5625G
	 id:                	 2195ac46-3021-461f-a6d5-5f64ff1da546
	 size:              	 1116.5625G
	 status:            	 normal
[root@oa0cel03 ~]#  cellcli -e list cell detail | grep flashCacheMode
	 flashCacheMode:    	 WriteBack

active griddisk，把这个cell的griddisk加入到asm磁盘组中

[root@oa0cel03 ~]# cellcli -e "alter griddisk all active"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome,status
	 DATA_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
	 DBFS_DG_CD_02_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_03_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_04_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_05_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_06_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_07_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_08_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_09_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_10_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_11_oa0cel03  	 ONLINE 	 Yes	 active
	 RECO_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 585 MB

RMAN-06207 RMAN-06208报错解决

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：RMAN-06207 RMAN-06208报错解决

数据库在rman备份时执行delete noprompt expired backup报RMAN-06207和RMAN-06208错误

RMAN>delete noprompt expired backup;
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5054 instance=hisdb1 device type=DISK
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Datafile Copy        3      15-JUN-18          +DATA/hisdb/datafile/xifenfei.282.978867569
Datafile Copy        4      15-JUN-18          +DATA/hisdb/datafile/xifenfei.283.978867589
Backup Set           54704  14-JUN-24
  Backup Piece       54704  14-JUN-24          /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

RMAN-06207: WARNING: 3 objects could not be deleted for DISK channel(s) due
RMAN-06208:          to mismatched status.  Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212:   Object Type   Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.282.978867569
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.283.978867589
RMAN-06214: Backup Piece    /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

处理方法crosscheck copy

RMAN> crosscheck copy;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5070 instance=hisdb1 device type=DISK
specification does not match any control file copy in the repository
validation succeeded for archived log
archived log file name=+FRA/hisdb/archivelog/2024_06_16/thread_2_seq_149520.1675.1171833177 
     RECID=589464 STAMP=1171833177
Crosschecked 1 objects

validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.282.978867569 RECID=3 STAMP=978867571
validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.283.978867589 RECID=4 STAMP=978867590
Crosschecked 2 objects

再次验证删除不再提示错误

RMAN>   DELETE noprompt OBSOLETE;

RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
using channel ORA_DISK_4
no obsolete backups found

RAC主机相差超过10分钟导致crs无法启动

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：RAC主机相差超过10分钟导致crs无法启动

客户反馈有一套19c 2节点rac，断电之后，一个节点数据库无法正常启动,通过crsctl命令查看发现crs进程没有正常启动

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

查看crs的alert日志发现集群时间间隔超过600s，无法启动csst进程

2024-06-11 17:33:09.953 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr5; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:09.956 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr1; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.024 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr2; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.031 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr4; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.040 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr3; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:11.900 [OCSSD(5020)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xifenf1 xifenf2 .
2024-06-11 17:33:13.344 [OCSSD(5020)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-06-11 17:33:13.809 [OCTSSD(5488)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 5488
2024-06-11 17:33:16.017 [OCTSSD(5488)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xifenf2.
2024-06-11 17:33:16.018 [OCTSSD(5488)]CRS-2401: The Cluster Time Synchronization Service started on host xifenf1.
2024-06-11 17:33:16.105 [OCTSSD(5488)]CRS-2419: The clock on host xifenf1 differs from mean cluster time by 1031504618 microseconds. 
  The Cluster Time Synchronization Service will not perform time synchronization 
  because the time difference is beyond the permissible offset of 600 seconds. 
  Details in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.
2024-06-11 17:33:16.579 [OCTSSD(5488)]CRS-2402: The Cluster Time Synchronization Service aborted on host xifenf1. 
  Details at (:ctsselect_mstm4:) in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.

查看主机时间

[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 17:54:09 CST 2024
Tue Jun 11 18:04:34 CST 2024

修改主机时间

[root@xifenf1 ~]# date -s "20240611 18:06:00"
Tue Jun 11 18:06:00 CST 2024
[root@xifenf1 ~]# su - grid
Last login: Tue Jun 11 17:37:53 CST 2024 on pts/0
[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 18:06:09 CST 2024
Tue Jun 11 18:05:34 CST 2024

重启crs

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenf1'
CRS-2673: Attempting to stop 'ora.storage' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenf1'
CRS-2677: Stop of 'ora.storage' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenf1'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.crf' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenf1'
CRS-2677: Stop of 'ora.cssd' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenf1'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenf1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenf1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenf1                  ACTIVE:35600,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

O/S-Error: (OS 23) 数据错误(循环冗余检查)—故障处理

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：O/S-Error: (OS 23) 数据错误(循环冗余检查)—故障处理

有客户由于磁盘坏道导致数据文件访问报ORA-27070 OSD-04016 O/S-Error等相关错误

rman 尝试读取88号文件

RMAN> backup datafile 88 format 'e:\rman\%d_%T_%I.%s%p';

启动 backup 于 05-6月 -24
使用目标数据库控制文件替代恢复目录
分配的通道: ORA_DISK_1
通道 ORA_DISK_1: SID=2246 设备类型=DISK
通道 ORA_DISK_1: 正在启动全部数据文件备份集
通道 ORA_DISK_1: 正在指定备份集内的数据文件
输入数据文件: 文件号=00088 名称=E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA
通道 ORA_DISK_1: 正在启动段 1 于 05-6月 -24
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: backup 命令 (ORA_DISK_1 通道上, 在 06/05/2024 18:33:43 上) 失败
ORA-19501: 文件 "E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA", 块编号 322944 (块大小=8192) 上出现读取错误
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。

检查系统日志,发现有报:设备 \Device\Harddisk0\DR0 有一个不正确的区块

到目前为止基本上可以判断是文件系统或者磁盘层面出现坏道导致该问题（磁盘坏道概率更大）,使用工具对损坏数据文件进行强制拷贝,提示少量扇区数据无法拷贝

通过dbv检查恢复出来文件效果

E:\check_db>DBV FILE=D:/XIFENFEI.ORA

DBVERIFY: Release 11.2.0.3.0 - Production on 星期日 6月 9 22:17:28 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\XIFENFEI.ORA
页 323055 标记为损坏
Corrupt block relative dba: 0x1604edef (file 88, block 323055)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0x4c390601
 check value in block header: 0xe5e5
 computed block checksum: 0x5003



DBVERIFY - 验证完成

检查的页总数: 524288
处理的页总数 (数据): 204510
失败的页总数 (数据): 0
处理的页总数 (索引): 127485
失败的页总数 (索引): 0
处理的页总数 (其他): 3030
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 189262
标记为损坏的总页数: 1
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 184063522 (3470.184063522)

运气不错，就一个数据库block异常,通过dba_extents查询坏块所属对象，运气不太好是一个表数据

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
输入 file_id 的值:  88
原值    3:  WHERE FILE_ID = &FILE_ID
新值    3:  WHERE FILE_ID = 88
输入 block_id 的值:  323055
原值    4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
新值    4:    AND 323055 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
XFFUSER
XFFTABLE
TABLE              XFFTBS_TAB2

设置跳过坏块导出数据，然后重命名原表导入数据,完成本次恢复，以前有过类似恢复:

一次侥幸的OSD-04016 O/S-Error异常恢复
 O/S-Error: (OS 23) 数据错误(循环冗余检查) 数据库恢复

异常断电数据库恢复-从ORA-600 2131到ORA-08102: 未找到索引关键字, 对象号 39

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：异常断电数据库恢复-从ORA-600 2131到ORA-08102: 未找到索引关键字, 对象号 39

数据库启动报ORA-600 2131，以前遇到过类似问题:ORA-600 2131故障处理

SQL> alter database mount;
alter database mount
*
第 1 行出现错误:
ORA-00600: ??????, ??: [2131], [9], [8], [], [], [], [], [], [], [], [], []

Tue Jun 04 14:12:18 2024
RECO started with pid=15, OS id=3244 
Tue Jun 04 14:12:18 2024
MMON started with pid=16, OS id=3256 
Tue Jun 04 14:12:18 2024
MMNL started with pid=17, OS id=3432 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = E:\app\Administrator
Tue Jun 04 14:12:22 2024
alter database mount exclusive
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\XFF\XFF\trace\XFF_ora_2536.trc  (incident=427583):
ORA-00600: ??????, ??: [2131], [9], [8], [], [], [], [], [], [], [], [], []
Tue Jun 04 14:12:28 2024
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: alter database mount exclusive...

重建ctl，然后重试recover 数据库，报ORA-600 kdourp_inorder2和ORA-600 3020错误，这些错误本质都是由于redo信息和block信息不匹配导致

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 74805)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'E:\ORADATA\XFF\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 8
ORA-00600: 内部错误代码, 参数: [kdourp_inorder2], [16], [3], [0], [108], [], [], [], [], [], [], []


SQL> recover datafile 7;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码, 参数: [3020], [7], [385], [29360513], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 7, block# 385, file offset is 3153920 bytes)
ORA-10564: tablespace UNDOTBS2
ORA-01110: 数据文件 7: 'E:\ORADATA\XFF\UNDOTBS2.DBF'
ORA-10560: block type 'KTU UNDO BLOCK'

通过屏蔽一致性,修改文件头scn,强制打开数据库

SQL> recover database until cancel;
ORA-00279: 更改 56782359 (在 06/04/2024 14:00:36 生成) 对于线程 1 是必需的
ORA-00289: 建议: E:\APP\ARCHIVELOG\ARC0000005415_1165094245.0001
ORA-00280: 更改 56782359 (用于线程 1) 在序列 #5415 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: 'E:\ORADATA\XFF\SYSTEM01.DBF'


ORA-01112: 未启动介质恢复


SQL> alter database open resetlogs;

数据库已更改。

尝试导出数据报ORA-08102,导致数据库无法正常导出

C:\Users\Administrator>expdp "'/ as sysdba'" full=y dumpfile=full_20240604_%U.dmp DIRECTORY=expdp_dir 
logfile=full_20240604.log parallel=2 EXCLUDE=STATISTICS,AUDIT

Export: Release 11.2.0.4.0 - Production on 星期二 6月 4 18:40:26 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORA-31626: 作业不存在
ORA-31633: 无法创建主表 "SYS.SYS_EXPORT_FULL_05"
ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: 在 "SYS.KUPV$FT", line 1038
ORA-08102: 未找到索引关键字, 对象号 39, 文件 1, 块 97540 (2)

obj 39 为OBJ$的I_OBJ4对象报ORA-08102

SQL> select owner,object_name,object_type from dba_objects where object_id=39
  2  /

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
SYS                            I_OBJ4                         INDEX

该对象属于bootstrap$中核心对象，无法直接rebuild,参考下面文章处理,然后再尝试导出数据
分享I_OBJ4 ORA-8102故障恢复案例
 使用bbed 修复I_OBJ4 index 报ORA-8102错误
 bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决

C:\Users\Administrator>expdp "'/ as sysdba'" full=y dumpfile=full_20240604_%U.dmp DIRECTORY=expdp_dir 
logfile=full_20240604.log parallel=2 EXCLUDE=STATISTICS,AUDIT

Export: Release 11.2.0.4.0 - Production on 星期二 6月 4 18:43:47 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORA-31626: 作业不存在
ORA-31637: 无法创建作业 SYS_EXPORT_FULL_01 (用户 SYS)
ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: 在 "SYS.KUPV$FT_INT", line 798
ORA-39080: 无法为数据泵作业创建队列 "KUPC$C_1_20240604184348" 和 "KUPC$S_1_20240604184348"
ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: 在 "SYS.KUPC$QUE_INT", line 1534
ORA-08102: 未找到索引关键字, 对象号 53, 文件 1, 块 97715 (2)

通过类似方法分析确认为CDEF$的I_CDEF1 index,处理方法和I_OBJ4一样,然后导出数据成功,导入到新库中,在这个迁移过程中遭遇Wrapped 加密的package body无效的问题,具体参见:数据泵迁移Wrapped PLSQL之后报PLS-00753

数据泵迁移Wrapped PLSQL之后报PLS-00753

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：数据泵迁移Wrapped PLSQL之后报PLS-00753

记录一个oracle datapump迁移库遇到Wrapped PLSQL无效的Bug 18881811 Data Pump Import of Wrapped PLSQL Corrupts the Body（主要影响版本11.2.0.4/12.1.0.2）

导入时提示:ORA-39082 Object type PACKAGE_BODY:created with compilation warnings.
尝试编译对象提示

SQL> alter package xx.xx compile body;

Warning: Package Body altered with compilation errors.

SQL> show error

PLS-00753: malformed or corrupted wrapped unit

由于涉及的package body 比较多,人工处理比较麻烦,采取临时解决方法
1. drop 掉所有报错的package
2. 使用exp导出所有的对象（rows=n）
3. 使用imp导入
4. 编译无效对象

数据库open报ORA-600 kcratr_scan_lastbwr故障处理

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：数据库open报ORA-600 kcratr_scan_lastbwr故障处理

由于断电，导致数据库正常open报ORA-600 kcratr_scan_lastbwr错误

Wed Jan 17 18:23:26 2024
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1028618590
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jan 17 18:23:30 2024
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Hex dump of (file 3, block 144) in trace file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc
Reading datafile '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf' 
  for corruption at rdba: 0x00c00090 (file 3, block 144)
Reread (file 3, block 144) found same corrupt data (logically corrupt)
Write verification failed for File 3 Block 144 (rdba 0xc00090)
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc  (incident=672241):
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
Incident details in: /database/oracle/app/oracle/diag/rdbms/xff/xff/incident/incdir_672241/xff_ora_66361_i672241.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Aborting crash recovery due to error 600
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_66361.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

尝试recover 数据库报ORA-600 3020错误

Wed Jan 17 18:28:38 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 96 slaves
Wed Jan 17 18:28:41 2024
Recovery of Online Redo Log: Thread 1 Group 2 Seq 410864 Reading mem 0
  Mem# 0: /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log
Wed Jan 17 18:28:42 2024
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Slave exiting with ORA-600 exception
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_pr19_68212.trc:
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_68038.trc  (incident=672243):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240, file offset is 1966080 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/database/oracle/app/oracle/oradata/xff/datafile/o1_mf_undotbs1_hct7001s_.dbf'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Wed Jan 17 18:28:43 2024
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Recovery Slave PR19 previously exited with exception 600
Wed Jan 17 18:28:43 2024
Sweep [inc][672865]: completed
Media Recovery failed with error 448
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_pr00_68115.trc:
ORA-00283: recovery session canceled due to errors
ORA-00448: normal completion of background process
ORA-600 signalled during: ALTER DATABASE RECOVER  database  ...

加上隐含参数尝试强制拉库

alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 6165467436
Clearing online redo logfile 1 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log
Clearing online log 1 of thread 1 sequence number 0
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log
Clearing online log 2 of thread 1 sequence number 0
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_3_hct740k7_.log
Clearing online log 3 of thread 1 sequence number 0
Clearing online redo logfile 3 complete
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log: Thread 1 Group 1 was previously cleared
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_2_hct740hq_.log: Thread 1 Group 2 was previously cleared
Online log /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_3_hct740k7_.log: Thread 1 Group 3 was previously cleared
Fri Jan 19 09:24:59 2024
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 6165467439, threshold SCN value is 0
If you have not previously reported this warning on this database,
  please notify Oracle Support so that additional diagnosis can be performed.
Fri Jan 19 09:24:59 2024
Assigning activation ID 1028784413 (0x3d52011d)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /database/oracle/app/oracle/oradata/xff/onlinelog/o1_mf_1_hct740fp_.log
Successful open of redo thread 1
Fri Jan 19 09:24:59 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jan 19 09:24:59 2024
SMON: enabling cache recovery
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc  (incident=1344255):
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc:
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Errors in file /database/oracle/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_84860.trc:
ORA-00600: internal error code, arguments: [2662], [1], [1870500147], [1], [1870515285], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 84860): terminating the instance due to error 600
Instance terminated by USER, pid = 84860
ORA-1092 signalled during: alter database open resetlogs...

客户自行恢复到这一步,后面无法处理,接手之后进行恢复，其实后面比较简单了，就是修改下数据库scn,数据库就可以open起来,然后处理异常的undo和对象即可,可以参考以前类似文章:
ORA-600 2662快速恢复之Patch scn工具
 硬件故障导致ORA-600 2662错误处理
 ORA-00600 [2662]和ORA-00600 [4194]恢复
更多参考:惜分飞blog中ORA-600 2662文章