ORA-00756 ORA-10567故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00756 ORA-10567故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库异常断电之后,recover 报ORA-00756 ORA-10567等错

SQL> recover database;
ORA-00756: 恢复操作检测到数据块写入丢失
ORA-10567: Redo is inconsistent with data block (file# 1,block# 113855,file offset is 932700160 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 67

alert日志报大量block逻辑错误

2024-07-16T13:16:31.050599+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.050599+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr01_43460.trc:
ORA-10562: Error occurred while applying redo to data block (file# 3, block# 107952)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 3: 'H:\BAIDUNETDISK\XIFENFEI\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 8689
ORA-00600: 内部错误代码, 参数: [ktbair2: illegal  inheritance], , [], [], [], []
2024-07-16T13:16:31.088497+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.088497+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr0e_10596.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 755)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 64
ORA-00600: 内部错误代码, 参数: [kdolkr-2], [2], [155], [26], , []
2024-07-16T13:16:31.106449+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-07-16T13:16:31.130385+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.130385+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr0i_40632.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 110095)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 40
ORA-00600: 内部错误代码, 参数: [kdxdBlkCheckError], [1], [4304399], [6401], , []
2024-07-16T13:16:31.157313+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-07-16T13:16:31.181249+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.182247+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr09_15592.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 5490)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 822
ORA-00600: 内部错误代码, 参数: [kdxdBlkCheckError], [1], [4199794], [6401], , []
2024-07-16T13:16:31.242087+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.242087+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr05_28908.trc:
ORA-10562: Error occurred while applying redo to data block (file# 3, block# 3935)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 3: 'H:\BAIDUNETDISK\XIFENFEI\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 8694
ORA-00600: 内部错误代码, 参数: [6102], [27], [2], , [], []
2024-07-16T13:16:31.265025+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.266023+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr0d_24400.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 51243)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 8
ORA-00600: 内部错误代码, 参数: [ktbair2: illegal  inheritance], , [], [], [], []
2024-07-16T13:16:31.272007+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-07-16T13:16:31.293948+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-07-16T13:16:31.294946+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.294946+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr02_24168.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 114402)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 64
ORA-00600: 内部错误代码, 参数: [kdbBlkCheckError], [1], [4308706], [6124], , []
2024-07-16T13:16:31.307911+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-07-16T13:16:31.315890+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.316916+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr0h_37312.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 116359)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 64
ORA-00600: 内部错误代码, 参数: [kdbBlkCheckError], [1], [4310663], [6124], , []
2024-07-16T13:16:31.329881+08:00
Slave exiting with ORA-10562 exception
2024-07-16T13:16:31.329881+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr0g_38356.trc:
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 115210)
ORA-10564: tablespace SYSTEM
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 64
ORA-00600: 内部错误代码, 参数: [kdbBlkCheckError], [1], [4309514], [6124], , []
2024-07-16T13:16:49.657116+08:00

Corrupt block relative dba: 0x01000c1d (file 4, block 3101)
Fractured block found during in-flux buffer recovery
Data in bad block:
 type: 2 format: 2 rdba: 0x01000c1d
 last change scn: 0x0000.0000.00ddfe50 seq: 0x1 flg: 0x04
 spare3: 0x0
 consistency value in tail: 0xcaae0205
 check value in block header: 0x2ebc
 computed block checksum: 0xee94

Reread (file 4, block 3101) found same corrupt data (no logical check)
2024-07-16T13:16:49.893484+08:00
Errors in file C:\APP\XFF\diag\rdbms\XIFENFEI\XIFENFEI\trace\XIFENFEI_pr00_23116.trc:
ORA-00283: 恢复会话因错误而取消
ORA-00448: 后台进程正常结束

dbv检查system文件报有坏块

C:\Users\XFF>dbv file=H:\BaiduNetdisk\XIFENFEI\system01.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on 星期二 7月 16 13:17:32 2024

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF
页 11290 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x00402c1a (file 1, block 11290)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x00402c1a
 last change scn: 0x0000.0000.00dec9ca seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x56e00601
 check value in block header: 0xaf3c
 computed block checksum: 0xdc2d

页 50842 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0040c69a (file 1, block 50842)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040c69a
 last change scn: 0x0000.0000.00de200e seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x799a0601
 check value in block header: 0x68ef
 computed block checksum: 0x5994

页 113852 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0041bcbc (file 1, block 113852)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0041bcbc
 last change scn: 0x0000.0000.00df78b9 seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x1f1c0601
 check value in block header: 0xf5fc
 computed block checksum: 0x46af



DBVERIFY - 验证完成

检查的页总数: 119040
处理的页总数 (数据): 82822
失败的页总数 (数据): 0
处理的页总数 (索引): 14268
失败的页总数 (索引): 0
处理的页总数 (其他): 4570
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 17377
标记为损坏的总页数: 3
流入的页总数: 3
加密的总页数        : 0
最高块 SCN            : 14645988 (0.14645988)

由于无法直接应用日志打开库,尝试屏蔽一致性,强制打开库,报ORA-600 kcbzib_kcrsds_1错误

SQL> startup mount pfile='d:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 5167381760 bytes
Fixed Size                  9039104 bytes
Variable Size             989855744 bytes
Database Buffers         4160749568 bytes
Redo Buffers                7737344 bytes
数据库装载完毕。
SQL>
SQL>
SQL>
SQL> recover database until cancel;
ORA-00279: 更改 14599839 (在  生成) 对于线程 1 是必需的


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'


ORA-01112: 未启动介质恢复


SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],

进程 ID: 23392
会话 ID: 618 序列号: 30029

使用Patch_SCN工具修改scn(修改oracle scn小工具(patch scn)),然后打开库,报ORA-600 6711
patch_scn-kcbzib_kcrsds_1


SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [6711], [4310861], [1], [4309052],
[0], 
进程 ID: 40100
会话 ID: 618 序列号: 10845

这个故障最近刚刚处理过一次,见:数据库启动报ORA-600 6711故障分析处理,open数据库之后,尝试导出数据,报各种错误
ORA-600 6711报错

C:\Users\XFF>exp "'/ as sysdba'" owner=XIFENFEI file=e:/XIFENFEI.dmp log=e:/XIFENFEI.log  

Export: Release 19.0.0.0.0 - Production on 星期二 7月 16 13:33:11 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.


连接到: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集

即将导出指定的用户...
. 正在导出 pre-schema 过程对象和操作
EXP-00008: 遇到 ORACLE 错误 600
ORA-00600: 内部错误代码, 参数: [6711], [4310861], [1], [4309052], [0], 
EXP-00083: 调用 SYS.DBMS_AW_EXP.schema_info_exp 时出现前一问题
. 正在导出用户 XIFENFEI 的外部函数库名
. 导出 PUBLIC 类型同义词
. 正在导出专用类型同义词
EXP-00008: 遇到 ORACLE 错误 1578
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 11290)
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
EXP-00000: 导出终止失败

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期二 7月 16 13:33:23 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select object_name,object_type from dba_objects where object_id=64;

OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-----------------------
C_OBJ#_INTCOL#
CLUSTER

ORA-01578报错

C:\Users\XFF>exp "'/ as sysdba'" owner=XIFENFEI file=e:/XIFENFEI.dmp log=e:/XIFENFEI.log

Export: Release 19.0.0.0.0 - Production on 星期二 7月 16 13:34:07 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.


连接到: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集

即将导出指定的用户...
. 正在导出 pre-schema 过程对象和操作
. 正在导出用户 XIFENFEI 的外部函数库名
. 导出 PUBLIC 类型同义词
. 正在导出专用类型同义词
EXP-00008: 遇到 ORACLE 错误 1578
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 11290)
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'
EXP-00000: 导出终止失败

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期二 7月 16 13:34:21 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
输入 file_id 的值:  1
原值    3:  WHERE FILE_ID = &FILE_ID
新值    3:  WHERE FILE_ID = 1
输入 block_id 的值:  11290
原值    4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
新值    4:    AND 11290 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
--------------------------------------------------------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME
------------------ ------------------------------
PARTITION_NAME
--------------------------------------------------------------------------------
SYS
I_OBJ2
INDEX              SYSTEM

SQL> create table t1 as select * from dba_objects;
create table t1 as select * from dba_objects
                                 *
第 1 行出现错误:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 50842)
ORA-01110: 数据文件 1: 'H:\BAIDUNETDISK\XIFENFEI\SYSTEM01.DBF'


SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
输入 file_id 的值:  1
原值    3:  WHERE FILE_ID = &FILE_ID
新值    3:  WHERE FILE_ID = 1
输入 block_id 的值:  50842
原值    4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
新值    4:    AND 50842 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
--------------------------------------------------------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME
------------------ ------------------------------
PARTITION_NAME
--------------------------------------------------------------------------------
SYS
I_COL3
INDEX              SYSTEM

通过上述分析,确认还有I_OBJ2和I_COL3这两个核心index异常,参考:bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决 进行处理,数据库可以正常导出


连接到: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集

即将导出指定的用户...
. 正在导出 pre-schema 过程对象和操作
. 正在导出用户 XIFENFEI 的外部函数库名
. 导出 PUBLIC 类型同义词
. 正在导出专用类型同义词
. 正在导出用户 XIFENFEI 的对象类型定义
即将导出 XIFENFEI 的对象...
. 正在导出数据库链接
. 正在导出序号
. 正在导出簇定义
. 即将导出 XIFENFEI 的表通过常规路径...
. . 正在导出表                    标准诊断明细
.....
导出了                                                         50213 行
…………
. . 正在导出表                    诊断旁支分类
导出了                                                             1 行
. 正在导出同义词
. 正在导出视图
. 正在导出存储过程
. 正在导出运算符
. 正在导出引用完整性约束条件
. 正在导出触发器
. 正在导出索引类型
. 正在导出位图, 功能性索引和可扩展索引
. 正在导出后期表活动
. 正在导出实体化视图
. 正在导出快照日志
. 正在导出作业队列
. 正在导出刷新组和子组
. 正在导出维
. 正在导出 post-schema 过程对象和操作
. 正在导出统计信息
成功终止导出, 没有出现警告。

ORA-01092 ORA-00604 ORA-08103故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01092 ORA-00604 ORA-08103故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
进程 ID: 39348
会话 ID: 67 序列号: 29322

对应的alert日志

Mon Jul 15 10:59:46 2024
SMON: enabling cache recovery
Mon Jul 15 10:59:46 2024
Undo initialization finished serial:0 start:302658203 end:302658218 diff:15 ms (0.0 seconds)
Verifying minimum file header compatibility (11g) for tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Mon Jul 15 10:59:46 2024
SMON: enabling tx recovery
Mon Jul 15 10:59:46 2024
Database Characterset is UTF8
Mon Jul 15 10:59:46 2024
Errors in file C:\APP\XFF\diag\rdbms\xff\xff\trace\xff_ora_46664.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-08103: 对象不再存在
Mon Jul 15 10:59:46 2024
Errors in file C:\APP\XFF\diag\rdbms\xff\xff\trace\xff_ora_46664.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-08103: 对象不再存在
Error 604 happened during db open, shutting down database
USER (ospid: 46664): terminating the instance due to error 604
Starting background process ARC2
Process ARC2 submission failed with error = 1092
Mon Jul 15 10:59:47 2024
Errors in file C:\APP\XFF\diag\rdbms\xff\xff\trace\xff_arc0_33164.trc:
ORA-00444: 后台进程 "ARC2" 启动失败
ORA-01092: ORACLE 实例终止。强制断开连接
Mon Jul 15 10:59:51 2024
Instance terminated by USER, pid = 46664
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (46664) as a result of ORA-1092

跟踪启动过程发现delete from histgrm$ where obj# = :1遭遇到ORA-08103错误

=====================
PARSING IN CURSOR #18135904 lid=0 tim=302295369306 hv=3667723989 ad='7ffda7f5b500' sqlid='2mp99nzd9u1qp'
delete from histgrm$ where obj# = :1
END OF STMT
PARSE #18135904:c=0,e=1191,p=0,cr=44,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=302295369306
BINDS #16769312:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=01144048  bln=22  avl=02  flg=05
  value=66
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=01144060  bln=22  avl=02  flg=01
  value=1
EXEC #16769312:c=0,e=85,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,plh=2239883476,tim=302295369571
FETCH #16769312:c=0,e=7,p=0,cr=4,cu=0,mis=0,r=1,dep=2,og=3,plh=2239883476,tim=302295369592
CLOSE #16769312:c=0,e=4,dep=2,type=3,tim=302295369610
BINDS #16769312:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=01144048  bln=22  avl=02  flg=05
  value=66
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=01144060  bln=22  avl=02  flg=01
  value=2
EXEC #16769312:c=0,e=79,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,plh=2239883476,tim=302295369724
FETCH #16769312:c=0,e=6,p=0,cr=4,cu=0,mis=0,r=1,dep=2,og=3,plh=2239883476,tim=302295369740
CLOSE #16769312:c=0,e=4,dep=2,type=3,tim=302295369756
BINDS #18135904:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=01145078  bln=22  avl=06  flg=05
  value=4294951147
WAIT #18135904: nam='db file sequential read' ela= 127 file#=1 block#=609 blocks=1 obj#=67 tim=302295370065
WAIT #18135904: nam='db file sequential read' ela= 188 file#=1 block#=243448 blocks=1 obj#=67 tim=302295370285
Dumping Short Stack
ksedsts()+314<-kcbzib()+17818<-kcbgtcr()+12688<-ktrgtc2()+802<-qeilbk1()+7661<-qeilsr()+185<-qerixtFetch()
…………
<-opidrv()+848<-sou2o()+94<-opimai_real()+281<-opimai()+170<-00007FFC2EAC7374<-00007FFC2FADCC91
kcbzib: dump suspect buffer, err2=8103
Encrypted block <0, 4437752> content will not be dumped. Dumping header only.
buffer tsn: 0 rdba: 0x0043b6f8 (1/243448)
scn: 0x0.0 seq: 0x01 flg: 0x05 tail: 0x00000001
frmt: 0x02 chkval: 0x11bb type: 0x00=unknown
Dump of buffer cache at level 8 for pdb=0 tsn=0 rdba=4437752
BH (0x7ffd55f95998) file#: 1 rdba: 0x0043b6f8 (1/243448) class: 1 ba: 0x7ffd5555e000
  set: 50 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 1 obj: 67 objn: 67 tsn: [0/0] afn: 1 hint: f
  hash: [0x7ffdaca1f3a0,0x7ffdaca1f3a0] lru: [0x7ffd55f95bc0,0x7ffda9328448]
  ckptq: [NULL] fileq: [NULL]
  objq: [0x7ffd9d61a3c0,0x7ffd9d61a3c0] objaq: [0x7ffd9d61a3b0,0x7ffd9d61a3b0]
  use: [0x7ffdaaf604f8,0x7ffdaaf604f8] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Using State Objects
    ----------------------------------------
    SO: 0x00007FFDAAF60470, type: 46, owner: 0x00007FFD9B3C5ED8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
     proc=0x00007FFDAB2984C0, name=buffer handle, file=kcb2.h LINE:3317, pg=0 conuid=0
    (buffer) (CR) PR: 0x00007FFDAB2984C0 FLG: 0x0 SEQ: 0x439
    class bit: 0x0
    scan scn: 0.0
     cr[0]:
     sh[0]:
    kcbbfbp: [BH: 0x00007FFD55F95998, LINK: 0x00007FFDAAF604F8]
    type: normal pin
    where: qeilwhnp: qeilbk, why: 0
EXEC #18135904:c=234375,e=235311,p=2,cr=9,cu=0,mis=1,r=0,dep=1,og=4,plh=2015116224,tim=302295604662
ERROR #18135904:err=8103 tim=302295604678
STAT #18135904 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE  HISTGRM$ (cr=0 pr=0 pw=0 time=2 us)'
STAT #18135904 id=2 obj=67 op='INDEX RANGE SCAN I_H_OBJ#_COL# (cr=0 pr=0 pw=0 time=0 us cost=3 size=376 card=47)'
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-08103: 对象不再存在
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-08103: 对象不再存在

*** 2024-07-15 10:53:30.201
USER (ospid: 39348): terminating the instance due to error 604

因为数据库启动执行的delete from histgrm$操作不是必须的,因此在数据库启动过程中让该sql不执行,实现数据库open成功

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on 星期一 7月 15 11:15:33 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 6442450944 bytes
Fixed Size                  6205768 bytes
Variable Size            1493175992 bytes
Database Buffers         4932501504 bytes
Redo Buffers               10567680 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

然后再对histgrm$表对象进行处理,数据库恢复正常

数据库启动报ORA-600 6711故障分析处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报ORA-600 6711故障分析处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

几个月以前的一个数据库故障,今天拿出来在win上重新分析,数据库启动报ORA-600 6711错

C:\Users\XFF>SQLPLUS / AS SYSDBA

SQL*Plus: Release 12.1.0.2.0 Production on 星期日 7月 14 16:17:32 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 6442450944 bytes
Fixed Size                  6205768 bytes
Variable Size            1493175992 bytes
Database Buffers         4932501504 bytes
Redo Buffers               10567680 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4436379], [1], [4436389],
[0], [], [], [], [], [], [], []
进程 ID: 44144
会话 ID: 67 序列号: 39084

根据经验该报错为:ORA-600 [6711] “Cluster Key Chain corruption”,也就是说很可能是cluster相关对象异常导致该问题.


对启动过程进行跟踪

PARSING IN CURSOR #17695456 len=189 dep=4  tim=233428646426 hv=186852205 ad='7ffda1eea168' sqlid='2tkw12w5k68vd'
select user#,password,datats#,tempts#,type#,defrole,resource$, ptime,
decode(defschclass,NULL,'DEFAULT_CONSUMER_GROUP',defschclass),
spare1,spare4,ext_username,spare2 from user$ where name=:1
END OF STMT
PARSE #17695456:c=0,e=168,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=0,tim=233428646426
BINDS #17695456:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=871 siz=32 off=0
  kxsbbbfp=010b2df0  bln=32  avl=03  flg=05
  value="SYS"
EXEC #17695456:c=0,e=418,p=0,cr=0,cu=0,mis=1,r=0,dep=4,og=4,plh=1457651150,tim=233428646901
WAIT #17695456: nam='db file sequential read' ela= 126 file#=1 block#=417 blocks=1 obj#=46 tim=233428647046
FETCH #17695456:c=0,e=153,p=1,cr=2,cu=0,mis=0,r=1,dep=4,og=4,plh=1457651150,tim=233428647069
STAT #17695456 id=1 cnt=1 pid=0 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID USER$ 
  (cr=2 pr=1 pw=0 time=151 us cost=1 size=139 card=1)'
STAT #17695456 id=2 cnt=1 pid=1 pos=1 obj=46 op='INDEX UNIQUE SCAN I_USER1 (cr=1 pr=1 pw=0 time=149 us)'
CLOSE #17695456:c=0,e=2,dep=4,type=0,tim=233428647111
Incident 2601 created, dump file: C:\APP\XFF\diag\rdbms\ecp\ecp\incident\incdir_2601\ecp_ora_40516_i2601.trc
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

FETCH #15289752:c=2062500,e=2544215,p=13,cr=65626,cu=28,mis=0,r=0,dep=3,og=3,plh=3312420081,tim=233431176536
=====================
PARSE ERROR #387363008:len=50 dep=1 uid=0 oct=3 lid=0 tim=233431176680 err=600
select cost from resource_cost$ where resource#=:1
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

这个操作触发了递归查询

PARSING IN CURSOR #387319440 len=151 dep=5 lid=0 tim=233428641503 hv=2507062328 ad='7ffd9ffa23a8' sqlid='7u49y06aqxg1s'
select /*+ rule */ bucket, endpoint, col#, epvalue, epvalue_raw, ep_repeat_count from histgrm$ 
where obj#=:1 and intcol#=:2 and row#=:3 order by bucket
END OF STMT
PARSE #387319440:c=0,e=11,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641503
BINDS #387319440:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=72 off=0
  kxsbbbfp=00eb2be0  bln=22  avl=02  flg=05
  value=22
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=00eb2bf8  bln=22  avl=02  flg=01
  value=2
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=00eb2c10  bln=22  avl=01  flg=01
  value=0
EXEC #387319440:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=3,plh=3312420081,tim=233428641652
WAIT #387319440: nam='db file sequential read' ela= 124 file#=1 block#=45660 blocks=1 obj#=66 tim=233428641792
FETCH #387319440:c=0,e=173,p=1,cr=3,cu=0,mis=0,r=20,dep=5,og=3,plh=3312420081,tim=233428641834
STAT #387319440 id=1 cnt=20 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=1 pw=0 time=169 us cost=0 size=0 card=0)'
STAT #387319440 id=2 cnt=20 pid=1 pos=1 obj=66 op='TABLE ACCESS CLUSTER HISTGRM$ (cr=3 pr=1 pw=0 time=148 us)'
STAT #387319440 id=3 cnt=1 pid=2 pos=1 obj=65 op='INDEX UNIQUE SCAN I_OBJ#_INTCOL# (cr=2 pr=0 pw=0 time=2 us)'
CLOSE #387319440:c=0,e=36,dep=5,type=3,tim=233428641886

查看对应的trace文件

[TOC00000]
Jump to table of contents
Dump continued from file: C:\APP\XFF\diag\rdbms\ecp\ecp\trace\ecp_ora_40516.trc
[TOC00001]
ORA-00600: 内部错误代码, 参数: [6711], [4436379], [1], [4436389], [0], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 2601 (ORA 600 [6711]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
kdsDumpState: cdb: 0 dspdb: 0 type: 3
*** ENTER: kds state dump ***
            row 0x0043b1a5.28 continuation at: 0x0043b1a5.0 file# 1 block# 242085 slot 0 (dscnt: 0)
KDSTABN_GET: 1 ..... ntab: 2
curSlot: 0 ..... nrows: 40
Dumping kcb descriptor:
kcbds 0x0000000017100DF0 : tsn 0, rdba 0x0043b1a5, afn 1, objd 64, cls 1, tidflg 0x0 0x0 0x0
    dsflg 0x00100000, dsflg2 0x00004000, lobid 00000000:00000000, cnt 0, addr 0x00007FFD55D1C014 dx 0x0000000000000000
    env [0x0000000017178C7C]: (scn: 0x0000.54290647   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  
    statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000.00000000  
    hi-scn: 0x0000.00000000  ma-scn: 0x0000.00000000  flg: 0x00000660)
kcb_dw_scan_dumpctx: not in DW scan
kdsgrp1_dump database not fully open
*** EXIT: kds state dump ***
----- End of Customized Incident Dump(s) -----
[TOC00003-END]

通过对相关rdba进行dump分析,确认对象id为64和trace中报的信息匹配

DUL> rdba 0x0043b1a5

  rdba   : 0x0043b1a5=4436389
  rfile# : 1
  block# : 242085

DUL> dump datafile 1 block 242085 header
Block Header:
block type=0x06 (table/index/cluster segment data block)
block format=0xa2 (oracle 10)
block rdba=0x0043b1a5 (file#=1, block#=242085)
scn=0x0000.438d4a86, seq=1, tail=0x4a860601
block checksum value=0xd591=54673, flag=6
Data Block Header Dump:
 Object id on Block? Y
 seg/obj: 0x40=64  csc: 0x00.438d4a80  itc: 2  flg: -  typ: 1 (data)
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0002.01f.00014b92  0x00c01897.6e20.07  C---    0  scn 0x0000.438c5fca
0x02   0x000a.01a.0011bb8e  0x00c0292c.0317.42  --U-   22  fsc 0x0000.438d4a86
Data Block Dump:
================
flag=0x0 --------
ntab=2
nrow=41
frre=23
fsbo=0x68
ffeo=0xb90
avsp=0x1ce1
tosp=0x1ce1

进一步分析该id为什么对象,使用dul unload obj$
c_obj#_intcol#


确认对对象为cluster C_OBJ#_INTCOL#,对应的表为HISTGRM$(统计信息中存储直方图信息表),明白这一些,处理起来就比较容易了,open数据库过程中绕过该对象访问,然后对该表进行处理即可

RMAN SBT_TAPE备份无法被DISK通道识别

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:RMAN SBT_TAPE备份无法被DISK通道识别

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

经过测试确认rman SBT_TAPE通道备份的rman备份集无法被DISK通道(文件系统方式备份)方式恢复,关于rman的SBT_TAPE通道(带库方式备份)备份参考:模拟带库实现rman远程备份
发起SBT_TAPE通道备份备份

[oracle@xifenfei ~]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Sun Jul 14 13:19:03 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: XIFENFEI (DBID=1780931490)

RMAN> CONFIGURE DEFAULT DEVICE TYPE TO SBT_TAPE;

using target database control file instead of recovery catalog
old RMAN configuration parameters:
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
new RMAN configuration parameters:
CONFIGURE DEFAULT DEVICE TYPE TO 'SBT_TAPE';
new RMAN configuration parameters are successfully stored

RMAN> backup  format 'ctl_%T_%U.rman' current controlfile;

Starting backup at 14-JUL-24
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=147 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: SBT/SSH2-SFTP
channel ORA_SBT_TAPE_1: starting full datafile backup set
channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_SBT_TAPE_1: starting piece 1 at 14-JUL-24
channel ORA_SBT_TAPE_1: finished piece 1 at 14-JUL-24
piece handle=ctl_20240714_0r2vt3eh_1_1.rman tag=TAG20240714T131913 comment=API Version 2.0,MMS Version 1.0.9.0
channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:01
Finished backup at 14-JUL-24

确认备份文件在文件系统中位置

[oracle@xifenfei ~]$ ls -l /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman
-rw-r--r-- 1 root root 9961472 Jul 14 13:19 /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman

尝试DISK通道加载备份集,报RMAN-07519错误

RMAN> catalog start with '/tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman';

using target database control file instead of recovery catalog
searching for all files that match the pattern /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman

List of Files Unknown to the Database
=====================================
File Name: /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
no files cataloged

List of Files Which Where Not Cataloged
=======================================
File Name: /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman
  RMAN-07519: Reason: Error while cataloging. See alert.log.

查看alert日志信息报ORA-27048: skgfifi: file header information is invalid 等错误

Sun Jul 14 13:28:22 2024
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_13260.trc:
ORA-19624: operation failed, retry possible
ORA-19870: error while restoring backup piece /tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman
ORA-19505: failed to identify file "/tmp/rmanback/ctl_20240714_0r2vt3eh_1_1.rman"
ORA-27048: skgfifi: file header information is invalid
Additional information: 8
ORA-27048: skgfifi: file header information is invalid
Additional information: 8
ORA-27048: skgfifi: file header information is invalid
Additional information: 8

通过分析备份文件发现在带库方式备份和文件系统方式备份的文件头信息明显不一样
disk

tape


ORA-27154 ORA-27300 ORA-27301 ORA-27302故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-27154 ORA-27300 ORA-27301 ORA-27302故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

根据经验对系统的内核参数做了一些调整,结果导致数据库启动失败提示报ORA-27154 ORA-27300 ORA-27301 ORA-27302错误

ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates

ORA-27154


根据官方描述:Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28 (Doc ID 949468.1),出现该问题原因可能是由于kernel.sem参数配置不合适当时,该库的processes配置为:20000,kernel.sem参数配置为:kernel.sem = 250 32000 100 128,参数说明:

kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
SEMMSL - max semaphores per array
SEMMNS - max semaphores system wide
SEMOPM - max ops per semop call
SEMMNI - max number of arrays

理论上每个这样的配置最大值SEMMSL*SEMMNI=32000大于process的20000的设置,可是实际上控制每个信号集的信号数量没有达到250,而是只有156,通过ipcs命令可以看

[oracle@xifenfei ~]$ ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 32768      oracle     640        33554432   30                      
0x00000000 65537      oracle     640        4261412864 30                      
0xc2d167d0 98306      oracle     640        2097152    30                      
0x00000072 131075     root       444        1          1                       

------ Semaphore Arrays --------
key        semid      owner      perms      nsems   
0x450e15bd 0 	      root       666        1
0x0000cace 32769      root       666        1
0x358b172c 327683     oracle     660        104
0x9053d038 11075588   oracle     660        156
0x9053d039 11108357   oracle     660        156
0x9053d03a 11141126   oracle     660        156
0x9053d03b 11173895   oracle     660        156

从而使得SEMMSL*SEMMNI小于processes值,进而数据库启动报ORA-27154 ORA-27300 ORA-27301 ORA-27302,修改kernel.sem = 250 64000 128 256,数据库启动成功

Patch SCN工具for Linux

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Patch SCN工具for Linux

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前几年开发了图形化的win平台的patch scn工具:一键修改Oracle SCN工具升级(patch scn)
20221006220814
最近基于linux平台开发了命令行方式的Patch scn工具,大概使用如下:
1. 数据库必须处于mount状态才能修改scn

[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   328 247884300
ERROR:Oracle Database must be in Mount state in order to modify SCN

2. Patch_SCN参数提示

[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   
Use Parameters: PID  SCN(10)  [ADDR(16,don't 0x)]

3.修改SCN 具体操作

1)启动数据库到mount状态
SQL> startup mount;
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.

2)查询数据库当前scn
SQL> select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
           2478843

3)新会话中修改数据库scn
[oracle@iZbp11c0qyuuo1gr7j98upZ tmp]$ ./Patch_SCN   328 247884300
Please press ENTER to continue...
Modify the Oracle SCN value to:EC66A0C:247884300

4)启动数据库并查询scn
SQL> ALTER DATABASE OPEN;

Database altered.

SQL>  select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
         247884301 (比修改scn稍微大由于数据库已经启动会自动增加scn值)

这个小工具直接通过内存地址修改scn,绕过Oracle在一些版本中的oradebug的限制:oradebug poke ORA-32521/ORA-32519故障解决,软件还处于测试阶段,后续稳定后将对外提供软件形式服务.

awr创建snapshot等待library cache: mutex X

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:awr创建snapshot等待library cache: mutex X

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户一个11.2.0.4的库,在准备收集awr的时候发现没有snap id
awr


人工创建snapshot发现hang住了
awr_snap

查询该会话等待事件为:library cache: mutex X,查看以前mmon的子进程m000/1的trace信息

Trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_m000_6241.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      HIS
Release:        5.4.17-2102.201.3.el7uek.x86_64
Version:        #2 SMP Fri Apr 23 09:05:55 PDT 2021
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 5714
Unix process pid: 6241, image: oracle@HIS (M000)


*** 2024-06-19 11:44:39.483
*** SESSION ID:(8709.38013) 2024-06-19 11:44:39.483
*** CLIENT ID:() 2024-06-19 11:44:39.483
*** SERVICE NAME:(SYS$BACKGROUND) 2024-06-19 11:44:39.483
*** MODULE NAME:(MMON_SLAVE) 2024-06-19 11:44:39.483
*** ACTION NAME:(Auto-Flush Slave Action) 2024-06-19 11:44:39.483

DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 900 seconds
Time limit violation detected at:
ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()
+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-
dbkePostKGE_kgsf()+71<-kgeselv()+276<-kgesecl0()+139<-kgxWait()+1412<-kgxExclusive()+447<-
kglGetMutex()+140<-kglGetHandleReference()+69<-kglic0()+319<-kksIterCursorStat()+330<-kewrrtsq_rank_topsql()
+240<-kewrbtsq_build_topsql()+128<-kewrftsq_flush_topsql()+679<-kewrft_flush_table()+397<-
kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfos_flush_onesnap()+170
<-kewrfsc_flush_snapshot_c()+623<-kewrafs_auto_flush_slave()+769<-kebm_slave_main()+586<-ksvrdp()+1766
<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+245
Current Wait Stack:
 0: waiting for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1189 seq_num=1190 snap_id=1
    wait times: snap=3 min 0 sec, exc=3 min 0 sec, total=3 min 0 sec
    wait times: max=infinite, heur=15 min 3 sec
    wait counts: calls=16376 os=16376
    in_wait=1 iflags=0x15b2
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 3898, ser: 47299
  Dumping final blocker:
    inst: 1, sid: 3898, ser: 47299
Wait State:
  fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
    elapsed time of 0.000016 sec since current wait
 0: waited for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1188 seq_num=1189 snap_id=1
    wait times: snap=12 min 2 sec, exc=12 min 2 sec, total=12 min 2 sec
    wait times: max=infinite
    wait counts: calls=65535 os=65535
    occurred after 0.327543 sec of elapsed time
 1: waited for 'db file sequential read'
    file#=0x2, block#=0x1a5b, blocks=0x1
    wait_id=1187 seq_num=1188 snap_id=1
    wait times: snap=0.000420 sec, exc=0.000420 sec, total=0.000420 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000251 sec of elapsed time
 2: waited for 'db file sequential read'
    file#=0x1, block#=0x82e6, blocks=0x1
    wait_id=1186 seq_num=1187 snap_id=1
    wait times: snap=0.000429 sec, exc=0.000429 sec, total=0.000429 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.001085 sec of elapsed time
 3: waited for 'db file sequential read'
    file#=0x2, block#=0x11344, blocks=0x1
    wait_id=1185 seq_num=1186 snap_id=1
    wait times: snap=0.000356 sec, exc=0.000356 sec, total=0.000356 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000008 sec of elapsed time
 4: waited for 'db file sequential read'
    file#=0x2, block#=0x19eb, blocks=0x1
    wait_id=1184 seq_num=1185 snap_id=1
    wait times: snap=0.000397 sec, exc=0.000397 sec, total=0.000397 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000044 sec of elapsed time
 5: waited for 'db file sequential read'
    file#=0x2, block#=0xb1659, blocks=0x1
    wait_id=1183 seq_num=1184 snap_id=1
    wait times: snap=0.000003 sec, exc=0.000003 sec, total=0.000003 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000010 sec of elapsed time
 6: waited for 'db file sequential read'
    file#=0x2, block#=0xb1658, blocks=0x1
    wait_id=1182 seq_num=1183 snap_id=1
    wait times: snap=0.000453 sec, exc=0.000453 sec, total=0.000453 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000009 sec of elapsed time
 7: waited for 'db file sequential read'
    file#=0x2, block#=0x19e1, blocks=0x1
    wait_id=1181 seq_num=1182 snap_id=1
    wait times: snap=0.000388 sec, exc=0.000388 sec, total=0.000388 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000017 sec of elapsed time
 8: waited for 'db file sequential read'
    file#=0x2, block#=0x19e2, blocks=0x1
    wait_id=1180 seq_num=1181 snap_id=1
    wait times: snap=0.000415 sec, exc=0.000415 sec, total=0.000415 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.004826 sec of elapsed time
 9: waited for 'db file sequential read'
    file#=0x2, block#=0x2ffc0c, blocks=0x1
    wait_id=1179 seq_num=1180 snap_id=1
    wait times: snap=0.000404 sec, exc=0.000404 sec, total=0.000404 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000007 sec of elapsed time
Sampled Session History of session 8709 serial 38013
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples  in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).

The history is displayed in reverse chronological order.

sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [118 samples,                                            11:42:39 - 11:44:39]
    waited for 'library cache: mutex X', seq_num: 1190
      p1: 'idn'=0x644e2de0
      p2: 'value'=0xf3a00000000
      p3: 'where'=0x7c
      time_waited: >= 120 sec (still in wait)
  [3 samples,                                              11:42:39 - 11:42:38]
    idle wait at each sample
---------------------------------------------------
Sampled Session History Summary:
  longest_non_idle_wait: 'library cache: mutex X'
  [118 samples, 11:42:39 - 11:44:39]
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 8 csec) -----
----- END DDE Actions Dump (total 8 csec) -----
KGX cleanup...
KGX Atomic Operation Log 0x1de44da670
 Mutex 0x1d113cf7c8(8709, 0) idn 2de0 oper EXCL(6)
 Library Cache uid 8709 efd 7 whr 49 slp 0
 oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
KGX cleanup...
KGX Atomic Operation Log 0x1de44da6c8
 Mutex 0x1de9468550(3898, 0) idn 644e2de0 oper GET_EXCL(5)
 Library Cache uid 8709 efd 7 whr 124 slp 16376
 oper=0 pt1=0x1de9468410 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
*** KEWRAFM1: Error=12751 encountered by kewrfteh
*** KEWRAFS: Error=12751 encountered by Auto Flush Slave.
KEBM: MMON slave action policy violation. kewrmafsa_; viol=1; err=12751

一般来说类似这样的系统自动任务被阻塞很可能是由于那种bug导致,找到相关mos文档: library cache: mutex x waits during AWR Flush High Cursor Scan (Doc ID 2382741.1),确认为:Bug 19294556 AWR Flush Waiting For Cursor Scan, Library Cache : Mutex X,目前没有好的workaround,而且在11.2.0.4基础版上没有对应的patch

exadata换flash卡的一些操作

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:exadata换flash卡的一些操作

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户有一套oracle exadata x3-2的1/4配置(采用高容量磁盘)的机器,反馈由于flash卡异常导致性能很慢,通过临时关闭异常卡所在机器业务恢复正常
xd_flash_error


相关版本信息

[root@oa0cel03 ~]# imageinfo

Kernel version: 2.6.32-400.11.1.el5uek #1 SMP Thu Nov 22 03:29:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_130109
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_130109-1

Active image version: 11.2.3.2.1.130109
Active image activated: 2013-06-27 02:24:19 -0700
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.2.1.130109

Inactive image version: 11.2.3.2.0.120713
Inactive image activated: 2012-10-14 06:46:16 -0700
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

[root@oa0cel03 ~]# cellcli
CellCLI: Release 11.2.3.2.1 - Production on Thu Jun 20 18:28:37 CST 2024

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 3,617

CellCLI> list cell detail
	 name:              	 oa0cel03
	 bbuTempThreshold:  	 60
	 bbuChargeThreshold:	 800
	 bmcType:           	 IPMI
	 cellVersion:       	 OSS_11.2.3.2.1_LINUX.X64_130109
	 cpuCount:          	 24
	 diagHistoryDays:   	 7
	 fanCount:          	 8/8
	 fanStatus:         	 normal
	 flashCacheMode:    	 WriteBack
	 id:                	 1238FM507A
	 interconnectCount: 	 3
	 interconnect1:     	 bondib0
	 iormBoost:         	 0.0
	 ipaddress1:        	 192.168.10.5/22
	 kernelVersion:     	 2.6.32-400.11.1.el5uek
	 locatorLEDStatus:  	 off
	 makeModel:         	 Oracle Corporation SUN FIRE X4270 M3 SAS
	 metricHistoryDays: 	 7
	 notificationMethod:	 snmp
	 notificationPolicy:	 critical,warning,clear
	 offloadEfficiency: 	 3,616.5
	 powerCount:        	 2/2
	 powerStatus:       	 normal
	 releaseVersion:    	 11.2.3.2.1
	 releaseTrackingBug:	 14522699
	 snmpSubscriber:    	 host=oa0db02.qhsrmyy.com,port=3872,community=cell
	                    	 host=oa0db01.qhsrmyy.com,port=3872,community=cell
	 status:            	 online
	 temperatureReading:	 28.0
	 temperatureStatus: 	 normal
	 upTime:            	 0 days, 3:49
	 cellsrvStatus:     	 running
	 msStatus:          	 running
	 rsStatus:          	 running

客户第一次换盘之后,依旧有性能问题,先把griddisk给inactive

[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 38,820 MB
[root@oa0cel03 ~]# cellcli -e "alter flashcache all flush"
Flash cache on FD_00_oa0cel03 successfully altered
Flash cache on FD_01_oa0cel03 successfully altered
Flash cache on FD_02_oa0cel03 successfully altered
Flash cache on FD_03_oa0cel03 successfully altered
Flash cache on FD_04_oa0cel03 successfully altered
Flash cache on FD_05_oa0cel03 successfully altered
Flash cache on FD_06_oa0cel03 successfully altered
Flash cache on FD_07_oa0cel03 successfully altered
Flash cache on FD_09_exastlx01 successfully altered
Flash cache on FD_10_exastlx01 successfully altered
Flash cache on FD_11_oa0cel03 skipped because FD_11_oa0cel03 is degraded
Flash cache on FD_12_oa0cel03 successfully altered
Flash cache on FD_13_oa0cel03 successfully altered
Flash cache on FD_14_oa0cel03 successfully altered
Flash cache on FD_15_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 0.000 MB
[root@oa0cel03 ~]# cellcli -e "alter griddisk all inactive"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk
	 DATA_oa0_CD_00_oa0cel03	 inactive
	 DATA_oa0_CD_01_oa0cel03	 inactive
	 DATA_oa0_CD_02_oa0cel03	 inactive
	 DATA_oa0_CD_03_oa0cel03	 inactive
	 DATA_oa0_CD_04_oa0cel03	 inactive
	 DATA_oa0_CD_05_oa0cel03	 inactive
	 DATA_oa0_CD_06_oa0cel03	 inactive
	 DATA_oa0_CD_07_oa0cel03	 inactive
	 DATA_oa0_CD_08_oa0cel03	 inactive
	 DATA_oa0_CD_09_oa0cel03	 inactive
	 DATA_oa0_CD_10_oa0cel03	 inactive
	 DATA_oa0_CD_11_oa0cel03	 inactive
	 DBFS_DG_CD_02_oa0cel03  	 inactive
	 DBFS_DG_CD_03_oa0cel03  	 inactive
	 DBFS_DG_CD_04_oa0cel03  	 inactive
	 DBFS_DG_CD_05_oa0cel03  	 inactive
	 DBFS_DG_CD_06_oa0cel03  	 inactive
	 DBFS_DG_CD_07_oa0cel03  	 inactive
	 DBFS_DG_CD_08_oa0cel03  	 inactive
	 DBFS_DG_CD_09_oa0cel03  	 inactive
	 DBFS_DG_CD_10_oa0cel03  	 inactive
	 DBFS_DG_CD_11_oa0cel03  	 inactive
	 RECO_oa0_CD_00_oa0cel03	 inactive
	 RECO_oa0_CD_01_oa0cel03	 inactive
	 RECO_oa0_CD_02_oa0cel03	 inactive
	 RECO_oa0_CD_03_oa0cel03	 inactive
	 RECO_oa0_CD_04_oa0cel03	 inactive
	 RECO_oa0_CD_05_oa0cel03	 inactive
	 RECO_oa0_CD_06_oa0cel03	 inactive
	 RECO_oa0_CD_07_oa0cel03	 inactive
	 RECO_oa0_CD_08_oa0cel03	 inactive
	 RECO_oa0_CD_09_oa0cel03	 inactive
	 RECO_oa0_CD_10_oa0cel03	 inactive
	 RECO_oa0_CD_11_oa0cel03	 inactive
[root@oa0cel03 ~]#  cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome 
	 DATA_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_11_oa0cel03	 OFFLINE	 Yes
	 DBFS_DG_CD_02_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_03_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_04_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_05_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_06_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_07_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_08_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_09_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_10_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_11_oa0cel03  	 OFFLINE	 Yes
	 RECO_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_11_oa0cel03	 OFFLINE	 Yes

客户继续换卡尝试,最终确认4号卡槽损坏,放弃这个槽位重建flashcache

[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_10_exastlx01 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

这个里面FD_10_exastlx01名字是以前老的卡上面留下来的,太影响视觉感官了,删除重建

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_10_exastlx01
CellDisk FD_10_exastlx01 successfully dropped
[root@oa0cel03 ~]# cellcli -e create celldisk all flashdisk
CellDisk FD_11_oa0cel03 successfully created
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

删除flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e drop flashlog
Flash log oa0cel03_FLASHLOG successfully dropped
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# 
[root@oa0cel03 ~]# cellcli -e drop flashcache
Flash cache oa0cel03_FLASHCACHE successfully dropped

尝试重建flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created, but the following cell disks were degraded because their 
statuses are not normal: FD_00_oa0cel03, FD_01_oa0cel03, FD_02_oa0cel03, FD_03_oa0cel03

由于有一些celldisk实际硬盘不存在,无法直接创建成功,需要删除对应的celldisk

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03

CELL-04519: Cannot complete the drop of cell disk: FD_00_oa0cel03. Received error: 
CELL-04516: LUN Object cannot be obtained for cell disk: FD_00_oa0cel03 
Cell disks not dropped: FD_00_oa0cel03 

--强制删除
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03  force

CellDisk FD_00_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_01_oa0cel03  force
CellDisk FD_01_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_02_oa0cel03  force
CellDisk FD_02_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_03_oa0cel03  force
CellDisk FD_03_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

创建flashlog和flashcache

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created
[root@oa0cel03 ~]# cellcli -e list flashlog detail
	 name:              	 oa0cel03_FLASHLOG
	 cellDisk:               …………
	 creationTime:      	 2024-06-21T18:20:51+08:00
	 degradedCelldisks: 	 
	 effectiveSize:     	 384M
	 efficiency:        	 100.0
	 id:                	 f3ab3882-fa03-4f49-b0ca-879ef3f2ac05
	 size:              	 384M
	 status:            	 normal
[root@oa0cel03 ~]# cellcli -e create flashcache all
Flash cache oa0cel03_FLASHCACHE successfully created
[root@oa0cel03 ~]# cellcli -e list flashcache detail
	 name:              	 oa0cel03_FLASHCACHE
	 cellDisk:          	 …………
	 creationTime:      	 2024-06-21T18:21:24+08:00
	 degradedCelldisks: 	 
	 effectiveCacheSize:	 1116.5625G
	 id:                	 2195ac46-3021-461f-a6d5-5f64ff1da546
	 size:              	 1116.5625G
	 status:            	 normal
[root@oa0cel03 ~]#  cellcli -e list cell detail | grep flashCacheMode
	 flashCacheMode:    	 WriteBack

active griddisk,把这个cell的griddisk加入到asm磁盘组中

[root@oa0cel03 ~]# cellcli -e "alter griddisk all active"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome,status
	 DATA_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
	 DBFS_DG_CD_02_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_03_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_04_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_05_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_06_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_07_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_08_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_09_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_10_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_11_oa0cel03  	 ONLINE 	 Yes	 active
	 RECO_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 585 MB

RMAN-06207 RMAN-06208报错解决

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:RMAN-06207 RMAN-06208报错解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库在rman备份时执行delete noprompt expired backup报RMAN-06207和RMAN-06208错误

RMAN>delete noprompt expired backup;
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5054 instance=hisdb1 device type=DISK
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Datafile Copy        3      15-JUN-18          +DATA/hisdb/datafile/xifenfei.282.978867569
Datafile Copy        4      15-JUN-18          +DATA/hisdb/datafile/xifenfei.283.978867589
Backup Set           54704  14-JUN-24
  Backup Piece       54704  14-JUN-24          /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

RMAN-06207: WARNING: 3 objects could not be deleted for DISK channel(s) due
RMAN-06208:          to mismatched status.  Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212:   Object Type   Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.282.978867569
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.283.978867589
RMAN-06214: Backup Piece    /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

处理方法crosscheck copy

RMAN> crosscheck copy;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5070 instance=hisdb1 device type=DISK
specification does not match any control file copy in the repository
validation succeeded for archived log
archived log file name=+FRA/hisdb/archivelog/2024_06_16/thread_2_seq_149520.1675.1171833177 
     RECID=589464 STAMP=1171833177
Crosschecked 1 objects

validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.282.978867569 RECID=3 STAMP=978867571
validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.283.978867589 RECID=4 STAMP=978867590
Crosschecked 2 objects

再次验证删除不再提示错误

RMAN>   DELETE noprompt OBSOLETE;

RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
using channel ORA_DISK_4
no obsolete backups found

RAC主机相差超过10分钟导致crs无法启动

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:RAC主机相差超过10分钟导致crs无法启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈有一套19c 2节点rac,断电之后,一个节点数据库无法正常启动,通过crsctl命令查看发现crs进程没有正常启动

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

查看crs的alert日志发现集群时间间隔超过600s,无法启动csst进程

2024-06-11 17:33:09.953 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr5; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:09.956 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr1; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.024 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr2; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.031 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr4; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.040 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr3; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:11.900 [OCSSD(5020)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xifenf1 xifenf2 .
2024-06-11 17:33:13.344 [OCSSD(5020)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-06-11 17:33:13.809 [OCTSSD(5488)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 5488
2024-06-11 17:33:16.017 [OCTSSD(5488)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xifenf2.
2024-06-11 17:33:16.018 [OCTSSD(5488)]CRS-2401: The Cluster Time Synchronization Service started on host xifenf1.
2024-06-11 17:33:16.105 [OCTSSD(5488)]CRS-2419: The clock on host xifenf1 differs from mean cluster time by 1031504618 microseconds. 
  The Cluster Time Synchronization Service will not perform time synchronization 
  because the time difference is beyond the permissible offset of 600 seconds. 
  Details in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.
2024-06-11 17:33:16.579 [OCTSSD(5488)]CRS-2402: The Cluster Time Synchronization Service aborted on host xifenf1. 
  Details at (:ctsselect_mstm4:) in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.

查看主机时间

[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 17:54:09 CST 2024
Tue Jun 11 18:04:34 CST 2024

修改主机时间

[root@xifenf1 ~]# date -s "20240611 18:06:00"
Tue Jun 11 18:06:00 CST 2024
[root@xifenf1 ~]# su - grid
Last login: Tue Jun 11 17:37:53 CST 2024 on pts/0
[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 18:06:09 CST 2024
Tue Jun 11 18:05:34 CST 2024

重启crs

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenf1'
CRS-2673: Attempting to stop 'ora.storage' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenf1'
CRS-2677: Stop of 'ora.storage' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenf1'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.crf' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenf1'
CRS-2677: Stop of 'ora.cssd' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenf1'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenf1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenf1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenf1                  ACTIVE:35600,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------