ORA-742 写丢失常见bug记录

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-742 写丢失常见bug记录

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

我们经常会遇到数据库异常down,然后启动的时候报ORA-00742错误,一般是在recover或者open的过程中遇到

SQL> RECOVER DATABASE;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 2097 块 296728 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: 'D:\APP\ADMINISTRATOR\ORADATA\HIS\REDO03.LOG'
SQL> recover database;
Media recovery complete.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00742: Log read detects lost write in thread %d sequence %d block %d
ORA-00312: online log 3 thread 1: '/oradata/shrdh/redo03.log'

关于该错误,Oracle官方解释:由于操作系统或者存储或者Oracle导致redo发生写丢失

[oracle@www.xifenfei.com:/home/oracle]$ oerr ora 742
00742, 00000, "Log read detects lost write in thread %s sequence %s block %s"
// *Cause:  Either a write issued by Oracle was lost by the underlying
//          operating system or storage system or an Oracle internal error
//          occurred.
// *Action: The trace file shows the lost write location. Dump the problematic
//          log file to see whether it is a real lost write. Contact Oracle
//          Support Services.

Oracle官方关于ORA-00742的主要bug有:
ORA-742-BUG-1
ORA-742-BUG-2
由于redo写丢失已经发生,一般发生这种情况,有备份使用备份进行不完全恢复,没有备份考虑强制拉库,如果是dg库问题,可以考虑从主库把日志重新传过去

避免 19c 数据库性能问题需要考虑的事项 (Doc ID 3050476.1)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:避免 19c 数据库性能问题需要考虑的事项 (Doc ID 3050476.1)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

适用于:

Oracle Cloud Infrastructure – Exadata Cloud Service
Gen 2 Exadata Cloud at Customer
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) – 版本 N/A 和更高版本
Oracle Database – Enterprise Edition – 版本 19.0.0.0 到 19.14.0.0.0 [发行版 19]
Oracle Database Cloud Exadata Service
本文档所含信息适用于所有平台
用途

本文档的目的是发布一些推荐的修复措施,以期避免 19c 中与数据库性能相关的问题。这也包括一些影响性能的 ORA-600/ORA-7445 及其他错误。

关于 SQL 性能,请参考以下文档:

Document 2773715.1 Things to Consider to Avoid SQL Performance Problems on 19c

适用范围

本文档列出了部分影响数据库性能的已知问题。一些问题的修复包含在数据库发布更新(DBRU)中,但有些尚未包含。

提醒: 数据库发布更新(RU)具有累积性。例如,2021 年 4 月的发布更新补丁包含之前所有发布更新的内容。因此,建议始终使用包含大多数已知问题修复的最新 DBRU(数据库发布更新)。
Patch 33515361 - Database Jan 2022 Release Update (19.14) - Latest

请参考以下文档获取最新的数据库发布更新(DBRU)信息。

Document 2521164.1 Oracle Database 19c Proactive Patch Information

对于尚未包含在发布更新(RU)中的错误,请下载并安装适用于您的数据库版本和操作系统的单独补丁。如果在MOS上找不到适用于您特定版本和操作系统的补丁,请提交服务请求,提供所需补丁的详细信息。请附上已应用补丁的列表(使用 opatch lsinventory -detail 命令),以及您打算应用的其他补丁。

有关自助升级和最佳实践,请参考以下内容:-

Document 1919.2 19c Database Self-Guided Upgrade with Best Practices
Document 555.1 Oracle Database 19c Important Recommended One-off Patches

详细信息

19c 数据库性能已知 Bug 修复列表:

Bug            描述 是否包含在RU? 备注
Document 30329209.8 High wait on row cache mutex after upgrading to 12.2.0.1 and above 是,从 19.8 起 应用 19.8 或以上
Document 31933451.8 High row cache mutex contention 是,从 19.13 起 应用 19.13 或以上
Document 29523216.8 Major performance bug for dc_users row cache 是,从 19.7 起 应用 19.7 或以上
Document 30712670.8 High row cache mutex contention for queries with dblink (dc_props / dc_cdbprops) 是,从 19.10 起 应用 19.10 或以上
Document 30431274.8 High row cache mutex contention (ktatminextsz) – regression of 22909260 是,从 19.7 起 应用 19.7 或以上
Document 29628647.8 High CPU for DESCRIBE command due to contention in dc_users rowcache 是,从 19.10 起 应用 19.10 或以上
Document 32043701.8 row cache lock for sequences in RAC due to S-optimization feature for dc_sequences 申请临时性补丁
Document 30489582.8 Hanganalyze trace unable to identify the blocker of “row cache lock” in RAC 是,从 19.10 起 应用 19.10 或以上
Document 30327149.8 GEN0 process in RAC waiting on ktatminextsz while reading rowcache 是,从 19.7 起 应用 19.7 或以上
Document 30720844.8 CLMN process waits on ‘library cache: mutex X’ and/or might cause ORA-600 [kglrfcl_1] 是,从 19.8 起 应用 19.8 或以上
Document 30384121.8 LCK process in RAC holds mutex in kglHandleMessage causing database hang 是,从 19.7 起 应用 19.7 或以上
Document 32356628.8 Huge waits on ‘library cache: mutex X’ with audit enabled for ‘select any table’ privilege 是,从 19.12 起 应用 19.12 或以上
Document 28889389.8 High waits on ‘cursor:mutex X’ after upgrade 是,从 19.10 起 应用 19.10 或以上
Document 31211220.8Document 33163187.8 High version count (cursor leaks) due to BIND_EQUIV_FAILURE mismatchChild Cursor Increase Due To “Bind Mismatch” Even When Using Same Bind Values 被替换是,从 19.14 起 应用 19.14 或以上
Document 34304965.8Document 35778398.8

Document 35925654.8

[ROW CACHE] 19c Tracking Bug for Row Cache Bug Fixes – RegressedFURTHER FIXES FOR ROW CACHE ON TOP OF 34304965 – Regressed & replaced with

SESSIONS DEADLOCK ON ROW CACHE MUTEX AND SHARED POOL LATCH

N/AN/A

是,从 19.24 起

N/AN/A

应用 19.24 或以上

Document 20319830.8 Latch Get and Free Functions Available for Shared Parent-Child Latches 是,从 19.11 起 应用 19.11 或以上
Document 31602782.8 ORA-12850/ORA-12872 or huge waits on ‘cursor: pin S wait on X’ with parallel execution 是,从 19.14 起 应用 19.14 或以上
Document 31753692.8 High waits on ‘cursor: pin S wait on X’ and ‘cursor: mutex X’ with PX_MISMATCH 是,从 19.10 起 应用 19.10 或以上
Document 30293345.8 Waits for latch: MGA Shared Context Latch After Migration to 18c 或以上 是,从 19.8 起 应用 19.8 或以上
Document 30614411.8 Huge delay in LGWR BOC (Broadcast-on-commit) processing in RAC 是,从 19.8 起 应用 19.8 或以上
Document 31827912.8 High waits for ‘log file sync’ & ‘remote write sync’ in RAC ADG with Fast-Start Failover (FSFO) 是,从 19.10 起 应用 19.10 或以上
Document 31176502.8 LGWR Hangs during log switch after Upgrade to 19c in RAC due to Cache Fusion Write Hang 是,从 19.11 起 应用 19.11 或以上
Document 31331038.8 ORA-600 error in ADG SYNC mode on Exadata with PMEMlog 如果是 Exadata 申请临时补丁
Document 32249371.8 High ‘log file parallel write’ waits on Exadata due to single HCA bottleneck 是,从 19.13 起 应用 19.13 或以上
Document 32498752.8 ORA-600 [ksu_get_available_pso: numa mismatch] signaled when using parallel LGWR 是,从 19.14 起 应用 19.14 或以上
Document 30978554.8 GC hang on read-mostly object or ORA-481 in RAC 是,从 19.9 起 应用 19.9 或以上
Document 32035536.8 Sessions Blocked by ‘gc current request’ in RAC 是,从 19.11 起 应用 19.11 或以上
Document 28697526.8 Session Hangs On ‘gc cr request’ And Other Sessions Wait On ‘cr request retry’ 是,从 19.9 起 应用 19.9 或以上
Document 32227352.8 High CPU with set role command 是,从 19.11 起 应用 19.11 或以上
Document 31812824.8 Slow performance with set role command 是,从 19.12 起 应用 19.12 或以上
Document 28889730.8 “Insert As Select” or a “Create Table As Select” command consume huge space 是,从 19.4 起 应用 19.4 或以上
Document 32379140.8 LCK process in RAC crashes due to ORA-07445 [kjcvmsn()+128] [SIGSEGV] 是,从 19.12 起 应用 19.12 或以上
Document 30240930.8 Scheduler Jobs Time Classified as Background Instead of Foreground 是,从 19.9 起 应用 19.9 或以上
Document 29932310.8 AWR Report Generation Takes Long time in 19c 是,从 19.10 起 应用 19.10 或以上
Document 31489731.8 ORA-600/ORA-7445 While Shared Pool Memory is Being Freed 是,从 19.18 起 应用 19.18 或以上
Document 31892767.8 Improvement to Temp Space Shrink (Affects on Cloud 19c) 申请临时补丁 (仅影响 Cloud 19c)
Document 32465193.8 ORA-4031 Due to high SQL Monitoring allocations in Shared Pool 是,从 19.13 起 应用 19.13 或以上
Document 31820859.8 ORA-4025 Due To 65535 Active Locks Limit Reached On Select NLS_CHARSET_ID 是,从 19.9 起 应用 19.9 或以上
Document 30887989.8 ORA-00001 While Generating AWR Snapshot in non-CDB with Resource Manager enabled 是,从 19.13 起 应用 19.13 或以上
Document 29423227.8 Drop Partition with global indexes hangs on library cache lock 是,从 19.11 起 应用 19.11 或以上
Document 32234161.8 Performance Slow due to High CPU post July 2020 DBRU (19.8) caused by Space Management slave processes (Wnnn) 是,从 19.10 起 应用 19.10 或以上
Document 29454450.8 High waits on “latch: cache buffers chains” in RAC 是,从 19.9 起 应用 19.9 或以上
Document 31563138.8 Securefile mutex waits when many inserts occurring on Securefile compressed LOB 是,从 19.11 起 应用 19.11 或以上
Document 32103628.8 High Latch Free Waits While Flushing Top Segment Statistics in AWR 是,从 19.15 起 应用 19.15 或以上
Document 33123985.8 DBW0 Process Generate Huge Traces With Dumping DBWR Process State After DBRU 19.11 是,从 19.13 起 应用 19.13 或以上
Document 32936537.8 Significant Contention on “library cache: mutex X” While accessing Interval or Auto-List Partitioned table Concurrently 是,从 19.16 起 应用 19.16 或以上
Document 32148419.8 High Row Cache Lock Waits on Alter Table Exchange Partition in RAC Environment 是,从 19.12 起 应用 19.12 或以上
Document 30662963.8Document 34284147.8 High contention for “latch: MGA shared context root latch” When Many Sessions are Logging outHigh contention for “latch: MGA shared context root latch” When Many Sessions are Logging out 被替换是,从 19.17 起 N/A应用 19.17 或以上
Document 32550751.8 ONLY FOR AIX: Performance issues due to MGA related operations in AIX 是,从 19.12 起 应用 19.12 或以上
Document 33352794.8 ONLY FOR AIX: High waits on ‘latch: MGA shared context root latch’ and ‘latch: MGA shared context latch’ even with Fix 32550751 是,从 19.13 起 应用 19.13 或以上
Document 32117253.8 High “enq: RO – fast object reuse” waits & active checkpoint queue latch gets 是,从 19.12 起 应用 19.12 或以上
Document 30710917.8 Cursor: mutex S waits due to High Version Count For SQL Statements using Bind variables and DBLINK From 12.2 是,从 19.14 起 应用 19.14 或以上
Document 33025005.8 Waits on ‘latch: cache buffers chains’ after Database Upgrade from 12.1.0.2 to 19c 是,从 19.15 起 应用 19.15 或以上
Document 31387123.8 High Waits on ‘enq: IV – contention’ observed in RAC Standby environment 是,从 19.13 起 应用 19.13 或以上
Document 32225742.8 Gathering Statistics in Primary DB Results in ORA-4061/ORA-4065 in Secondary DB 是,从 19.13 起 应用 19.13 或以上
Document 32069508.8 High Latch: Cache Buffers Chains Contention in Standby Database 是,从 19.13 起 应用 19.13 或以上
Document 33803836.8 Regression in 19.14 Due to Bug Fix 32119144 是,从 19.18 起 应用 19.18 或以上
Document 33163187.8 High Version Count due To “Bind Mismatch” Even When Using Same Bind Values 是,从 19.14 起 应用 19.14 或以上
Document 32755517.8 High Version count with USER_BIND_PEEK_MISMATCH for SQLs with bind peeking disabled 是,从 19.13 起 应用 19.13 或以上
Document 33121934.8 Library cache lock / load lock / mutex x during connection storm due to update user$ 是,从 19.16 起 应用 19.16 或以上
Document 36587533.8 RESULT CACHE: GLOBAL FLUSH SHOULD ALWAYS CLEAR BYPASS FLAG EVEN IF THE RESULT CACHE IS UNINITIALIZED 是,从 19.24 起 应用 19.24 或以上

常见已知问题

问题 1: 在 19c AWR 报告中缺少表空间级别的 IO 统计数据

解决方案: 该 Bug Document 25416731.8 已在 19.8 版本及以上修复。请应用 19.8 或更高版本,并执行以下步骤。

要在 19.X 版本的 AWR 报告中恢复表空间 IO 统计数据,请以 SYS 身份运行以下命令:

$ sqlplus / as sysdba

exec dbms_workload_repository.modify_table_settings(table_name => ‘WRH$_FILESTATXS’, flush_level => ‘TYPICAL’);
exec dbms_workload_repository.modify_table_settings(table_name => ‘WRH$_DATAFILE’, flush_level => ‘TYPICAL’);
exec dbms_workload_repository.modify_table_settings(table_name => ‘Tempfile Group’, flush_level => ‘TYPICAL’);
exec dbms_workload_repository.modify_table_settings(table_name => ‘WRH$_TEMPSTATXS’, flush_level => ‘TYPICAL’);
exec dbms_workload_repository.modify_table_settings(table_name  => ‘WRH$_TEMPFILE’, flush_level => ‘TYPICAL’);

当新的 AWR 快照生成时,您将开始获得用于检查 IO 性能的表空间 IO 统计数据。

供您参考:如果您在 PDB 层级生成 AWR 快照并且在 AWR 报告中缺少表空间 IO 统计数据,您可能还需要在 PDB 中运行这些命令。

该 Bug Document 34733173.8 从 19.22RU 起被修复了. 如果您应用了补丁 34733173 这些命令就不需要了。

Document 25416731.8 - Bug 25416731 – Tablespace IO Statistics Missing From AWR Report
Document 34733173.8 - Bug 34733173 – Tablespace IO Stats and File IO Stats Data Must Be Included in AWR Reports From Oracle 19C, 21C and 23ai
Document 3008056.1 - AWR Report Under File IO Stats Shows No Data Exists For This Section Of The Report.

问题 2: SQL 子游标的高版本计数持续超过 1024

解决方案:请参考以下文档以控制 SQL 子游标的版本计数,该计数持续超过 1024。

Document 2431353.1 High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance

问题 3: SQL 语句中列数过多导致的高 CPU 使用率。short stack 显示在 qosdGetOptDir、qosdInitDirCtx 和 qosdUpdExprExecStatsRws 上存在函数旋转。可能会出现高锁等待,包括 RAC 中的 GES 锁。

解决方案:在系统级别设置 _column_tracking_level=1。该参数是动态的。这样做是为了避免大量列使用跟踪,这可能会导致更高的 CPU 使用率。默认值在 11.2 和 12.1 中为 1,从 12.2 开始更改为 21 及以上。

问题 4: 如果数据库从 11.2 升级到 19c,那么从 12c 开始,LGWR 的架构发生了变化,采用了并行自适应 LGWR。这可能会导致 LGWR 吞吐量变慢,有时会在 19c 中导致前台会话出现“log file sync”等待。这是由于并行 LGWR 的自适应行为与串行 LGWR 之间存在一些缺陷所导致的。

解决方案:在生产环境之前,在用户验收测试(UAT)中评估新的 LGWR 架构(默认启用)。要使用旧的 LGWR 架构,请设置以下参数:

_use_single_log_writer=TRUE /* default value: ADAPTIVE */

注意:这不是一个动态参数。它需要重启数据库。

问题 5: 在19c数据库中进行分区维护时,高库缓存锁等待。

解决方案: Document 2619066.1 High Library Cache Lock Waits After Upgrading To 19C During Partition Index Maintenance

请参阅以下文档以排查与库缓存相关的等待问题:-

Document 444560.1 Troubleshooting Library Cache: Lock, Pin and Load Lock
Document 1353015.1 How to Identify Hard Parse Failures Causing Library Cache contention
Document 2746493.1 How To Trace Overall Library Cache Objects Invalidation Happening At Particular Period

19c 最佳实践

1. Database Testing

Oracle Real Application Testing 选项使您能够在进行生产环境升级之前,在开发或用户验收测试(UAT)中对 Oracle 数据库进行真实世界的测试。通过捕获生产工作负载并在生产部署之前评估系统更改对这些工作负载的影响,Oracle 实际应用测试最大限度地降低了与升级后系统更改相关的不稳定性风险。SQL 性能分析器和数据库重放是 Oracle 实际应用测试的关键组件。.

Documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/ratug/introduction-to-oracle-database-testing.html
Document 1464274.1 Primary Note for Real Application Testing Option

2. 使用 Performance Hub

EM Express 的性能中心(Performance Hub)功能提供了一个活跃报告,汇总了指定时间段内的所有性能数据。该报告是完全交互式的,其内容保存在 HTML 文件中,您可以通过网页浏览器离线访问。有关性能中心的更多信息,请参考以下教程。

使用 EM Express 的性能中心(Perf Hub): https://docs.oracle.com/en/database/oracle/oracle-database/tutorial-monitor-perf/index.html?opt-release-19c#UsePerformanceHub
不使用 EM Express 的性能中心(Perf Hub): Document 2436566.1 Monitoring Database Performance Using Performance Hub Report

3. 实时监控 ADDM

一种预测工具,用于主动预见 19c 生产环境中的性能问题,并采取纠正措施以避免任何系统停机情况。

Document 2763576.1 Proactively Detecting Database Performance Problem Using Real-Time ADDM

4. 下载最新的 ORAchk / EXAchk

建议下载并安装最新的 AHF 或 ORAchk/EXAchk,这可以用于检查任何异常并采取纠正措施。

Document 2550798.1 Autonomous Health Framework (AHF) – Including TFA and ORAchk/EXAChk

5. 安装 OS Watcher

建议在升级后安装最新版本的 OS Watcher,以便在需要时收集与操作系统相关的信息。

Document 301137.1 OS Watcher User Guide
Document 461053.1 OS Watcher Analyzer User Guide

有关数据库性能的更多信息:-

Document 402983.1 Primary Note: Database Performance Overview
Document 1306791.2 Information Center: Oracle Exadata Database Machine

免责声明: 为了避免在打补丁或打完补丁后出现任何问题,建议在 UAT 环境中应用上述补丁并进行验证,然后再应用到生产环境。本文件将在每个季度(在发布季度 RU/RUR 期间)与新的候选补丁进行修订。

转载:避免 19c 数据库性能问题需要考虑的事项 (Doc ID 3050476.1)

Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2]

Applies to:
Oracle Database – Enterprise Edition – Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Oracle Database Cloud Schema Service – Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) – Version N/A and later
Oracle Cloud Infrastructure – Database Service – Version N/A and later
Oracle Database Cloud Exadata Service – Version N/A and later
IBM AIX on POWER Systems (64-bit)

Description
Oracle 12c introduces a new default feature of using multiple LGWRs which may lead to DEADLOCK / Database Hang or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery on IBM AIX and potentially on HPUX Itanium 64bit.
The database may become unusable and fail to be OPEN.

Occurrence
This issue is specific to RDBMS version 12c (12.1.0.1 or 12.1.0.2) where the new default feature of using multiple LGWRs is introduced.
It affects databases on IBM AIX and potentially on HPUX Itanium 64bit

Symptoms
ORACLE on IBM AIX or HPUX Itanium 64bit with RDBMS Version 12c.

DEADLOCK or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery caused by bug 21915719.

PMON may terminate the instance while extensive block recovery is being performed.

A DEADLOCK example is with LG0[n] waiting on ‘LGWR worker group ordering’. Example from a System State Dump trace file:

PROCESS 18: LG01
SO: 0x7000101f95ad720, type: 4, owner: 0x7000101f84195f8, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f84195f8, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
 Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x58, phase=0x1, =0x0
      wait_id=4947 seq_num=4948 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=1 os=267
      in_wait=1 iflags=0x5a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 817, ser: 1
    Dumping final blocker:
      inst: 1, sid: 817, ser: 1
  There are 730 sessions blocked by this session.
.
.
PROCESS 17: LG00
SO: 0x7000101f85bcc60, type: 4, owner: 0x7000101f93eeb20, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f93eeb20, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
  ksuxds FALSE at location: 0
  service name: SYS$BACKGROUND
  Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x56, phase=0x1, =0x0
      wait_id=1630680 seq_num=57841 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=2 os=268
      in_wait=1 iflags=0x15a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 865, ser: 1
    Dumping final blocker:
      inst: 1, sid: 865, ser: 1
 

The instance may fail to OPEN with errors ORA-600 [kcrfrgv_nextlwn_scn] and/or ORA-600 [krr_process_read_error_2]:

Recovery Session Failed with:

ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krr_process_read_error_2],


Alter database open fails with:

ORA-00600: internal error code, arguments: [kcrfrgv_nextlwn_scn] .....
ORA-600 signalled during: ALTER DATABASE OPEN...

Workaround
Disable the new feature of multiple LGWR worker processes by proactively setting _use_single_log_writer=true.

Setting _use_single_log_writer = true is a safe workaround; it is the behavior before 12c where multiple LGWR worker groups were not available.

ALTER SYSTEM SET "_use_single_log_writer"=TRUE SID='*' SCOPE=SPFILE;
-- Restart the database or all instances of the RAC database

Note that while _use_single_log_writer=true is not set, then error ORA-600 [kcrfrgv_nextlwn_scn] might be produced avoiding the database to OPEN. Once the problem is introduced, _use_single_log_writer=true may not fix it. _use_single_log_writer = true prevents inconsistencies in the redo log to be introduced which causes that error.
If the parameter does not help, because the problem was already introduced when _use_single_log_writer=true had not been proactively set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.
参考:ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] (Doc ID 1957710.1)

Oracle 暂定和恢复功能

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 暂定和恢复功能

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前一直没有注意到oracle有暂定和恢复功能(SUSPEND/RESUME)[从oracle 8i开始有的特性],一下是官方描述:

The Database Suspend/Resume feature provides a mechanism by which all disk I/O 
(datafile, controlfile and file header I/Os) in a database (in all instances) 
can be suspended making it easier to make a copy of the database.  When an 
ALTER SYSTEM SUSPEND command is issued, it will wait for any ongoing instance 
recovery to complete and then set a flag in all running instances to stop all 
new lock and I/O activity.  The command may return before the last I/O is 
issued because the check for the flag might have been before the suspend and 
the I/O might have been issued after the suspend.  So, reads, typically are not
allowed when the database is suspended but may still be active for a period of 
time.  However, this command does ensure that no new I/Os will be issued.  

Once all instances of a database are suspended, a copy of the database can be 
made by making a copy of all the files (i.e. the control file, online logs and 
all data files).  The copy can have uncommitted updates and therefore the only 
way a copy of the database can be used in this scenerio is to do an instance 
recovery and then open it.

The database can be suspended or resumed through an ALTER SYSTEM call.  You can
issue this statement as the user SYSTEM or SYS (the user must have DBA 
privileges).   

The syntax for these two commands is as follows:

    ALTER SYSTEM <options>;

    <options> = SUSPEND | RESUME | <existing options>

The database will remain in the suspended state until the ALTER SYSTEM RESUME 
command is issued.  The database will remain suspended even if the process 
issuing the ALTER SYSTEM SUSPEND command dies or exists.  However, if all 
instances are shutdown and started again, the database is no longer in a 
suspended state.  

The ALTER SYSTEM RESUME command has the effect of blocking the I/O since the 
SUSPEND command.  When the RESUME command is issued, it might cause a burst in 
the I/O, which may take a while to even out.  A message is written to the alert
log everytime the database is suspended or resumed, as shown by the example 
below:

    Mon Nov 29 11:32:22 1999
    Completed: alter database open
    Wed Dec  1 12:56:53 1999
    Starting ORACLE instance (normal)
    Wed Dec  1 22:03:50 1999
    Suspending database following alter system suspend command.
    Wed Dec  1 22:06:14 1999
    Resuming database following alter system resume command.
    Wed Dec  1 22:07:08 1999


The following is an example of using the SUSPEND and RESUME feature:

    SVRMGR> connect system/manager
    Connected.
    SVRMGR> alter system suspend;
    Statement processed.
    SVRMGR> select * from user_source;
    ^X^Cselect * from user_source   -----  (at this stage the statement will 
                                            just hang.  A Ctrl-X Ctrl-C was 
                                            issued to kill the statement)
                  *
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01013: user requested cancel of current operation
    SVRMGR>
    SVRMGR> alter system resume;
    Statement processed.


Considerations and Restrictions:
--------------------------------
- The files in the copy database can not be used as backups of the original 
  database for media recovery.  (If the direct path option is in use at the 
  time, there may be corrupted blocks).

- A new instance cannot be started during the SUSPEND state of the database.  
  If one is started, it will not be included in the SUSPEND process and thus no 
  I/O suspension guarantees are provided in this case.

- Creation of backups or archived logs will not be affected by the 
  ALTER SYSTEM SUSPEND command.

- The two different commands can  be issued from two different instances or 
  processes.

- If the SUSPEND command during execution may fail for some reason yet 
  result in some of the instances being suspended, the command can be issued 
  again since the instances in suspend status will ignore the command.

- Also database queries will hang when the database is in suspend mode

按照描述SUSPEND 操作会挂起所有io,只要涉及到io操作就会挂起,如果操作的所有请求都可以在内存中完成(buffer cache/shared pool等),那这样的操作是可以直接完成的.

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 14 21:51:53 2025

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system suspend;

System altered.

SQL> select database_status from v$instance;

DATABASE_STATUS
-----------------
SUSPENDED

SQL> create table t1 as select * from dba_users;
create table t1 as select * from dba_users
             *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL> create table t_xff as select * from dba_users;
^C
C:\Users\XFF>

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 14 21:53:19 2025

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system resume;

System altered.

SQL> select database_status from v$instance;

DATABASE_STATUS
-----------------
ACTIVE

SQL> create table t_xff as select * from dba_users;

Table created.


SQL>  alter system suspend;

System altered.

SQL> select count(1) from user$;

  COUNT(1)
----------
        94

SQL> select count(1) from t_xff;
^C
C:\Users\XFF>

在某些情况下,可以通过这类操作来挂起数据库,做一些特殊的操作.

迁移awr快照数据到自定义表空间

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:迁移awr快照数据到自定义表空间

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在19c中有些情况,考虑把awr的快照数据存储在非sysaux表空间,可以通过DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS来进行设置

sys@ORA19C 21:57:02> select BANNER_FULL from v$version;

BANNER_FULL
----------------------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.24.0.0.0


Elapsed: 00:00:00.01

PROCEDURE MODIFY_SNAPSHOT_SETTINGS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 RETENTION                      NUMBER                  IN     DEFAULT
 INTERVAL                       NUMBER                  IN     DEFAULT
 TOPNSQL                        NUMBER                  IN     DEFAULT
 DBID                           NUMBER                  IN     DEFAULT
 TABLESPACE_NAME                VARCHAR2                IN     DEFAULT
PROCEDURE MODIFY_SNAPSHOT_SETTINGS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 RETENTION                      NUMBER                  IN     DEFAULT
 INTERVAL                       NUMBER                  IN     DEFAULT
 TOPNSQL                        VARCHAR2                IN
 DBID                           NUMBER                  IN     DEFAULT
 TABLESPACE_NAME                VARCHAR2                IN     DEFAULT

这两个proc,主要是TOPNSQL一个是number类型,一个是varchar2类型
If NUMBER: Top N SQL size. The number of Top SQL to flush for each SQL criteria (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count). The value for this setting will not be affected by the statistics/flush level and will override the system default behavior for the AWR SQL collection. The setting will have a minimum value of 30 and a maximum value of 50,000. Specifying NULL will keep the current setting.
If VARCHAR2: Users are allowed to specify the following values: (DEFAULT, MAXIMUM, N), where N is the number of Top SQL to flush for each SQL criteria. Specifying DEFAULT will revert the system back to the default behavior of Top 30 for statistics level TYPICAL and Top 100 for statistics level ALL. Specifying MAXIMUM will cause the system to capture the complete set of SQL in the cursor cache. Specifying the number N is equivalent to setting the Top N SQL with the NUMBER type. Specifying NULL for this argument will keep the current setting.
进行了简单的测试,确认是部分awr的分区表设置到新表空间中

sys@ORA19C 21:41:51> CREATE TABLESPACE AWRTBS DATAFILE '/data/oradata/ORA19C/awrtbs01.dbf' size 128M autoextend on;

Tablespace created.

Elapsed: 00:00:00.53
sys@ORA19C 21:42:21> exec dbms_workload_repository.modify_snapshot_settings(tablespace_name=> 'AWRTBS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.53

sys@ORA19C 21:53:56> execute dbms_workload_repository.create_snapshot();

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.44
sys@ORA19C 21:53:58> select segment_name,PARTITION_NAME,segment_type from dba_segments where tablespace_name='AWRTBS';

SEGMENT_NAME                   PARTITION_NAME                                               SEGMENT_TYPE
------------------------------ ------------------------------------------------------------ ---------------
WRH$_FILESTATXS                WRH$_FILESTATXS_1232450071_2690                              TABLE PARTITION
WRH$_SQLSTAT                   WRH$_SQLSTAT_1232450071_2690                                 TABLE PARTITION
WRH$_SYSTEM_EVENT              WRH$_SYSTEM_EVENT_1232450071_2690                            TABLE PARTITION
WRH$_WAITSTAT                  WRH$_WAITSTAT_1232450071_2690                                TABLE PARTITION
WRH$_LATCH                     WRH$_LATCH_1232450071_2690                                   TABLE PARTITION
WRH$_LATCH_MISSES_SUMMARY      WRH$_LATCH_MISSES_SUMMARY_1232450071_2690                    TABLE PARTITION
WRH$_DB_CACHE_ADVICE           WRH$_DB_CACHE_ADVICE_1232450071_2690                         TABLE PARTITION
WRH$_ROWCACHE_SUMMARY          WRH$_ROWCACHE_SUMMARY_1232450071_2690                        TABLE PARTITION
WRH$_SGASTAT                   WRH$_SGASTAT_1232450071_2690                                 TABLE PARTITION
WRH$_SYSSTAT                   WRH$_SYSSTAT_1232450071_2690                                 TABLE PARTITION
WRH$_PARAMETER                 WRH$_PARAMETER_1232450071_2690                               TABLE PARTITION
WRH$_SEG_STAT                  WRH$_SEG_STAT_1232450071_2690                                TABLE PARTITION
WRH$_SERVICE_STAT              WRH$_SERVICE_STAT_1232450071_2690                            TABLE PARTITION
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SESSION_HISTORY_1232450071_2690                  TABLE PARTITION
WRH$_SYSMETRIC_HISTORY         WRH$_SYSMETRIC_HISTORY_1232450071_2690                       TABLE PARTITION
WRH$_LATCH_CHILDREN            WRH$_LATCH_CHILDREN_1232450071_0                             TABLE PARTITION
WRH$_LATCH_PARENT              WRH$_LATCH_PARENT_1232450071_0                               TABLE PARTITION
WRH$_DLM_MISC                  WRH$_DLM_MISC_1232450071_0                                   TABLE PARTITION
WRH$_INST_CACHE_TRANSFER       WRH$_INST_CACHE_TRANSFER_1232450071_0                        TABLE PARTITION
WRH$_INTERCONNECT_PINGS        WRH$_INTERCONNECT_PINGS_1232450071_0                         TABLE PARTITION
WRH$_TABLESPACE_STAT           WRH$_TABLESPACE_STAT_1232450071_2690                         TABLE PARTITION
WRH$_OSSTAT                    WRH$_OSSTAT_1232450071_2690                                  TABLE PARTITION
WRH$_SYS_TIME_MODEL            WRH$_SYS_TIME_MODEL_1232450071_2690                          TABLE PARTITION
WRH$_SERVICE_WAIT_CLASS        WRH$_SERVICE_WAIT_CLASS_1232450071_2690                      TABLE PARTITION
WRH$_EVENT_HISTOGRAM           WRH$_EVENT_HISTOGRAM_1232450071_2690                         TABLE PARTITION
WRH$_MVPARAMETER               WRH$_MVPARAMETER_1232450071_2690                             TABLE PARTITION
WRH$_CELL_GLOBAL_SUMMARY       WRH$_CELL_GLOBAL_SUMMARY_1232450071_2690                     TABLE PARTITION
WRH$_CELL_DISK_SUMMARY         WRH$_CELL_DISK_SUMMARY_1232450071_2690                       TABLE PARTITION
WRH$_CELL_GLOBAL               WRH$_CELL_GLOBAL_1232450071_2690                             TABLE PARTITION
WRH$_CELL_IOREASON             WRH$_CELL_IOREASON_1232450071_2690                           TABLE PARTITION
WRH$_CELL_DB                   WRH$_CELL_DB_1232450071_2690                                 TABLE PARTITION
WRH$_CELL_OPEN_ALERTS          WRH$_CELL_OPEN_ALERTS_1232450071_2690                        TABLE PARTITION
WRH$_IM_SEG_STAT               WRH$_IM_SEG_STAT_1232450071_2690                             TABLE PARTITION
WRM$_PDB_IN_SNAP               WRM$_PDB_IN_SNAP_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSMETRIC_HISTORY     WRH$_CON_SYSMETRIC_HISTORY_1232450071_2690                   TABLE PARTITION
WRM$_ACTIVE_PDBS               WRM$_ACTIVE_PDBS_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSSTAT               WRH$_CON_SYSSTAT_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSTEM_EVENT          WRH$_CON_SYSTEM_EVENT_1232450071_2690                        TABLE PARTITION
WRH$_PROCESS_WAITTIME          WRH$_PROCESS_WAITTIME_1232450071_2690                        TABLE PARTITION
WRH$_ASM_DISK_STAT_SUMMARY     WRH$_ASM_DISK_STAT_SUMMARY_1232450071_2690                   TABLE PARTITION
WRH$_AWR_TEST_1                WRH$_AWR_TEST_1_1232450071_2690                              TABLE PARTITION
WRH$_SESS_NETWORK              WRH$_SESS_NETWORK_1232450071_2690                            TABLE PARTITION
WRH$_CON_SYS_TIME_MODEL        WRH$_CON_SYS_TIME_MODEL_1232450071_2690                      TABLE PARTITION

43 rows selected.

Elapsed: 00:00:00.01
sys@ORA19C 21:54:08> 

第一例Oracle 21c恢复咨询

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:第一例Oracle 21c恢复咨询

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

记录一个Oracle 21c故障的恢复请求(这个是第一个21c的恢复咨询),这个表明21C确实有客户在生产上使用了(不过这个是国外客户,国内的目前还没有遇到)
21c


故障原因是最初的数据文件不一致,数据库无法open,最终经过一系列折腾之后,有数据文件offline的情况下执行了resetlogs,导致部分文件resetlogs scn不一致
wrong-resetlogs

解决oracle数据文件路径有回车故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:解决oracle数据文件路径有回车故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到一个硬件恢复朋友的请求,oracle数据库文件恢复出来了,但是在linux上面启动的时候,有两个文件无法检测到,dbv检测正常.
checkpiont_err
dbv


通过分析是由于文件无法找到原因导致
file-not-found

进一步检查发现原库这两个文件结尾带有回车,但是恢复出来的文件不带回车
huiche

对于这个故障,我在测试环境进行了重现并且给予解决
1. 创建带回车键数据文件

SQL> create tablespace xifenfei datafile '/u01/app/oracle/oradata/xifenfei/xff01.dbf
  2  ' size 128m;

Tablespace created.

SQL> alter tablespace xifenfei add datafile '/u01/app/oracle/oradata/xifenfei/xff02.dbf' size 128M;

Tablespace altered.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/xifenfei/system01.dbf
/u01/app/oracle/oradata/xifenfei/sysaux01.dbf
/u01/app/oracle/oradata/xifenfei/undotbs01.dbf
/u01/app/oracle/oradata/xifenfei/users01.dbf
/u01/app/oracle/oradata/xifenfei/xff01.dbf
/u01/app/oracle/oradata/xifenfei/xff02.dbf

6 rows selected.

2.操作系统层面查看文件(在我的ssh工具中,可以看到带回车键文件和不带回车文件不一样,使用的是crt工具,其他工具是否显示不确定)

[oracle@xifenfei ~]$ cd /u01/app/oracle/oradata/xifenfei/
[oracle@xifenfei xifenfei]$ ls -l xff*
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff01.dbf?
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff02.dbf

3. 操作系统层面重命名数据文件

[oracle@xifenfei xifenfei]$ mv xff01.dbf* xff01.dbf
[oracle@xifenfei xifenfei]$ ls -l xff*
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff01.dbf
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff02.dbf

3. 数据库层面重启看文件情况,发现文件不能被正常发现(当然不能,文件被os层面mv了)

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.
SQL> select file#, CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          306775013
         2          306775013
         3          306775013
         4          306775013
         5                  0
         6          306779423

6 rows selected.

RMAN> report schema;

Report of database schema for database with db_unique_name XIFENFEI

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    770      SYSTEM               ***     /u01/app/oracle/oradata/xifenfei/system01.dbf
2    1950     SYSAUX               ***     /u01/app/oracle/oradata/xifenfei/sysaux01.dbf
3    70       UNDOTBS1             ***     /u01/app/oracle/oradata/xifenfei/undotbs01.dbf
4    12       USERS                ***     /u01/app/oracle/oradata/xifenfei/users01.dbf
5    0        XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff01.dbf

6    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff02.dbf

4. 解决控制文件和数据文件实际名称不一致问题

RMAN> catalog datafilecopy '/u01/app/oracle/oradata/xifenfei/xff01.dbf';

using target database control file instead of recovery catalog
cataloged datafile copy
datafile copy file name=/u01/app/oracle/oradata/xifenfei/xff01.dbf RECID=1 STAMP=1187684217

RMAN> switch datafile 5 to copy;

datafile 5 switched to datafile copy "/u01/app/oracle/oradata/xifenfei/xff01.dbf"

RMAN> report schema;

Report of database schema for database with db_unique_name XIFENFEI

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    770      SYSTEM               ***     /u01/app/oracle/oradata/xifenfei/system01.dbf
2    1950     SYSAUX               ***     /u01/app/oracle/oradata/xifenfei/sysaux01.dbf
3    70       UNDOTBS1             ***     /u01/app/oracle/oradata/xifenfei/undotbs01.dbf
4    12       USERS                ***     /u01/app/oracle/oradata/xifenfei/users01.dbf
5    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff01.dbf
6    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff02.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    123      TEMP                 32767       /u01/app/oracle/oradata/xifenfei/temp01.dbf


RMAN> alter database open;

database opened

ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个10g的库应用访问报ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
ORA-12514


通过分析alert日志,确认是数据库启动报ORA-600 4194错误

Mon Sep 23 16:12:42 2024
SMON: enabling cache recovery
Mon Sep 23 16:12:43 2024
Successfully onlined Undo Tablespace 1.
Mon Sep 23 16:12:43 2024
SMON: enabling tx recovery
Mon Sep 23 16:12:43 2024
Database Characterset is ZHS16GBK
Mon Sep 23 16:12:43 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:31 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:13:32 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:13:32 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:33 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:18 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:19 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:14:19 2024
Non-fatal internal error happenned while SMON was doing shrinking of rollback segments.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon Sep 23 16:14:19 2024
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
PMON: terminating instance due to error 472
Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_psp0_2104.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_lgwr_3200.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw1_448.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw0_7436.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_mman_1704.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw2_5072.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_ckpt_6628.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_reco_7924.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00472: PMON  process terminated with error

Instance terminated by PMON, pid = 6952

这个比较简单一般就是undo异常,对undo设置为人工管理,然后重建undo完成本次恢复任务

数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个win上运行的11.2.0.4库启动的时候报ORA-27102 OSD-00026 O/S-Error: (OS 1455)错误

SQL> STARTUP
ORA-27102: out of memory
OSD-00026: 附加错误信息
O/S-Error: (OS 1455) 页面文件太小,无法完成操作。

数据库alert日志信息

Fri Sep 13 17:17:39 2024
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24
Number of processor cores in the system is 12
Number of processor sockets in the system is 1
Picked latch-free SCN scheme 3
Error: Failed to allocate SGA granule addr 0000000760000000 size 268435456 
 mode 131073 locality 0
Errors in file D:\app\Administrator\diag\rdbms\xff\xff\trace\xff_ora_77728.trc:
ORA-27102: out of memory
OSD-00026: 附加错误信息
O/S-Error: (OS 1455) 页面文件太小,无法完成操作。
Error: Failed to allocate SGA granule addr 0000000750000000 size 268435456 
 mode 131073 locality 0
Errors in file D:\app\Administrator\diag\rdbms\xff\xff\trace\xff_ora_77728.trc:

看报错信息,第一感觉和内存有关系,可能内存不足无法满足sga分配需求,查看系统空闲内存情况
22


系统明显有足够内存,出现该问题的原因可能和win操作系统本身有关系,由于要快速恢复业务,直接重启系统数据库启动成功

硬件故障导致ORA-01242 ORA-01122等错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障导致ORA-01242 ORA-01122等错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户多个节点rac,早上反馈说有两个节点实例异常,需要分析原因,查看其中一个节点的数据库alert日志,发现是由于访问1399号文件异常报ORA-01242 ORA-01122等错误,导致实例crash

Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process

继续分析日志发现集群尝试拉起该实例,遭遇ORA-01186,ORA-01122无法启动成功

ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

这个错误是数据库文件访问异常导致,根据经验,出现这种问题一般是由于底层异常导致,查看系统messages日志,发现有硬件磁盘报错

Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed

客户进一步分析是由于昨天存储坏了一块盘,然后热备盘顶上了,但是不知道什么原因出现了文件访问异常,可能和当时的rebuild过程有关系.由于客户是rac环境,还有部分剩余节点运行正常,对于异常节点直接启动库成功
20240820-182825


节点写入数据报ORA-01187: cannot read from file because it failed verification tests错误
ora-01187

在所有节点通过执行ALTER SYSTEM CHECK DATAFILES,然后所有节点操作正常
check_datafile