Patch_SCN快速解决ORA-600 2663故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Patch_SCN快速解决ORA-600 2663故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个维保客户和我说他们测试库删除了日志文件导致库无法启动,让我帮忙看看
客户现场现况
1. 磁盘空间使用100%

[oracle@We1-db_Test ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  880K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1        40G   38G   92M 100% /
tmpfs           783M     0  783M   0% /run/user/0

2. 数据库redo被删除了部分,而且是active状态的被删除

[oracle@We1-db_Test ~]$ ls -l /opt/app/oracle/oradata/orcl/redo0*
-rw-r----- 1 oracle oinstall 52429312 Jan 15 15:29 /opt/app/oracle/oradata/orcl/redo04.log
-rw-r----- 1 oracle oinstall 52429312 Jan 15 16:26 /opt/app/oracle/oradata/orcl/redo05.log

SQL> select group#,SEQUENCE#,STATUS FROM V$lOG;

    GROUP#  SEQUENCE# STATUS
---------- ---------- ----------------
         1       8989 CURRENT
         2          0 UNUSED
         5          0 UNUSED
         4          0 UNUSED
         3       8988 ACTIVE

SQL> select member from v$logfile;

MEMBER
-----------------------------------------------------
/opt/app/oracle/oradata/orcl/redo03.log
/opt/app/oracle/oradata/orcl/redo02.log
/opt/app/oracle/oradata/orcl/redo01.log
/opt/app/oracle/oradata/orcl/redo04.log
/opt/app/oracle/oradata/orcl/redo05.log

基于当前情况,直接open库无望,但是空间不足问题需要先解决,不然恢复过程中创建redo空间不足依旧会报错卡死,所以先清理了监听和alert等日志,系统空闲了3G多空间,可以进行恢复操作

[oracle@We1-db_Test trace]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  880K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1        40G   34G  3.9G  90% /
tmpfs           783M     0  783M   0% /run/user/0

恢复数据库
1. 由于active redo丢失,毫无疑问,直接强制拉库,使用_allow_resetlogs_corruption参数开干

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 2455228416 bytes
Fixed Size                  2255712 bytes
Variable Size             905970848 bytes
Database Buffers         1526726656 bytes
Redo Buffers               20275200 bytes
Database mounted.
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


SQL> recover database using backup controlfile;
ORA-00279: change 311982775 generated at 12/31/2025 17:35:11 needed for thread
1
ORA-00289: suggestion :
/opt/app/oracle/fast_recovery_area/ORCL/archivelog/2026_01_16/o1_mf_1_8988_%u_.a
rc
ORA-00280: change 311982775 for thread 1 is in sequence #8988


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [0], [311982792], [0],
[311982833], [], [], [], [], [], [], []
Process ID: 11917
Session ID: 576 Serial number: 3

alert日志报错

Fri Jan 16 21:25:31 2026
Assigning activation ID 1750515127 (0x6856bdb7)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /opt/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Jan 16 21:25:31 2026
SMON: enabling cache recovery
Errors in file /opt/app/oracle/diag/rdbms/orcl/we1db/trace/we1db_ora_11917.trc  (incident=81753):
ORA-00600: internal error code, arguments: [2663], [0], [311982792], [0], [311982833], [], []
Incident details in: /opt/app/oracle/diag/rdbms/orcl/we1db/incident/incdir_81753/we1db_ora_11917_i81753.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/app/oracle/diag/rdbms/orcl/we1db/trace/we1db_ora_11917.trc:
ORA-00600: internal error code, arguments: [2663], [0], [311982792], [0], [311982833], [], []
Errors in file /opt/app/oracle/diag/rdbms/orcl/we1db/trace/we1db_ora_11917.trc:
ORA-00600: internal error code, arguments: [2663], [0], [311982792], [0], [311982833], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 11917): terminating the instance due to error 600
Instance terminated by USER, pid = 11917
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (11917) as a result of ORA-1092
Fri Jan 16 21:25:33 2026
ORA-1092 : opitsk aborting process

不幸数据库遇到ORA-600 2663错误,这个故障在以前的文章中描述过,基本上和ORA-600 2662的处理思路类似,这里直接使用:Patch_SCN for Linux进行恢复
2. 使用Patch_SCN处理数据库SCN

SQL> startup nomount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 2455228416 bytes
Fixed Size                  2255712 bytes
Variable Size             905970848 bytes
Database Buffers         1526726656 bytes
Redo Buffers               20275200 bytes
SQL>@rectl

Control file created.

SQL> recover database;
Media recovery complete.

patch_scn


SQL> alter database open;

Database altered.
SQL> SELECT CURRENT_SCN FROM V$DATABASE;

     CURRENT_SCN
----------------
       322002903

到这里完成数据库open操作,后续逻辑导出完成恢复任务

在生产环境错误执行dd命令破坏asm磁盘故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:在生产环境错误执行dd命令破坏asm磁盘故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于ssh登录错误,客户对生产环境进行了误操作把系统的一块磁盘dd到另外两个磁盘上,由于及时发现立马进行了终止操作,但是还是分别破坏了一点数据(一块盘破坏了2G多,另外一块盘破坏了1G多)
dd


通过分析udev的绑定关系确认被破坏的asm disk名称
udev

再通过asm alert日志确认破坏磁盘在asm disk中情况
asmdisk

通过上述信息基本上可以确认,asmdisk13被分别dd到了asmdisk11和asmdisk26中了部分11
26

基于这种情况,由于asm disk被破坏了1-2G多,直接修复然后正常mount磁盘组基本上没有希望,经过分析以及与客户沟通,确认他们改系统是4节点组成的集群,1/2节点上面跑2套库,3/4节点上跑2套库,数据整体放在data_dg磁盘组中,需要恢复的库是第二个顺序创建的1套库(4套库中只需恢复一套即可),由于破坏的数据本身不多,而且需要恢复的数据不是最初写入asm磁盘组,基于这样的情况,需要恢复的数据库机会比较大.

由于现在三个磁盘头信息一致(一个磁盘被dd到另外两个磁盘上),因此第一步先把损坏的两个磁盘头进行简单修复,为了便于amdu(找回ASM中数据文件)等之类数据可以识别到正确的磁盘头信息,然后进行后续的数据文件提取恢复.使用工具对数据文件进行了批量提取,提取数据完成之后,尝试recover和open库
open

虽然数据库正常打开了,不过很不幸,后台还是有一些坏块报错,通过dbv检查发现有文件有一部分坏块,类似dbv报错信息
1_1

通过分析该文件在磁盘组中各个磁盘的分布情况
map

确认该文件确实有部分block分布在被dd磁盘的破坏的范围内这个部分的数据丢失无法挽回,只能是定位到具体对象然后由业务想办法处理.相对以往的各种dd破坏的案例恢复而言,这个应该是效果比较好的一个,而且也是恢复比较容易的一个,没有使用到asm disk 基于asm au/oracle block 扫描的级别,而且system表空间没有任何损坏,数据库甚至直接open成功了,以往的一些dd案例列举:
asm磁盘dd破坏恢复
dd破坏asm磁盘头恢复
asm disk 磁盘部分被清空恢复

obet实现对数据文件坏块检测功能

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:obet实现对数据文件坏块检测功能

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过一段时间的测试和使用,obet修复了不少bug,关于obet的以往功能和特性的文章:
Oracle数据块编辑工具( Oracle Block Editor Tool)-obet
obet(Oracle Block Editor Tool)第二版发布
并且也在客户的生产环境上进行了实战:obet快速修改scn/resetlogs恢复数据库(缺少归档,ORA-00308).利用周末的时间又对obet的工具进行了功能增强,增加了dbv(数据块校验)功能.

Oracle dbv的不足
1. oracle dbv需要在安装oracle服务端的环境下才能执行
2. oracle dbv对于文件大小不正确(文件头记录block数和实际文件大小不匹配),文件头损坏等情况都可能导致dbv无法执行,类似下面的报错

C:\Users\XFF>dbv file=H:\BaiduNetdisk\kingdee\system01.dbf

DBVERIFY: Release 11.2.0.4.0 - Production on 星期日 1月 11 17:30:29 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBV-00107: 未知标头格式 (0) (2054913149)

C:\Users\XFF>
C:\Users\XFF>dbv file=H:\BaiduNetdisk\kingdee\users01.dbf

DBVERIFY: Release 11.2.0.4.0 - Production on 星期日 1月 11 20:10:05 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBV-00102: FILE (H:\BAIDUNETDISK\KINGDEE\USERS01.DBF) 在 end read 操作 (-1) 期间出现文件 I/O 错误

C:\Users\XFF>

3. oracle dbv一条命令执行检测一个数据文件,如果数据文件多,检查起来很繁琐
4. oracle dbv么有检查进度,对于io性能较慢,数据文件较大的情况,无法跟踪检查进度.

obet的dbv功能使用
1. 配置listfile.txt文件,格式为file# name

1 H:\BaiduNetdisk\kingdee\system01.dbf
5 H:\BaiduNetdisk\kingdee\EAS_D_EAS_STANDARD.ORA

2. 启动obet,执行open listfile.txt(如果不在obet目录提供完整路径)

OBET> open listfile.txt
Loaded 2 files from config file 'listfile.txt'.

OBET> info

Loaded files (2 total):
----------------------------------------
Number  Path
----------------------------------------
     1  H:\BaiduNetdisk\kingdee\system01.dbf
     5  H:\BaiduNetdisk\kingdee\EAS_D_EAS_STANDARD.ORA
----------------------------------------

3. 执行dbv命令(logfile 部分为可选),默认会记录日志在obet目录下面dbv_年月日时分秒.log的日志

OBET> dbv

===============================================
DBV (Data Block Verification)
Block Size: 8192 bytes
===============================================

Verifying file #1: H:\BaiduNetdisk\kingdee\system01.dbf (131841 blocks) - Started: 2026-01-11 18:03:58
file 1, block 0: checksum error (expected 0xC4DA, got 0xC478), bad block
file 1, block 1: checksum error (expected 0xF835, got 0xB835), bad block
  Progress: 10000 / 131841 blocks checked...
  Progress: 20000 / 131841 blocks checked...
……………….
  Progress: 120000 / 131841 blocks checked...
  Progress: 130000 / 131841 blocks checked...
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 2 checksum error, 0 rdba error
Verifying file #5: H:\BaiduNetdisk\kingdee\EAS_D_EAS_STANDARD.ORA (4194303 blocks) - Started: 2026-01-11 18:04:08
  Progress: 10000 / 4194303 blocks checked...
  Progress: 20000 / 4194303 blocks checked...
……………………
  Progress: 260000 / 4194303 blocks checked...
  Progress: 270000 / 4194303 blocks checked...
file 5, block 277678: tailchk error (expected 0x0228BB21, got 0x0228AD21), bad block
file 5, block 277679: rdba error (expected 277679, got 277615), bad block
file 5, block 277680: rdba error (expected 277680, got 277616), bad block
file 5, block 277681: rdba error (expected 277681, got 277617), bad block
………………
file 5, block 277692: rdba error (expected 277692, got 277628), bad block
file 5, block 277693: rdba error (expected 277693, got 277629), bad block
file 5, block 277694: rdba error (expected 277694, got 277630), bad block
file 5, block 279406: tailchk error (expected 0x02281E22, got 0x00000700), bad block
file 5, block 279407: rdba error (expected 279407, got 448), bad block
file 5, block 279408: rdba error (expected 279408, got 0), bad block
………………
  Progress: 280000 / 4194303 blocks checked...
  Progress: 290000 / 4194303 blocks checked...
  Progress: 300000 / 4194303 blocks checked...
  Progress: 310000 / 4194303 blocks checked...
file 5, block 312932: tailchk error (expected 0x0106C0B8, got 0x010629B8), bad block
file 5, block 312933: rdba error (expected 312933, got 312869), bad block
file 5, block 312934: rdba error (expected 312934, got 312870), bad block
file 5, block 312935: rdba error (expected 312935, got 312871), bad block
file 5, block 312936: rdba error (expected 312936, got 312872), bad block
file 5, block 312937: rdba error (expected 312937, got 312873), bad block
file 5, block 312938: rdba error (expected 312938, got 312874), bad block
file 5, block 312939: rdba error (expected 312939, got 312875), bad block
………………
  Progress: 4180000 / 4194303 blocks checked...
  Progress: 4190000 / 4194303 blocks checked...
  File #5 completed: 1 all zero, 0 soft corrupted, 13 tailchk error, 1 checksum error, 255 rdba error

DBV completed at: 2026-01-11 18:09:30

===============================================
DBV Summary:
Total blocks checked: 4325872
Total all zero blocks found: 1
Total all rdba error blocks found: 255
Total all tailchk error blocks found: 13
Total all soft corrupted blocks found: 0
Total all checksum error blocks found: 3
Total bad blocks found: 272
===============================================

Detailed report saved to: dbv_20260111180358.log

OBET>

检测效果截图
obet-dbv


oracle linux 8.10注意pmlogger导致空间被大量占用

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:oracle linux 8.10注意pmlogger导致空间被大量占用

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近在多个客户的oracle linux 8.10的机器上发现/var/log/pcp/pmlogger目录占用空间过大,比如

[root@oracledb lib]# df -h
文件系统             容量  已用  可用 已用% 挂载点
devtmpfs              16G     0   16G    0% /dev
tmpfs                 16G     0   16G    0% /dev/shm
tmpfs                 16G  202M   16G    2% /run
tmpfs                 16G     0   16G    0% /sys/fs/cgroup
/dev/mapper/ol-root   60G   60G   63M  100% /
/dev/mapper/ol-home   30G  241M   29G    1% /home
/dev/vda1           1014M  360M  655M   36% /boot
/dev/vdb1            688G  224G  430G   35% /u01
tmpfs                3.1G   12K  3.1G    1% /run/user/42
tmpfs                 60M     0   60M    0% /var/log/rtlog
tmpfs                3.1G  8.0K  3.1G    1% /run/user/0
[root@oracledb ~]# cd /var/log/pcp/
[root@oracledb pcp]# du -h -d 1
76K	./pmcd
0	./pmfind
476K	./pmie
34G	./pmlogger
0	./pmproxy
0	./sa
34G	.
[root@oracledb pcp]# cd pmlogger/
[root@oracledb pmlogger]# du -h -d 1
34G	./localhost.localdomain
108K	./oracledb
34G	.

目录里面具体类容类似:
logpcp


操作系统环境为:

[root@oracledb localhost.localdomain]# uname -a
Linux oracledb 5.15.0-206.153.7.1.el8uek.x86_64 #2 SMP Wed May 22 20:49:34 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@oracledb localhost.localdomain]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.10 (Ootpa)
[root@oracledb localhost.localdomain]# cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.10"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.10"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:10:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.10
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.10

pmlogger 是 Performance Co-Pilot(PCP)的核心组件,用于采集、归档系统 / 应用性能指标,生成可回放的 PCP 归档日志,支撑离线追溯分析与长期性能基线管理。它通过 PMCD(Performance Metrics Collector Daemon)获取指标,按配置策略记录并维护多卷归档,配合 PCP 工具链实现全链路性能分析。
这个本身是一个系统性能指标采集监控的重要工具,但是由于某种原因导致占用磁盘空间过大,而且大部分情况下有另外的监控来取代,引起大部分情况下,可以考虑关闭相关服务,并清空相关日志,避免出现磁盘空间不足的情况

[root@oracledb ~]# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
   Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2025-11-03 22:25:48 CST; 1 months 27 days ago
     Docs: man:pmlogger(1)
 Main PID: 5580 (pmlogger)
    Tasks: 1 (limit: 818103)
   Memory: 27.9G
   CGroup: /system.slice/pmlogger.service
           └─5580 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -mreexec %Y%m%d.%H.%M

Nov 03 22:25:46 localhost.localdomain systemd[1]: Starting Performance Metrics Archive Logger...
Nov 03 22:25:48 localhost.localdomain systemd[1]: Started Performance Metrics Archive Logger.
[root@oracledb ~]# systemctl stop pmlogger
[root@oracledb ~]# cd /var
[root@oracledb var]# rm -rf /var/log/pcp/pmlogger/*
[root@oracledb var]# du -sh
665M	.
[root@oracledb var]# systemctl disable pmlogger
Removed /etc/systemd/system/multi-user.target.wants/pmlogger.service.

磁盘空间恢复正常

[root@oracledb var]# df -h
文件系统             容量  已用  可用 已用% 挂载点
devtmpfs              16G     0   16G    0% /dev
tmpfs                 16G     0   16G    0% /dev/shm
tmpfs                 16G  202M   16G    2% /run
tmpfs                 16G     0   16G    0% /sys/fs/cgroup
/dev/mapper/ol-root   60G   27G   34G   44% /
/dev/mapper/ol-home   30G  241M   29G    1% /home
/dev/vda1           1014M  360M  655M   36% /boot
/dev/vdb1            688G  224G  430G   35% /u01
tmpfs                3.1G   12K  3.1G    1% /run/user/42
tmpfs                 60M     0   60M    0% /var/log/rtlog
tmpfs                3.1G  8.0K  3.1G    1% /run/user/0
[root@oracledb var]# 

obet快速修改scn/resetlogs恢复数据库(缺少归档,ORA-00308)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:obet快速修改scn/resetlogs恢复数据库(缺少归档,ORA-00308)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最初是由于业务系统报ORA-01578导致
blk


然后相关人员上来进行了一系列操作,包括但不限于offline文件,create datafile 44,recover datafile,还尝试做resetlogs,使用备份控制文件还原等等.我接手该故障的时候状态
resetlog
44号文件状态是12月1日的,而且resetlogs信息也不对.通过和客户沟通,确认是他们在没有备份44号文件的前提下直接执行了类似alter database create datafile 44的命令,但是在应用了写归档之后,发现提示有归档不存在

ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log /data/app/oracle/archive/orcl/1_181285_1103629544.dbf
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Media Recovery Log /data/app/oracle/archive/orcl/1_181286_1103629544.dbf
Tue Jan 06 12:48:12 2026
Errors with log /data/app/oracle/archive/orcl/1_181286_1103629544.dbf
Errors in file /data/app/oracle/diag/rdbms/orcl8/orcl/trace/orcl_pr00_8338.trc:
ORA-00308: cannot open archived log '/data/app/oracle/archive/orcl/1_181286_1103629544.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

分析现存归档,确实发现12月1日之后丢失了部分归档日志(归档不连续)
arch


基于这种情况(故障数据文件直接被覆盖,归档又出现了多天的连续断档,而且没有有效备份),只能先打开数据库,然后根据情况导出数据,然后导入到新库中.这里我直接使用obet来修改resetlogs和checkpoint 信息,实现数据库快速打开

OBET> help
OBET (Oracle Block Editor Tool) commands:
  open <config_file>    - Load file list from config file (format: <num> <path>)
  info                  - Show loaded file list (from open command)
  set filename <path>   - Set target file path (required)
  set file <num>        - Set filename using loaded file number (from open list)
  set blocksize <size>  - Set block size (2048,4096,[8192],16384,32768)
  set block <num>       - Set block number (starts from 0, default: 1)
  set offset <offset>   - Set offset within block (< blocksize, default: 0)
  set count <bytes>     - Set number of bytes to read (default: 32)
  set mode edit/browse  - Enable edit/browse mode
  d/dump [options]      - Display data (options: block X, offset Y, count N)
  m/modify <hex> [opts] - Modify data with hex (opts: block X, offset Y)
  undo                  - Undo last modification
  sum [block X]         - Calculate checksum for block (default: current block)
  sum apply [block X]   - Apply checksum: write calculated value to block
  tailchk [block X]     - Calculate tailchk for block (default: current block)
  tailchk apply [block X] - Apply tailchk: write calculated value to block
  repair [block X]      - Repair bad block (fix seq_kcbh, tailchk, checksum)
  copy data <src> to <dest>  - Copy data between files
    <src> format: file,block,offset,count (e.g., 1,1,10,64)
    <dest> format: file[,block][,offset] (e.g., 3 or 3,1 or 3,1,128)
  copy block file#,block# to file#,block#     - Copy entire data block
  copy chkscn file n to file m         - Copy datafile header checkpoint SCN info
  copy resetlogscn file n to file m    - Copy datafile header resetlogs info
  backup [file n] [block m] - Backup data block (optional file and block parameters)
  corrupt [block X]     - Mark block as corrupted (default: current block)
  show                  - Display current settings (filename, blocksize, block, offset, count, mode)
  license               - Show/manage software license (registration code required)
  version               - Show software version and developer information
  p/print <param>       - Print Oracle structure,Use the 'p/print' command to see details
  undo                  - Undo the last copy chkscn or copy resetlogscn operation
  spool <file>          - Start logging to file (spool off to stop)
  help                  - Show this help message
  quit/exit             - Exit OBET

OBET> open 1.txt                 
Loaded 2 files from config file '1.txt'.

OBET> info

Loaded files (2 total):
----------------------------------------
Number  Path
----------------------------------------
     1  /data/app/oracle/oradata/orcl/system01.dbf
    44  /data/app/oracle/oradata/orcl/xff26.dbf
----------------------------------------

OBET> set file 44
filename set to: /data/app/oracle/oradata/orcl/xff26.dbf (file#44)

OBET> d

File: /data/app/oracle/oradata/orcl/xff26.dbf
Block: 1                Offsets:     0 to    31
--------------------------------------------------------------------------------
0xifenfei000 0BA20000 0100000B 00000000 00000104 B05F0000 00000000 0004200B 65583D61 

<32 bytes read>


OBET> backup file 44 block 1
Created backup directory: backup_blk
Successfully backed up block 1 from file#44 to /tmp/backup_blk/xff26.dbf_1.20260106182405

OBET> copy resetlogscn file 1 to file 44
Error: Edit mode not enabled. Use 'set mode edit' first.

OBET> set mode edit     
mode set to: edit

OBET> copy resetlogscn file 1 to file 44

Confirm Modify resetlogscn:
Source: file#1 (/data/app/oracle/oradata/orcl/system01.dbf)
Target: file#44 (/data/app/oracle/oradata/orcl/xff26.dbf)
Proceed? (Y/YES to confirm): y
Successfully copied resetlog SCN information from file#1 to file#44.

OBET> copy chkscn file 1 to file 44

Confirm Modify chkscn:
Source: file#1 (/data/app/oracle/oradata/orcl/system01.dbf)
Target: file#44 (/data/app/oracle/oradata/orcl/xff26.dbf)
Proceed? (Y/YES to confirm): y
Successfully copied checkpoint SCN information from file#1 to file#44.

OBET> tailchk
Check tailchk for File /data/app/oracle/oradata/orcl/xff26.dbf, Block 1:
current = 0x010B0000, required = 0x010B0000

OBET> sum
Check value for File /data/app/oracle/oradata/orcl/xff26.dbf, Block 1:
current = 0xB05F, required = 0x6DA3

OBET> sum apply

Confirm applying checksum:
File: /data/app/oracle/oradata/orcl/xff26.dbf
Block: 1
Offset in block: 16 (file offset: 0x0xifenfei010)
Original value: 0xB05F
New value:      0x6DA3
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x6DA3).
Checksum applied successfully.

然后重建ctl正常打开数据库

[oracle@xifenfei check_db]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 6 18:29:22 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 6.8413E+10 bytes
Fixed Size		    2270360 bytes
Variable Size		 3.3554E+10 bytes
Database Buffers	 3.4762E+10 bytes
Redo Buffers		   93585408 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01190: control file or data file 1 is from before the last RESETLOGS
ORA-01110: data file 1: '/data/app/oracle/oradata/orcl/system01.dbf'


SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> 
SQL> 
SQL> startup nomount pfile='/tmp/pfile
';
ORACLE instance started.

Total System Global Area 6.8413E+10 bytes
Fixed Size		    2270360 bytes
Variable Size		 3.3554E+10 bytes
Database Buffers	 3.4762E+10 bytes
Redo Buffers		   93585408 bytes
SQL> @ctl.sql

Control file created.

SQL> SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required


SQL> alter database open;

Database altered.

后续根据情况选择导出表,对于导出报错的表再进行二次处理.

使用DBMS_PDB.RECOVER抢救单个pdb

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:使用DBMS_PDB.RECOVER抢救单个pdb

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储异常导致数据库启动报ORA-1172错误
ORA-1172


尝试recover datafile 1 报ORA-600 kdBlkCheckError错误

Wed Jan 07 14:31:17 2026
Decreasing number of real time LMS from 5 to 0
ALTER DATABASE RECOVER  datafile 1
Wed Jan 07 14:31:21 2026
Media Recovery Start
Wed Jan 07 14:31:21 2026
Serial Media Recovery started
Wed Jan 07 14:31:21 2026
Recovery of Online Redo Log: Thread 2 Group 24 Seq 2584 Reading mem 0
  Mem# 0: +DATA/CDBDG/ONLINELOG/group_24.393.1122613933
  Mem# 1: +ARCH/CDBDG/ONLINELOG/group_24.276.1122613933
Wed Jan 07 14:31:21 2026
Recovery of Online Redo Log: Thread 1 Group 18 Seq 13823 Reading mem 0
  Mem# 0: +DATA/CDBDG/ONLINELOG/group_18.387.1122613923
  Mem# 1: +ARCH/CDBDG/ONLINELOG/group_18.270.1122613925
Wed Jan 07 14:31:22 2026
Recovery of Online Redo Log: Thread 1 Group 19 Seq 13824 Reading mem 0
  Mem# 0: +DATA/CDBDG/ONLINELOG/group_19.388.1122613925
  Mem# 1: +ARCH/CDBDG/ONLINELOG/group_19.271.1122613925
Errors in file /u01/app/oracle/diag/rdbms/cdbdg/cdb1/trace/cdb1_ora_2753010.trc (incident=806358) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [1], [105538], [6401], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/cdbdg/cdb1/incident/incdir_806358/cdb1_ora_2753010_i806358.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jan 07 14:31:25 2026
Media Recovery failed with error 10562
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 1  ...

通过dbv检查file 1(system)数据文件,发现大量坏块

bash-4.2$ dbv file=system.271.1122584815

DBVERIFY: Release 12.1.0.2.0 - Production on Wed Jan 7 20:15:26 2026

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /data/bak/system.271.1122584815

DBV-00200: Block, DBA 4195214, already marked corrupt

DBV-00200: Block, DBA 4210332, already marked corrupt

DBV-00200: Block, DBA 4210333, already marked corrupt
……………………
DBV-00200: Block, DBA 4312889, already marked corrupt

DBV-00200: Block, DBA 4313119, already marked corrupt

DBV-00200: Block, DBA 4313122, already marked corrupt


DBVERIFY - Verification complete

Total Pages Examined         : 524288
Total Pages Processed (Data) : 79989
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 20781
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 408840
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 14678
Total Pages Marked Corrupt   : 33
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 3135037281 (4374.3135037281)

分析发现这个库是包含了多个pdb,客户的业务主要在con_id=3的pdb中

bash-4.2$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jan 7 20:27:42 2026

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 XFF                            MOUNTED
         4 DYCW                           MOUNTED
SQL>

尝试对除cdb的system/sysaux之外的文件进行recover操作均成功
2


对于这种情况,比较好的处理思路就是通过对单独的pdb文件使用DBMS_PDB.RECOVER处理,然后插入到一个新的cdb中,第一步使用dbca创建一个和当前库字符集一样的cdb

dbca -silent -ignorePreReqs -createDatabase -templateName $ORACLE_HOME/assistants/dbca/templates/General_Purpose.dbc \
-responseFile NO_VALUE \
-gdbname oratest  -sid oratest \
-createAsContainerDatabase TRUE \
-sysPassword oracle -systemPassword oracle -pdbAdminPassword oracle -dbsnmpPassword oracle \
-datafileDestination '/data/oradata' \
-redoLogFileSize 200 \
-characterset ZHS16GBK -nationalCharacterSet AL16UTF16 \
-totalMemory 4096 \
-databaseType MULTIPURPOSE \
-emConfiguration NONE

然后利用老的pdb生成需要的xml文件

BEGIN
DBMS_PDB.RECOVER(PDB_DESCR_FILE=> '/tmp/XFF.xml', PDB_NAME=>'XFF', 
FILENAMES=> '/data/bak/system.268.1122584817,
/data/bak/system.441.1125224053,
/data/bak/sysaux.269.1122584817,
…………
/data/bak/odin.312.1122593153,
/data/bak/dsg.314.1122593223');
END; 
/

利用该xml文件,直接把这个pdb插入到新cdb中

Wed Jan 07 14:40:52 2026
create pluggable database XFF using '/tmp/XFF.xml' nocopy
Wed Jan 07 14:41:29 2026
Session (221,18860): EXTERNAL SCN ALERT: Advanced SCN by 3185580 minutes worth to 0x1116.bade672c,
    by Pluggable Database plug-in
Session (221,18860): EXTERNAL SCN SOURCE: Inbound connection from client
Session (221,18860): EXTERNAL SCN SOURCE: DB Logon User: SYS, Remote Machine: p770a,
  Program: sqlplus@p770a (TNS V1-V3),OS User: oracle
****************************************************************
Pluggable Database XFF with pdb id - 3 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
****************************************************************
Database Characterset for XFF is ZHS16GBK
Deleting old file#8 from file$ 
Deleting old file#9 from file$ 
Deleting old file#10 from file$ 
…………
Deleting old file#148 from file$ 
Deleting old file#199 from file$ 
Deleting old file#200 from file$ 
Adding new file#9 to file$(old file#8) 
Adding new file#8 to file$(old file#145) 
Adding new file#10 to file$(old file#9) 
…………
Adding new file#19 to file$(old file#148) 
Adding new file#18 to file$(old file#199) 
Adding new file#17 to file$(old file#200) 
Marking tablespace #2 invalid since it is not present   in the describe file 
Successfully created internal service XFF at open
ALTER SYSTEM: Flushing buffer cache inst=0 container=3 local
Wed Jan 07 14:41:49 2026
****************************************************************
Post plug operations are now complete.
Pluggable database XFF with pdb id - 3 is now marked as NEW.
****************************************************************
Completed: create pluggable database XFF using '/tmp/XFF.xml' nocopy
alter database open
Wed Jan 07 14:43:21 2026
Pluggable database XFF dictionary check beginning
Tablespace 'TEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Pluggable Database XFF Dictionary check complete
Database Characterset for XFF is ZHS16GBK
*********************************************************************
WARNING: The following temporary tablespaces in container(XFF)
         contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
***************************************************************
WARNING: Pluggable Database XFF with pdb id - 3 is
         altered with errors or warnings. Please look into
         PDB_PLUG_IN_VIOLATIONS view for more details.
***************************************************************
Opening pdb XFF (3) with no Resource Manager plan active
Pluggable database XFF opened read write
Completed: alter database open
alter tablespace temp add tempfile '/data/bak/temp01.dbf' size 16m autoextend on
Completed: alter tablespace temp add tempfile '/data/bak/temp01.dbf' size 16m autoextend on
ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE
Wed Jan 07 14:46:08 2026
Immediate Kill Session#: 366, Serial#: 2995
Immediate Kill Session: sess: 7000100bafb5f58  OS pid: 5637924
ALTER SYSTEM: Flushing buffer cache inst=0 container=3 local
Wed Jan 07 14:46:22 2026
Pluggable database XFF closed
Completed: ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE
alter database open
Wed Jan 07 14:46:48 2026
Database Characterset for XFF is ZHS16GBK
***************************************************************
WARNING: Pluggable Database XFF with pdb id - 3 is
         altered with errors or warnings. Please look into
         PDB_PLUG_IN_VIOLATIONS view for more details.
***************************************************************
Opening pdb XFF (3) with no Resource Manager plan active
Pluggable database XFF opened read write
Completed: alter database open

3


这次4T多的aix环境rac数据库的单个pdb通过DBMS_PDB.RECOVER方法非常完美的恢复,DBMS_PDB.RECOVER是拯救单个pdb的神兵利器.

aix环境写入大文件设置combehin提高效率

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:aix环境写入大文件设置combehin提高效率

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一段时间没有弄aix系统了,这次有一个aix的rac数据库故障,需要把数据文件做一个备份,由于硬盘本身性能不好,通过rman的copy或者asmcmd的cp命令都会引起crs的表决盘超时,然后主机重启导致拷贝失败,最后在Db.recovery的建议下,通过amdu进行拷贝,由于拷贝的目录是挂载nfs文件系统,虽然通过amdu拷贝绕过了crs(关闭掉crs,不会触发主机重启的问题),但是在拷贝过程中发现稍微大一点的文件,会出现比较超时间的io类似卡死的情况,类似这样:
1


查看mount相关参数2

而且通过观察,文件越大速度越慢,一个bigfile文件1T,中间出现类似这样io卡死的时间更长.导致整体时间会很长

-rw-r--r--    1 root     system   8589942784 Jan 03 06:54 data.270.1122584819
-rw-r--r--    1 root     system   1073741832192 Jan 04 20:59 users.263.1122584819

通过观察上一个文件的完成时间1月3日06:54,1T的文件整体拷贝结束时间1月4日20:59,整体时间为38小时05分钟,最终的拷贝速度平均为:1000*1024/(38*60+5)大概1分钟448MB,也就是每秒7.5MB不到的速度.通过查询资料发现在aix平台的nfs有特殊性,拷贝大文件可能会速度慢很多:改善 NFS 客户机大文件写操作性能,确认了可以考虑加上-o combehin参数来提高效率.
combehind(Complete Behind)直译是 “后置完成”,它是 AIX 为 NFS 客户端优化 写操作(write) 性能的核心参数,作用是:
1. 当 NFS 客户端向服务端发起写请求时,不等待服务端返回 “写完成” 确认,直接向应用层返回 “写成功”;
2. 写请求的最终完成由 AIX 内核在后台异步处理,应用层无需阻塞等待服务端响应。
简单来说:这个参数让 NFS 写操作从 “同步等待” 变成 “异步后置确认”,减少应用程序的等待时间,提升写性能。

通过设置该参数之后
4


备份速度基本上稳定在18-20M之间,没有再次出现io类似挂起的现象
3

对于在aix环境,如果使用nfs写入较大文件,可以考虑设置combehind参数,提高效率,但是同时这个参数也是一个比较有风险的参数,因为应用层收到 “写成功” 时,数据可能还没真正写入 NFS 服务端磁盘,若此时客户端 / 服务端宕机、网络中断,未完成的写数据会丢失。

数据库启动报 maximum number of processes () exceeded分析

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报 maximum number of processes () exceeded分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

比较少遇到打开一个数据库报ORA-00020: maximum number of processes () exceeded,导致open数据库失败

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00450: background process 'QMNC' did not start
ORA-00444: background process "QMNC" failed while starting
ORA-00020: maximum number of processes () exceeded
进程 ID: 6340
会话 ID: 109 序列号: 3

看这个报错应该是由于进程数达到了限制,导致在open数据库过程中启动后台进程QMNC失败,导致open过程失败,分析进程数不足的原因,查看启动时候的参数配置

Fri Dec 26 16:48:27 2025
Adjusting the default value of parameter parallel_max_servers
from 2560 to 135 due to the value of parameter processes (150)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Fri Dec 26 16:48:38 2025
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =28
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Windows NT Version V6.2  
CPU                 : 64 - type 8664, 32 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:82501M/130752M, Ph+PgF:103658M/150208M 
Using parameter settings in client-side pfile D:\PFILE.TXT on machine HISSERVER
System parameters with non-default values:
  processes                = 150
  sga_target               = 39296M
  control_files            = "D:\ORACLE64\ORADATA\ORA817\CONTROL01.CTL"
  control_files            = "D:\ORACLE64\RECOVERY_AREA\ORA817\CONTROL02.CTL"
  db_block_size            = 8192
  compatible               = "11.2.0.0.0"
  db_recovery_file_dest    = "D:\ORACLE64\recovery_area"
  db_recovery_file_dest_size= 4122M
  undo_tablespace          = "UNDOTBS1"
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=ora817XDB)"
  audit_file_dest          = "D:\ORACLE64\ADMIN\ORA817\ADUMP"
  audit_trail              = "DB"
  db_name                  = "ora817"
  open_cursors             = 300
  pga_aggregate_target     = 13075M
  diagnostic_dest          = "D:\ORACLE64"
Fri Dec 26 16:49:10 2025
PMON started with pid=2, OS id=1992 
Fri Dec 26 16:49:10 2025
PSP0 started with pid=3, OS id=6744 
Fri Dec 26 16:49:12 2025
VKTM started with pid=4, OS id=7592 at elevated priority
Fri Dec 26 16:49:12 2025
GEN0 started with pid=5, OS id=7780 
Fri Dec 26 16:49:12 2025
DBRM started with pid=7, OS id=5372 
Fri Dec 26 16:49:12 2025
MMAN started with pid=9, OS id=6700 
Fri Dec 26 16:49:12 2025
DBW0 started with pid=10, OS id=1920 
Fri Dec 26 16:49:12 2025
DBW1 started with pid=11, OS id=4476 
Fri Dec 26 16:49:12 2025
DBW2 started with pid=12, OS id=6788 
Fri Dec 26 16:49:12 2025
DBW3 started with pid=13, OS id=4608 
Fri Dec 26 16:49:12 2025
DBW4 started with pid=14, OS id=6716 
Fri Dec 26 16:49:12 2025
DBW5 started with pid=15, OS id=6252 
Fri Dec 26 16:49:12 2025
DBW6 started with pid=16, OS id=7948 
Fri Dec 26 16:49:12 2025
DBW7 started with pid=17, OS id=5868 
Fri Dec 26 16:49:12 2025
LGWR started with pid=18, OS id=1644 
Fri Dec 26 16:49:12 2025
CKPT started with pid=19, OS id=5704 
Fri Dec 26 16:49:12 2025
SMON started with pid=20, OS id=4532 
Fri Dec 26 16:49:12 2025
RECO started with pid=21, OS id=380 
Fri Dec 26 16:49:12 2025
MMON started with pid=22, OS id=1316 
Fri Dec 26 16:49:12 2025
MMNL started with pid=23, OS id=6108 
Fri Dec 26 16:49:12 2025
DIAG started with pid=6, OS id=7904 
Fri Dec 26 16:49:12 2025
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = D:\ORACLE64

这里可以看出来数据库版本是11.2.0.3,processes配置为默认值150.进一步查看启动报错时候的alert日志

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Fri Dec 26 17:03:44 2025
Media Recovery failed with error 264
Slave exiting with ORA-283 exception
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_pr00_344.trc:
ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
alter database open
Fri Dec 26 17:03:48 2025
Thread 1 opened at log sequence 37301
  Current log# 2 seq# 37301 mem# 0: D:\ORACLE64\ORADATA\ORA817\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 26 17:03:49 2025
SMON: enabling cache recovery
[6340] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:5143437 end:5143718 diff:281 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Fri Dec 26 17:03:49 2025
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-00020: maximum number of processes (150) exceeded
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.
Process P123 submission failed with error = 20
Starting background process QMNC
Process QMNC submission failed with error = 20
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_ora_6340.trc:
ORA-00444: 后台进程 "QMNC" 启动失败
ORA-00020: 超出最大进程数 ()
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_ora_6340.trc:
ORA-00450: 后台进程 'QMNC' 未启动
ORA-00444: 后台进程 "QMNC" 启动失败
ORA-00020: 超出最大进程数 ()
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_ora_6340.trc:
ORA-00450: 后台进程 'QMNC' 未启动
ORA-00444: 后台进程 "QMNC" 启动失败
ORA-00020: 超出最大进程数 ()
Error 450 happened during db open, shutting down database
USER (ospid: 6340): terminating the instance due to error 450
Fri Dec 26 17:03:50 2025
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_p036_8044.trc  (incident=4105):
ORA-01578: ORACLE data block corrupted (file # 9, block # 504643)
ORA-01110: data file 9: 'D:\ORACLE64\ORADATA\ORA817\HRPRUN_DATA05.DBF'
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_4105\ora817_p036_8044_i4105.trc
Fri Dec 26 17:03:50 2025
Process m000 submission failed with error = 20
Fri Dec 26 17:04:07 2025
Instance terminated by USER, pid = 6340
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6340) as a result of ORA-1092
Fri Dec 26 17:04:09 2025
ORA-1092 : opitsk aborting process

看alert日志比较明显有Process P123 submission failed with error = 20提示,这里表示启动P123进程时候也遭遇到了进程数不足的限制导致该进程无法启动,而Pxxx进程是oracle的并行进程(而且在后面还报了由于坏块导致P036进程报错的信息),也就是说这个数据库在启动过程中启动了大量的并行进程(至少122个已经启动),这也就是说明了为什么150个进程不足的原因.在数据库完成实例恢复,完全open之前启动并行进程进行恢复的,很可能就是并行回滚操作.为了进一步验证猜想,把processes调整到1500,然后尝试启动库

Fri Dec 26 17:06:44 2025
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 20 KB redo, 5 data blocks need recovery
Started redo application at
 Thread 1: logseq 37301, block 18209
Recovery of Online Redo Log: Thread 1 Group 2 Seq 37301 Reading mem 0
  Mem# 0: D:\ORACLE64\ORADATA\ORA817\REDO02.LOG
Completed redo application of 0.02MB
Completed crash recovery at
 Thread 1: logseq 37301, block 18249, scn 327081807
 5 data blocks read, 5 data blocks written, 20 redo k-bytes read
Fri Dec 26 17:06:46 2025
Thread 1 advanced to log sequence 37302 (thread open)
Thread 1 opened at log sequence 37302
  Current log# 3 seq# 37302 mem# 0: D:\ORACLE64\ORADATA\ORA817\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 26 17:06:47 2025
SMON: enabling cache recovery
[7104] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:5321812 end:5322031 diff:219 (2 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Dec 26 17:06:49 2025
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_p091_5924.trc  (incident=5745):
ORA-01578: ORACLE data block corrupted (file # 9, block # 504643)
ORA-01110: data file 9: 'D:\ORACLE64\ORADATA\ORA817\HRPRUN_DATA05.DBF'
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_5745\ora817_p091_5924_i5745.trc
Starting background process QMNC
Fri Dec 26 17:06:49 2025
QMNC started with pid=156, OS id=8012 
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_p091_5924.trc  (incident=5746):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_5746\ora817_p091_5924_i5746.trc
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_smon_7424.trc  (incident=4961):
ORA-01578: ORACLE data block corrupted (file # 9, block # 504643)
ORA-01110: data file 9: 'D:\ORACLE64\ORADATA\ORA817\HRPRUN_DATA05.DBF'
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_4961\ora817_smon_7424_i4961.trc
Completed: alter database open
SMON: Restarting fast_start parallel rollback
Fri Dec 26 17:06:51 2025
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_p000_5472.trc  (incident=5009):
ORA-01578: ORACLE data block corrupted (file # 9, block # 504643)
ORA-01110: data file 9: 'D:\ORACLE64\ORADATA\ORA817\HRPRUN_DATA05.DBF'
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_5009\ora817_p000_5472_i5009.trc
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_p000_5472.trc  (incident=5010):
ORA-01578: ORACLE data block corrupted (file # , block # )
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_smon_7424.trc  (incident=4962):
ORA-01578: ORACLE data block corrupted (file # 9, block # 504643)
ORA-01110: data file 9: 'D:\ORACLE64\ORADATA\ORA817\HRPRUN_DATA05.DBF'
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_5010\ora817_p000_5472_i5010.trc

这次就比较清晰了,数据库在open过程中由于smon启动了并行回滚并遇到了坏块,导致P091进程和smon进程错误。因为数据库已经open,这个坏块导致无法回滚,直接对这个对象进行处理即可解决掉这个回滚报错问题.

ORA-600 [ksunfy : too few sessions]

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 [ksunfy : too few sessions]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

Oracle 修改processes参数之后,数据库启动报ORA-600 [ksunfy : too few sessions]错误
ksunfy too few sessions


对应的alert日志信息

Fri Dec 26 22:39:18 2025
Starting ORACLE instance (normal)
Errors in file D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_ora_2532.trc  (incident=52848):
ORA-00600: internal error code, arguments: [ksunfy : too few sessions], [9536],[75072],[],[],[],[],[],[],[],[],[]
Incident details in: D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_52848\ora817_ora_2532_i52848.trc
Sweep [inc][52848]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

trace信息

Dump file D:\ORACLE64\diag\rdbms\ora817\ora817\incident\incdir_52848\ora817_ora_2532_i52848.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.2  
CPU                 : 64 - type 8664, 32 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:119070M/130752M, Ph+PgF:138338M/150208M 
Instance name: ora817
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Windows thread id: 2532, image: ORACLE.EXE (SHAD)


*** 2025-12-26 22:39:20.746
Dump continued from file: D:\ORACLE64\diag\rdbms\ora817\ora817\trace\ora817_ora_2532.trc
ORA-00600: internal error code, arguments: [ksunfy : too few sessions], [9536],[75072],[],[],[],[],[],[],[],[],[]

========= Dump for incident 52848 (ORA 600 [ksunfy : too few sessions]) ========

*** 2025-12-26 22:39:20.770
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no session.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+129        CALL???  skdstdst()           00190E43A 000000000 000000000
                                                   00000001D
ksedst()+69          CALL???  ksedst1()            003030000 000000000 1477C1B20
                                                   25A00000007
dbkedDefDump()+4541  CALL???  ksedst()             000000000 000000000 000000000
                                                   0464DFF90
ksedmp()+43          CALL???  dbkedDefDump()       000000003 000000002 000000000
                                                   000000000
ksfdmp()+87          CALL???  ksedmp()             000000000 000000000 000000000
                                                   000000000
dbgexPhaseII()+1819  CALL???  ksfdmp()             000000000 000000000 000000000
                                                   000000000
dbgexProcessError()  CALL???  dbgexPhaseII()       0009C0518 0009CD3F0 00000CE70
+3008                                              000000002
dbgeExecuteForError  CALL???  dbgexProcessError()  0009C0518 0009C74E8 000000001
()+65                                              000000000
dbgePostErrorKGE()+  CALL???  dbgeExecuteForError  000000000 000000030 000000201
2255                          ()                   0464D9280
dbkePostKGE_kgsf()+  CALL???  dbgePostErrorKGE()   00ABF9730 000000000 000000258
77                                                 767287A10789
kgeadse()+349        CALL???  dbkePostKGE_kgsf()   000092A00 14008CE12 14777712C
                                                   100000000
kgerinv_internal()+  CALL???  kgeadse()            000000001 000000000 1476C3B6C
76                                                 14008D953
kgerinv()+49         CALL???  kgerinv_internal()   000000004 A0000027318
                                                   100000004 000004A20
kgeasnmierr()+64     CALL???  kgerinv()            000000000 7FFEEB3DAFB0
                                                   1492BD1C0 00000003F
ksunfy()+6598        CALL???  kgeasnmierr()        1477094CC 00ABF9170 0000011F8
                                                   000000002
kscnfy()+1091        CALL???  ksunfy()             000666EF8 00161F45D 000DD7800
                                                   800000000000000
ksmcsg()+7168        CALL???  kscnfy()             100000004 1492BD1C0 0009C0518
                                                   0009C0518
opistr_real()+1225   CALL???  ksmcsg()             000000000 000000000 000000000
                                                   000000000
opistr()+350         CALL???  opistr_real()        000000030 000000002 0464DE080
                                                   000000000
opiodr()+1644        CALL???  opistr()             000000030 000000002 0464DE080
                                                   000A75FC8
ttcpip()+1325        CALL???  opiodr()             000000030 000000002 0464DE080
                                                   000000000
opitsk()+2092        CALL???  ttcpip()             00AC15970 000000000 000000000
                                                   000000000
opiino()+1246        CALL???  opitsk()             00000001E 000000000 100000000
                                                   003EBCD8D
opiodr()+1644        CALL???  opiino()             00000003C 000000004 0464DFAB0
                                                   000000000
opidrv()+862         CALL???  opiodr()             00000003C 000000004 0464DFAB0
                                                   000000000
sou2o()+98           CALL???  opidrv()+150         00000003C 000000004 0464DFAB0
                                                   000000000
opimai_real()+158    CALL???  sou2o()              1DC76756A6A6157 000000000
                                                   000000000 0464DFBA4
opimai()+191         CALL???  opimai_real()        000000000 000000000 000000000
                                                   000000000
OracleThreadStart()  CALL???  opimai()             000000000 000000002 148D22CA0
+724                                               000000000
00007FFFFA008364     CALL???  OracleThreadStart()  0461DFF18 000000000 000000000
                                                   000000000
00007FFFFAE37091     CALL???  00007FFFFA008350     000000000 000000000 000000000
                                                   000000000
 

--------------------- Binary Stack Dump ---------------------

查询mos发现出现ORA-600 ksunfy : too few sessions错误官方给出来的解释是由于session超过了oracle限制的65536
session-65536


但是在win系统中由于系统级别对资源的限制,一般不会超过16k个进程,所以还没有到oracle的本身的session设置限制已经达到上限所以出现该错误

由于数据块scn大于数据库scn导致ORA-600 kcbzib_kcrsds_1错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:由于数据块scn大于数据库scn导致ORA-600 kcbzib_kcrsds_1错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前遇到ORA-600 kcbzib_kcrsds_1一般都是数据库强制打开的时候,这次在一个open的库中遇到该问题,做一个记录
在expdp过程中有一个表报ORA-01555错误

. . 导出了 "SRM_DEV"."AAAAAA"             3.256 MB    4636 行
ORA-31693: 表数据对象 "SRM_DEV"."XXXXXXX" 无法加载/卸载并且被跳过, 错误如下:
ORA-02354: 导出/导入数据时出错
ORA-01555: 快照过旧: 回退段号  (名称为 "") 过小

这种错误,根据以往经验,大部分情况是由于lob字段异常引起
查看表结构

SQL> desc "SRM_DEV"."XXXXXXX"
 名称                                      是否为空? 类型
 ----------------------------------------- -------- ----------------------------

 DELIVERY_ID                               NOT NULL NUMBER(19)
 ARRIVAL_CUSTOMER_ID                                NUMBER(19)
 ARRIVAL_LOCATION_TYPE                              VARCHAR2(255 CHAR)
 ARRIVAL_SUPPLIER_ID                                NUMBER(19)
 ARRIVAL_TIME                                       TIMESTAMP(6)
 ARRIVAL_USER                                       NUMBER(19)
 CREATE_BY                                          VARCHAR2(32 CHAR)
 CREATE_TIME                                        TIMESTAMP(6)
 DELIVERY_DATE                                      VARCHAR2(32 CHAR)
 DELIVERY_NO                                        VARCHAR2(32 CHAR)
 DELIVERY_TYPE                                      VARCHAR2(32 CHAR)
 INVOICE_TYPE                                       VARCHAR2(2)
 IS_CONFIRM                                         VARCHAR2(2)
 IS_ONEKEY_IO                                       VARCHAR2(2)
 IS_SUPPORTING                                      VARCHAR2(32 CHAR)
 MODI_BY                                            VARCHAR2(32 CHAR)
 MODI_TIME                                          TIMESTAMP(6)
 OP_ATTRIBUTE                                       VARCHAR2(1 CHAR)
 PRINT_TIMES                                        NUMBER(10)
 RECEIVE_COMPANY                                    VARCHAR2(64 CHAR)
 RECEIVE_LOCATION                                   VARCHAR2(64 CHAR)
 RECEIVE_SUPPLIER_ID                                NUMBER(19)
 REMARK                                             VARCHAR2(256 CHAR)
 STATUS                                             VARCHAR2(5 CHAR)
 SUPPLIER_CODE                                      VARCHAR2(64 CHAR)
 SUPPLIER_ID                                        NUMBER(19)
 OVERDUE_STATUS                                     VARCHAR2(255 CHAR)

确认没有lob字段而引起了该问题,进一步分析
对表进行全扫描看看报什么错误

SQL> select /*+full(t)*/ count(1) from "SRM_DEV"."XXXXXXX" t;
select /*+full(t)*/ count(1) from "SRM_DEV"."XXXXXXX" t
                    *
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcbzib_kcrsds_1], [], [], [], [], [], [],
[], [], [], [], []

根据以往经验,怀疑是由于这个表中的某个block的scn大于数据库scn导致
对全表扫描过程进行跟踪

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期日 12月 21 22:18:46 2025
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> oradebug setmypid
已处理的语句
SQL> alter session set db_file_multiblocK_read_count=1;

会话已更改。

SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
已处理的语句
SQL> oradebug TRACEFILE_NAME
F:\APP\diag\rdbms\srmdev\srmdev\trace\srmdev_ora_14768.trc
SQL>
SQL>
SQL> select /*+full(t)*/ count(1) from "SRM_DEV"."XXXXXXX" t;
select /*+full(t)*/ count(1) from "SRM_DEV"."XXXXXXX" t
                    *
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcbzib_kcrsds_1], [], [], [], [], [], [],
[], [], [], [], []

分析trace文件确认对应的block为20548

WAIT #758663789368: nam='db file sequential read' ela= 40 file#=7 block#=20545 blocks=1 obj#=73756 tim=47881962193
WAIT #758663789368: nam='db file sequential read' ela= 41 file#=7 block#=20546 blocks=1 obj#=73756 tim=47881962259
WAIT #758663789368: nam='db file sequential read' ela= 40 file#=7 block#=20547 blocks=1 obj#=73756 tim=47881962325
WAIT #758663789368: nam='db file sequential read' ela= 41 file#=7 block#=20548 blocks=1 obj#=73756 tim=47881962389
Encountered exception while getting args for function:0x00007FF6F0BC39A0
2025-12-21T22:19:17.768815+08:00
Incident 1562659 created, dump file: F:\APP\diag\rdbms\srmdev\srmdev\incdir_1562659\srmdev_ora_14768_i1562659.trc
ORA-00600: 内部错误代码, 参数: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []

对异常block进行dump分析

PARSE #758663744160:c=0,e=97,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=48020763236
Start dump data blocks tsn: 4 file#:7 minblk 20548 maxblk 20548
Block dump from cache:
Dump of buffer cache at level 3 for pdb=0 tsn=4 rdba=29380676
WAIT #758663744160: nam='db file sequential read' ela= 88 file#=7 block#=20548 blocks=1 obj#=73756 tim=48020763396
Block dump from disk:
buffer tsn: 4 rdba: 0x01c05044 (7/20548)
scn: 0x6a37b2cb8 seq: 0x02 flg: 0x06 tail: 0x2cb80602
frmt: 0x02 chkval: 0xe203 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000B0A274D000 to 0x000000B0A274F000
B0A274D000 0000A206 01C05044 A37B2CB8 06020006  [....DP...,{.....]
B0A274D010 0000E203 001B0001 0001201C A37B2CB4  [......... ...,{.]
B0A274D020 00068000 00321F02 01C05001 00160010  [......2..P......]
B0A274D030 0409F0A2 010001A8 000C0776 00002001  [........v.... ..]

通过这里可以确认当前block的scn为0x6a37b2cb8==>28512562360
查询当前数据库scn

SQL> select CURRENT_SCN  from v$database;

  CURRENT_SCN  
--------------
  28512556338

通过分析确认当前数据库的scn小于该block的scn,所以报ORA-600 kcbzib_kcrsds_1错误.解决该问题的方法有两种
1. 调整数据库是scn,让数据库的scn 大于该block scn即可
2. 通过把该block的scn修改小一些(目的就是数据库scn大于block scn)
通过处理之后实现表正常查询

SQL> alter system checkpoint;

系统已更改。

SQL> select checkpoint_change# from v$database;

  CHECKPOINT_CHANGE#
--------------------
         28512563830

SQL>  select /*+full(t)*/ count(1) from "SRM_DEV"."XXXXXXX" t;

            COUNT(1)
--------------------
               19560

对于这种情况除了ora-600 kcbzib_kcrsds_1的报错之外,还可能会遇到ora-600 kcbzibmlt_kcrsds_1的错误