一键恢复ORA-00704 ORA-00702故障—202512

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一键恢复ORA-00704 ORA-00702故障—202512

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天又遇到一个客户遭遇ORA-00702错误
ORA-702


alert日志相关报错信息

Mon Dec 15 14:10:56 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 5 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 654, block 2, scn 16937055
Recovery of Online Redo Log: Thread 1 Group 3 Seq 654 Reading mem 0
  Mem# 0: E:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 654, block 3, scn 16957057
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Thread 1 advanced to log sequence 655 (thread open)
Thread 1 opened at log sequence 655
  Current log# 1 seq# 655 mem# 0: E:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Dec 15 14:10:57 2025
SMON: enabling cache recovery
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7952.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_7952.trc:
ORA-00704: 引导程序进程失败
ORA-00702: 引导程序版本 '' 与版本 '8.0.0.0.0' 不一致
Error 704 happened during db open, shutting down database
USER (ospid: 7952): terminating the instance due to error 704
Instance terminated by USER, pid = 7952
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (7952) as a result of ORA-1092
Mon Dec 15 14:11:00 2025
ORA-1092 : opitsk aborting process
Mon Dec 15 14:39:18 2025

这种故障处理比较多,主要是bootstrap$基表损坏,后来为了解决这种问题方便,专门写了一个小工具(bootstrap$异常修复小工具)来一键修复该问题
path-ora-702


通过bootstrap$异常修复小工具修复之后,然后直接启动数据库成功,完成本次恢复任务

PostgreSQL查询一个表相关的所有oid

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL查询一个表相关的所有oid

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在pg中查询一个表相关的所有oid(表OID、TOAST表OID、索引OID、TOAST表的索引OID、约束OID)

-- 替换这里的表名和模式名
\set search_path = public;
\set table_name '\'AO_6384AB_DISCOVERED\''
\set schema_name '\'public\''

WITH table_info AS (
    -- 基础表信息:表OID、TOAST表OID
    SELECT
        c.oid AS table_oid,
        c.relname AS table_name,
        c.reltoastrelid AS toast_oid,
        n.nspname AS schema_name
    FROM
        pg_catalog.pg_class c
    JOIN
        pg_catalog.pg_namespace n ON c.relnamespace = n.oid
    WHERE
        c.relname = :table_name
        AND n.nspname = :schema_name
        AND c.relkind = 'r'
),
index_info AS (
    -- 索引信息:索引OID
    SELECT
        t.table_oid,
        i.oid AS object_oid,
        i.relname AS object_name,
        'index' AS object_type
    FROM
        pg_catalog.pg_class i
    JOIN
        pg_catalog.pg_index idx ON i.oid = idx.indexrelid
    JOIN
        table_info t ON idx.indrelid = t.table_oid
    WHERE
        i.relkind = 'i'
),
toast_info AS (
    -- TOAST表信息:TOAST表OID
    SELECT
        t.table_oid,
        t.toast_oid AS object_oid,
        c.relname AS object_name,
        'toast_table' AS object_type
    FROM
        table_info t
    JOIN
        pg_catalog.pg_class c ON t.toast_oid = c.oid
    WHERE
        t.toast_oid <> 0
),
toast_index_info AS (
    -- TOAST索引信息:TOAST表的索引OID
    SELECT
        t.table_oid,
        i.oid AS object_oid,
        i.relname AS object_name,
        'toast_index' AS object_type
    FROM
        table_info t
    JOIN
        pg_catalog.pg_class c ON t.toast_oid = c.oid
    JOIN
        pg_catalog.pg_index idx ON c.oid = idx.indrelid
    JOIN
        pg_catalog.pg_class i ON idx.indexrelid = i.oid
    WHERE
        t.toast_oid <> 0
        AND i.relkind = 'i'
),
constraint_info AS (
    -- 约束信息:约束OID
    SELECT
        t.table_oid,
        con.oid AS object_oid,
        con.conname AS object_name,
        CASE con.contype
            WHEN 'p' THEN 'primary_key'
            WHEN 'u' THEN 'unique_constraint'
            WHEN 'f' THEN 'foreign_key'
            WHEN 'c' THEN 'check_constraint'
            ELSE 'other_constraint'
        END AS object_type
    FROM
        pg_catalog.pg_constraint con
    JOIN
        table_info t ON con.conrelid = t.table_oid
)
-- 合并所有结果,先输出表本身的信息,再输出其他关联对象
SELECT
    t.table_oid AS object_oid,
    t.table_name AS object_name,
    'table' AS object_type,
    t.schema_name AS schema_name
FROM
    table_info t
UNION ALL
SELECT
    object_oid,
    object_name,
    object_type,
    (SELECT schema_name FROM table_info) AS schema_name
FROM
    index_info
UNION ALL
SELECT
    object_oid,
    object_name,
    object_type,
    (SELECT schema_name FROM table_info) AS schema_name
FROM
    toast_info
UNION ALL
SELECT
    object_oid,
    object_name,
    object_type,
    (SELECT schema_name FROM table_info) AS schema_name
FROM
    toast_index_info
UNION ALL
SELECT
    object_oid,
    object_name,
    object_type,
    (SELECT schema_name FROM table_info) AS schema_name
FROM
    constraint_info
ORDER BY
    object_type;

PostgreSQL oid文件替换实现数据访问

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL oid文件替换实现数据访问

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于postgresql数据库,我们都知道他的数据在独立的oid文件里面,而字典主要在pg_type/1247,pg_class/1259,pg_attribute/1260等对象中,而数据是存储在独立的oid和toast oid文件中.如果通过提供oid文件,可以实现狸猫换太子的恢复数据.这里做一个简单测试,大概思路是这样的库里面有一个表a,然后创建一个空表b,然后把a表相关的oid文件复制成b表的oid文件,然后验证数据是否可以正常查询
创建一个空的b表,结构和a表一致

[postgres@xifenfei 13676]$ psql
psql (12.8)
Type "help" for help.

postgres=# CREATE TABLE "public"."AO_6384AB_DISCOVERED1" (
postgres(#   "DATE" "pg_catalog"."timestamp",
postgres(#   "ID" "pg_catalog"."int4",
postgres(#   "KEY" "pg_catalog"."varchar" COLLATE "pg_catalog"."default",
postgres(#   "PLUGIN_KEY" "pg_catalog"."varchar" COLLATE "pg_catalog"."default",
postgres(#   "USER_KEY" "pg_catalog"."varchar" COLLATE "pg_catalog"."default"
postgres(# )
postgres-# ;
CREATE TABLE

检查a/b的情况

postgres=# select count(1) from "AO_6384AB_DISCOVERED";
 count 
-------
   544
(1 row)

postgres=# select count(1) from "AO_6384AB_DISCOVERED1";
 count 
-------
     0
(1 row)

postgres=# \d "AO_6384AB_DISCOVERED"
                    Table "public.AO_6384AB_DISCOVERED"
   Column   |            Type             | Collation | Nullable | Default 
------------+-----------------------------+-----------+----------+---------
 DATE       | timestamp without time zone |           |          | 
 ID         | integer                     |           |          | 
 KEY        | character varying           |           |          | 
 PLUGIN_KEY | character varying           |           |          | 
 USER_KEY   | character varying           |           |          | 

postgres=# \d "AO_6384AB_DISCOVERED1"
                   Table "public.AO_6384AB_DISCOVERED1"
   Column   |            Type             | Collation | Nullable | Default 
------------+-----------------------------+-----------+----------+---------
 DATE       | timestamp without time zone |           |          | 
 ID         | integer                     |           |          | 
 KEY        | character varying           |           |          | 
 PLUGIN_KEY | character varying           |           |          | 
 USER_KEY   | character varying           |           |          | 

存在数据的a/b表相关oid信息



postgres=# select oid,relname,relfilenode from pg_class where relname like '%AO_6384AB_DISCOVERED%';
  oid  |        relname        | relfilenode 
-------+-----------------------+-------------
 16516 | AO_6384AB_DISCOVERED  |       16516
 17069 | AO_6384AB_DISCOVERED1 |       17069
(2 rows)

postgres=# select oid,relname,relfilenode from pg_class where relname like '%16516%' or  relname like '%17069%'  ;
  oid  |       relname        | relfilenode 
-------+----------------------+-------------
 16519 | pg_toast_16516       |       16519
 16521 | pg_toast_16516_index |       16521
 17072 | pg_toast_17069       |       17072
 17074 | pg_toast_17069_index |       17074
(4 rows)

关闭数据库把a表相关的oid文件拷贝替换b表的oid文件

[postgres@xifenfei 13676]$ pg_ctl stop -D /pgdata
waiting for server to shut down....2025-12-12 20:00:58.804 HKT [21445] LOG:  received fast shutdown request
2025-12-12 20:00:58.805 HKT [21445] LOG:  aborting any active transactions
2025-12-12 20:00:58.805 HKT [21471] FATAL:  terminating connection due to administrator command
2025-12-12 20:00:58.805 HKT [21473] FATAL:  terminating connection due to administrator command
2025-12-12 20:00:58.805 HKT [21856] FATAL:  terminating connection due to administrator command
2025-12-12 20:00:58.806 HKT [21472] FATAL:  terminating connection due to administrator command
2025-12-12 20:00:58.806 HKT [21445] LOG: background worker"logical replication launcher"(PID 252)exited with exit code 1
2025-12-12 20:00:58.807 HKT [21447] LOG:  shutting down
2025-12-12 20:00:58.811 HKT [21445] LOG:  database system is shut down
 done
server stopped
[postgres@xifenfei 13676]$ cp  16516 17069
[postgres@xifenfei 13676]$ cp  16519 17072
[postgres@xifenfei 13676]$ cp  16521 17074
[postgres@xifenfei 13676]$ pg_ctl start -D /pgdata
waiting for server to start....2025-12-12 20:02:05.985 HKT [22241] LOG:  
starting PostgreSQL 12.8 on x86_64-pc-linux-gnu, 
compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26.0.1), 64-bit
2025-12-12 20:02:05.985 HKT [22241] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-12-12 20:02:05.986 HKT [22241] LOG:  listening on IPv6 address "::", port 5432
2025-12-12 20:02:05.986 HKT [22241] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-12-12 20:02:05.993 HKT [22242] LOG:  database system was shut down at 2025-12-12 20:00:58 HKT
2025-12-12 20:02:05.995 HKT [22241] LOG:  database system is ready to accept connections
 done
server started

查询a/b表数据

[postgres@xifenfei 13676]$ psql
psql (12.8)
Type "help" for help.

postgres=# select count(1) from "AO_6384AB_DISCOVERED1";
 count 
-------
   544
(1 row)

postgres=# select count(1) from "AO_6384AB_DISCOVERED";
 count 
-------
   544
(1 row)

通过上述从测试,证明创建一个新的表结构,完全可以访问老的oid文件中的数据

模拟sql server故障备份完成恢复实现数据0丢失

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:模拟sql server故障备份完成恢复实现数据0丢失

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在sql server数据库中,使用备份还原,可以用来恢复的备份文件只能是全备、增量备份、事务日志备份,在很多情况下,sql server生产库异常,最后的事务日志没有来得及备份(但是ldf文件存在),这种情况下如何最大限度恢复数据实现数据0丢失,其实最重要的关键点就是对最后异常库的ldf文件的事务日志进行备份,在sql server里面给这个操作起了一个专业的词叫做:备份结尾日志(Tail of log).我这边通过一个实验来模拟整个操作过程.
1. 创建一个库,恢复模式为完整
s1


2.创建表并插入一条记录,然后进行全备
s2
s3

3. 再插入一条记录,并做事务日志备份
s4
s5

4. 再插入一条记录,并在脱机情况下删除掉数据库的mdf文件(模拟故障),在实际生产中如果原机器损坏,可以把在新机器上安装同版本sql,然后创建同名库,把ldf文件替换
s6
s8

5. 进行最后的ldf(事务日志的备份),在sql server中这个备份操作叫做“备份结尾日志(Tail of log)”,先尝试直接使用图形化工具进行备份
s9

6. 图形化这种情况无法直接备份,使用命令行形式成功备份
s10

这里备份命令带WITH NORECOVERY、WITH NO_TRUNCATE、WITH CONTINUE_AFTER_ERROR相关内容官方描述:

如果数据库处于联机状态并且您计划对数据库执行还原操作,则从备份日志结尾开始。 
要避免联机数据库出错,必须使用 BACKUP Transact-SQL 语句的 WITH NORECOVERY 选项。

如果数据库处于脱机状态而无法启动,则需要还原数据库,从备份日志结尾开始。 
由于此时不会发生任何事务,因此请使用 WITH NO_TRUNCATE 选项。 
NO_TRUNCATE 实际上与仅复制事务日志备份相同。 由于此时不会发生任何事务,因此可以选择使用 WITH NORECOVERY。

如果数据库损坏,则尝试使用 WITH CONTINUE_AFTER_ERROR 语句的 BACKUP 选项执行结尾日志备份。

在损坏的数据库上,仅当日志文件未受损、数据库处于支持结尾日志备份的状态并且数据库不包含任何大容量日志更改时,
日志尾部备份才会成功。 如果无法创建结尾日志备份,则最新日志备份后提交的任何事务都将丢失。

7. 进行数据库还原操作(这里先删除了故障库和对应ldf文件),也可以还原成其他库名和不同路径,也可以通过命令行进行还原(全备和除最后一个事务日志之前的备份需要使用WITH NORECOVERY)
s11
s12
s13


基于上述测试,完美的恢复故障之前的所有数据

sql server 事务日志备份异常恢复案例

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:sql server 事务日志备份异常恢复案例

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户的sql server数据库运行在双机环境中,由于心跳网络异常导致双机频繁切换最终数据库损坏DBCC检查报大量错误

DBCC CHECKDB('OLTP') WITH NO_INFOMSGS, ALL_ERRORMSGS

Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 28147935764938752 (type Unknown), page ID (1:33059984) contains an incorrect page ID in its page header. The PageId in the page header = (68:3276868).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 11540680206712832 (type Unknown), page ID (1:33059985) contains an incorrect page ID in its page header. The PageId in the page header = (102:6488116).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 16888988233302016 (type Unknown), page ID (1:33059986) contains an incorrect page ID in its page header. The PageId in the page header = (93:6619252).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 16888988233302016 (type Unknown), page ID (1:33059987) contains an incorrect page ID in its page header. The PageId in the page header = (93:6619252).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 16888988233302016 (type Unknown), page ID (1:33059988) contains an incorrect page ID in its page header. The PageId in the page header = (93:6619252).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 28147836977938432 (type Unknown), page ID (1:33059989) contains an incorrect page ID in its page header. The PageId in the page header = (73:6619248).
Msg 8909, Level 16, State 1, Line 1
……………………
Object ID 1961110077, index ID 0, partition ID 72057594217627648, alloc unit ID 72057594256687104 (type In-row data): Page (1:36535484) could not be processed.  See other errors for details.
Msg 8928, Level 16, State 1, Line 1
Object ID 1961110077, index ID 0, partition ID 72057594217627648, alloc unit ID 72057594256687104 (type In-row data): Page (1:36535485) could not be processed.  See other errors for details.
Msg 8928, Level 16, State 1, Line 1
Object ID 1961110077, index ID 0, partition ID 72057594217627648, alloc unit ID 72057594256687104 (type In-row data): Page (1:36535486) could not be processed.  See other errors for details.
Msg 8928, Level 16, State 1, Line 1
Object ID 1961110077, index ID 0, partition ID 72057594217627648, alloc unit ID 72057594256687104 (type In-row data): Page (1:36535487) could not be processed.  See other errors for details.
CHECKDB found 0 allocation errors and 24 consistency errors in table 'CIOMessage' (object ID 1961110077).
CHECKDB found 0 allocation errors and 17955 consistency errors in database 'OLTP'.

Completion time: 2025-11-19T17:13:03.2762122+08:00

客户每天做全库备份,每4小时做事务日志备份,备份类似这样的情况
sql0


客户尝试使用全备进行恢复,结果发现只有13日的全备是好的,可以还原出来数据库,其他备份还原直接报错,基于这样的情况,可以希望把数据恢复到11月19日.我接手这个故障之后,先尝试还原13日的备份
sql6

然后尝试人工应用事务日志备份,类似命令

RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_030001_7745248.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_060001_3581210.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_090001_2856408.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_120002_0713663.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_150001_7305524.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_180000_9123036.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_13_210001_3663138.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_14_000001_1605695.trn' WITH NORECOVERY
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_14_030001_7280782.trn' WITH NORECOVERY
………………
RESTORE LOG OLTP1121 FROM DISK = 'D:\share\OLTP_backup_2025_11_17_180001_1343952.trn' WITH NORECOVERY

结果在OLTP_backup_2025_11_17_180001_1343952文件位置报错

Processed 0 pages for database 'OLTP_1121', file 'OLTP' on file 1.
Processed 10388 pages for database 'OLTP1121', file 'OLTP_log' on file 1.
Msg 9004, Level 16, State 3, Line 1
An error occurred while processing the log for database 'OLTP_1121'.  If possible, restore from backup. 
If a backup is not available, it might be necessary to rebuild the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE LOG is terminating abnormally.

Completion time: 2025-11-21T13:41:54.2352031+08:00

通过图形化界面进行事务日志恢复也报错
sql7


基于这样的情况,数据库层面的正常恢复途径只能恢复到11月17日18时左右数据,因为后面的日志发生了损坏,无法继续正常恢复,对于这种情况,我们这边使用日志解析工具对剩余事务日志备份进行解析,生成.sql文件
sql3
sql1

然后客户把解析出来的.sql文件依次在会到11月17日18时的库上面去执行,这样顺利吧客户整体数据库恢复到最新状态,完成本次恢复任务(注意后续可能一些类似序列值需要调整)

win平台挂起Oracle数据库启动进程

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:win平台挂起Oracle数据库启动进程

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

对于oracle的有些故障,如果数据库完全open会报错,导致启动失败,这个时候,我们可以尝试在数据库open到一半的过程中把启动过程挂起,然后登录数据库对字典进行修复,然后就可以正常启动数据库。非win平台可以通过gdb挂起数据库启动过程,因为win系统是进程模式,无法通过gdb挂起来实现,通过自己写程序修改oracle内存来实现这个功能

Oracle Recovery Tools工具中,点击修改内存—>检索Oracle数据库进程—>选择需要操作的数据库进程—>挂起Oracle进程 就可以实现oracle数据库在open的过程中挂住
QQ20251208-180202


然后其他会话登录数据库,进行需要的操作,再重新点击取消挂起数据库可以继续open成功
QQ20251208-174237

linux异常磁盘lvm恢复操作演示

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:linux异常磁盘lvm恢复操作演示

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户exsi系统被勒索病毒加密,拷贝出来磁盘文件,通过工具分析,磁盘文件均有部分被破坏,恢复工具无法自动识别
QQ20251206-001609


无法扫描到任何分区信息
QQ20251206-001942

和客户沟通确认三快盘采用的是lvm方式管理,尝试检索lvm信息
lvm1
lvm2

检索lv信息
lvm3

选择合适的lv,进行读取
lvm4

人工选择合适的磁盘作为pv
lvm5

直接查看lv中数据
lvm6

open数据库报ora-600 kdsgrp1故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:open数据库报ora-600 kdsgrp1故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户服务器异常关机之后,数据库无法正常启动,查看alert日志发现如下报ORA-600 kdsgrp1报错信息导致数据库无法正常open

2025-12-05T16:17:23.315325+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
Undo initialization recovery: err:0 start: 7620046 end: 7620062 diff: 16 ms (0.0 seconds)
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc  (incident=1243378):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243378\his_ora_9672_i1243378.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-12-05T16:17:25.341421+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2025-12-05T16:17:25.533576+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc  (incident=1243379):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243379\his_ora_9672_i1243379.trc
2025-12-05T16:17:27.007101+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-12-05T16:17:27.008097+08:00
Undo initialization online undo segments: err:600 start: 7620062 end: 7623343 diff: 3281 ms (3.3 seconds)
2025-12-05T16:17:27.008097+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc:
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
2025-12-05T16:17:27.008097+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc:
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc  (incident=1243380):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-01092: ORACLE 实例终止。强制断开连接
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243380\his_ora_9672_i1243380.trc
2025-12-05T16:17:28.540363+08:00
opiodr aborting process unknown ospid (9672) as a result of ORA-603

尝试人工启动数据库,报错比较明显也是ORA-600 kdsgrp1错误

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Dec 5 16:45:48 2025
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
Media recovery complete.
SQL> recover database;
Media recovery complete.
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [],
[], [], [], [], []
Process ID: 10800
Session ID: 5025 Serial number: 34108

查看报错trace文件

*** CLIENT DRIVER:(SQL*PLUS) 2025-12-05T16:17:23.924647+08:00
 
[TOC00000]
Jump to table of contents
Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc
[TOC00001]
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 1243378 (ORA 600 [kdsgrp1]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
kdsDumpState: cdb: 0 dspdb: 0 type: 1 info: 0x5 flag: 0x0
kdsDumpState: sample type: 0 layer: 0
* kdsgrp1-1: ***********************************************
            row 0x00c02754.24 continuation at: 0x00c02754.24 file# 3 block# 10068 slot 36 not found (dscnt: 0)
 state kdscurrid points to 0x00c28206.17
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 36 ..... nrows: 60
Dumping kcb descriptor:
kcbds 0x1e08fe71510: pdb 0, tsn 1, rdba 0x00c02754, afn 3, objd 11331, cls 1, tidflg 0x8 0x80 0x0

在这里基本上可以确认报ORA-600 错误的是file 3 block 10068,dataobj 为11331,根据经验初步怀疑是sysaux中的wrh$_某个对象异常.对启动过程做10046进一步确认

EXEC #2063975471112:c=17341,e=35163,p=54,cr=283,cu=0,mis=1,r=0,dep=1,og=4,plh=794648223,tim=9717774784
WAIT #2063975471112: nam='db file sequential read' ela= 115 file#=3 block#=11434 blocks=1 obj#=11579 tim=9717774961
WAIT #2063975471112: nam='db file sequential read' ela= 93 file#=3 block#=11435 blocks=1 obj#=11579 tim=9717775098
WAIT #2063975471112: nam='db file sequential read' ela= 89 file#=3 block#=11436 blocks=1 obj#=11579 tim=9717775217
WAIT #2063975471112: nam='db file sequential read' ela= 98 file#=3 block#=11437 blocks=1 obj#=11579 tim=9717775361
WAIT #2063975471112: nam='db file sequential read' ela= 23087 file#=3 block#=11438 blocks=1 obj#=11579 tim=9717798471
WAIT #2063975471112: nam='db file sequential read' ela= 83 file#=3 block#=11439 blocks=1 obj#=11579 tim=9717798681
WAIT #2063975471112: nam='db file sequential read' ela= 183 file#=3 block#=10075 blocks=1 obj#=11332 tim=9717799100
WAIT #2063975471112: nam='db file sequential read' ela= 124 file#=3 block#=10078 blocks=1 obj#=11332 tim=9717799291
WAIT #2063975471112: nam='db file sequential read' ela= 115 file#=3 block#=165702 blocks=1 obj#=11332 tim=9717799460
WAIT #2063975471112: nam='db file sequential read' ela= 109 file#=3 block#=83355 blocks=1 obj#=11332 tim=9717799621
WAIT #2063975471112: nam='db file sequential read' ela= 139 file#=3 block#=165703 blocks=1 obj#=11332 tim=9717799816
WAIT #2063975471112: nam='db file sequential read' ela= 139 file#=3 block#=10079 blocks=1 obj#=11332 tim=9717800074
WAIT #2063975471112: nam='db file sequential read' ela= 112 file#=3 block#=165696 blocks=1 obj#=11332 tim=9717800248
WAIT #2063975471112: nam='db file sequential read' ela= 110 file#=3 block#=165701 blocks=1 obj#=11332 tim=9717800502
WAIT #2063975471112: nam='db file sequential read' ela= 107 file#=3 block#=83354 blocks=1 obj#=11332 tim=9717800665
WAIT #2063975471112: nam='db file sequential read' ela= 109 file#=3 block#=83356 blocks=1 obj#=11332 tim=9717800823
WAIT #2063975471112: nam='db file sequential read' ela= 107 file#=3 block#=83358 blocks=1 obj#=11332 tim=9717800978
WAIT #2063975471112: nam='db file sequential read' ela= 106 file#=3 block#=165698 blocks=1 obj#=11332 tim=9717801133
WAIT #2063975471112: nam='db file sequential read' ela= 106 file#=3 block#=164358 blocks=1 obj#=11331 tim=9717801298
WAIT #2063975471112: nam='db file sequential read' ela= 108 file#=3 block#=10068 blocks=1 obj#=11331 tim=9717801509
2025-12-05T16:52:21.382064+08:00
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

FETCH #2063975471112:c=1599462,e=1655487,p=20,cr=26,cu=0,mis=0,r=0,dep=1,og=4,plh=794648223,tim=9719430286
STAT #2063975471112 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=0 pr=0 pw=0 str=1 time=8 us)'
STAT #2063975471112 id=2 cnt=0 pid=1 pos=1 obj=0 op='HASH JOIN  (cr=0 pr=0 pw=0 str=1 time=5 us cost=9 size=34 card=1)'
STAT #2063975471112 id=3 cnt=1 pid=2 pos=1 obj=11579 op='TABLE ACCESS FULL WRM$_SNAPSHOT
 (cr=12 pr=6 pw=0 str=1 time=23966 us cost=4 size=17 card=1)'
STAT #2063975471112 id=4 cnt=1 pid=3 pos=1 obj=0 op='SORT AGGREGATE (cr=6 pr=3 pw=0 str=1 time=23513 us)'
STAT #2063975471112 id=5 cnt=1 pid=4 pos=1 obj=11579 op='TABLE ACCESS FULL WRM$_SNAPSHOT
(cr=6 pr=3 pw=0 str=1 time=23497 us cost=4 size=13 card=1)'
STAT #2063975471112 id=6 cnt=6 pid=2 pos=2 obj=11331 op='TABLE ACCESS BY INDEX ROWID BATCHED WRH$_UNDOSTAT
(cr=13 pr=13 pw=0 str=1 time=2455 us cost=4 size=102 card=6)'
STAT #2063975471112 id=7 cnt=7 pid=6 pos=1 obj=11332 op='INDEX SKIP SCAN WRH$_UNDOSTAT_PK
(cr=12 pr=12 pw=0 str=1 time=2304 us cost=2 size=0 card=6)'
2025-12-05T16:52:23.005928+08:00
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

<error barrier> at 0x000000275BED14D0 placed dbsdrv.c@4959
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
<error barrier> at 0x000000275BED14D0 placed dbsdrv.c@4959
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

到这里基本上可以确认是WRH$_UNDOSTAT_PK去找WRH$_UNDOSTAT中午对应记录,从而出现该问题,对应的具体sql为

PARSING IN CURSOR #2063975471112 len=233 dep=1 uid=0 oct=3 lid=0 tim=9717739563
  hv=517686153 ad='1e1efda1f20' sqlid='d4m7ss0gdqhw9'
select max(maxconcurrency) from sys.wrh$_undostat where instance_number = :1 and dbid = :2 and snap_id in  
(select snap_id from dba_hist_snapshot where end_interval_time >(select max(end_interval_time)-7 from dba_hist_snapshot))

BINDS #2063975471112:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=8e8646e8  bln=22  avl=02  flg=05
  value=1
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=8e864700  bln=22  avl=06  flg=01
  value=971429521

对于这种问题,处理起来相对比较简单,绕过oracle open过程对该sql的执行,然后open库对该表的index进行rebuild

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 3
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 3
Enter value for block_id: 10068
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 10068 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
--------------------------------------------------------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME
------------------ ------------------------------
PARTITION_NAME
--------------------------------------------------------------------------------
SYS
WRH$_UNDOSTAT
TABLE              SYSAUX



SQL> select index_name from dba_indexes where table_name='WRH$_UNDOSTAT';

INDEX_NAME
--------------------------------------------------------------------------------
WRH$_UNDOSTAT_PK

SQL> alter index WRH$_UNDOSTAT_PK REBUILD ONLINE;

Index altered.

至此该数据库完成恢复任务,重启之后一切正常.

expdp dmp 导出不完整导入ORA-39059 ORA-39246 故障抢救数据

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:expdp dmp 导出不完整导入ORA-39059 ORA-39246 故障抢救数据

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户一套nc系统,由于安装时候把库建在了比较小的分区上,运行一些时间之后,出现空间不足,现场技术人员对oracle不太熟悉,经过一系列操作(删除业务表空间,复制pdb,创建表空间等等操作),无法恢复数据库,准备使用备份的dmp进行还原,结果分析发现仅保留的最后一份dmp,是一份导出不完全的dmp文件,无法正常导入(以前处理过一个类似case:ORA-39773: parse of metadata stream failed故障处理,尝试导入报ORA-39246错:

C:\Users\XFF>impdp system/oracle@127.0.0.1/orapdb directory=expdp_dir dumpfile=xxxxx_2025-12-01_0230.dmp logfile=1.log

Import: Release 19.0.0.0.0 - Production on 星期三 12月 3 21:00:19 2025
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

连接到: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
ORA-39002: 操作无效
ORA-39059: 转储文件集不完整
ORA-39246: 无法在提供的转储文件中定位主表

分析当时当初的dmp日志,由于expdp的job表所在表空间不足导致expdp导出失败
dmp1


TABLE:"XIFENFEI"."EOM_MEASURE_POINT"
ORA-30032: 挂起的 (可恢复) 语句已超时
ORA-01691: Lob 段 XIFENFEI.SYS_LOB0000161267C00111$$ 无法通过 32 (在表空间 NNC_DATA01 中) 扩展
ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: 在 "SYS.KUPW$WORKER", line 12620
ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: 在 "SYS.KUPW$WORKER", line 11414
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0xda5dae50     33476  package body SYS.KUPW$WORKER.WRITE_ERROR_INFORMATION
0xda5dae50     12641  package body SYS.KUPW$WORKER.DETERMINE_FATAL_ERROR
0xda5dae50     11602  package body SYS.KUPW$WORKER.CREATE_OBJECT_ROWS
0xda5dae50     15268  package body SYS.KUPW$WORKER.FETCH_XML_OBJECTS
0xda5dae50      3907  package body SYS.KUPW$WORKER.UNLOAD_METADATA
0xda5dae50     13736  package body SYS.KUPW$WORKER.DISPATCH_WORK_ITEMS
0xda5dae50      2429  package body SYS.KUPW$WORKER.MAIN
0x6524a4f0         2  anonymous block
KUPW: Object row index into parse items is: 1
KUPW: Parse item count is: 19
KUPW: In function CHECK_FOR_REMAP_NETWORK
KUPW: Nothing to remap
KUPW: In procedure BUILD_OBJECT_STRINGS - non-base info
KUPW: In procedure BUILD_SUBNAME_LIST with TABLE:XIFENFEI.EOM_MEASURE_POINT
KUPW: In function NEXT_PO_NUMBER
KUPW: PO number assigned: 34198
FORALL
KUPW: In procedure DETERMINE_FATAL_ERROR with ORA-30032: 挂起的 (可恢复) 语句已超时
ORA-01691: Lob 段 XIFENFEI.SYS_LOB0000161267C00111$$ 无法通过 32 (在表空间 NNC_DATA01 中) 扩展
作业 "XIFENFEI"."SYS_EXPORT_SCHEMA_01" 因致命错误于 星期一 12月 1 06:33:21 2025 elapsed 0 04:03:18 停止

从导出日志看,在导出大量”0 KB 0 行”记录之后提示表空间不足,expdp的job表无法扩展导致导出挂起然后超时导出终止(这个导出操作没有完全完成),从而在导入的时候出现了ORA-39059: 转储文件集不完整 ORA-39246: 无法在提供的转储文件中定位主表 的错误.对于这种故障,分析导出日志,发现运气不错,所有有数据的表都导出完成,基于这个心中就有了第一层底气,所有表数据不会丢失(因为都导出到了这个dmp中),但是非表的字典数据不完整,要想业务完整跑起来,需要找到一个完整的业务字典信息.对于大量的备份dmp被删除,然后对应分区还写入了很多数据,只能尝试看运气,通过对磁盘文件镜像,然后进行反删除恢复,找出来一个11月26日的dmp的压缩文件是完整的
good-dmp


通过这个dmp导入业务字典信息,然后再利用expdp dmp解析工具(expdp dmp被加密破坏恢复)把所有表数据出来,经过这两者组合,顺利完成数恢复,可以测试业务完全正常