Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2]

Applies to:
Oracle Database – Enterprise Edition – Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Oracle Database Cloud Schema Service – Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) – Version N/A and later
Oracle Cloud Infrastructure – Database Service – Version N/A and later
Oracle Database Cloud Exadata Service – Version N/A and later
IBM AIX on POWER Systems (64-bit)

Description
Oracle 12c introduces a new default feature of using multiple LGWRs which may lead to DEADLOCK / Database Hang or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery on IBM AIX and potentially on HPUX Itanium 64bit.
The database may become unusable and fail to be OPEN.

Occurrence
This issue is specific to RDBMS version 12c (12.1.0.1 or 12.1.0.2) where the new default feature of using multiple LGWRs is introduced.
It affects databases on IBM AIX and potentially on HPUX Itanium 64bit

Symptoms
ORACLE on IBM AIX or HPUX Itanium 64bit with RDBMS Version 12c.

DEADLOCK or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery caused by bug 21915719.

PMON may terminate the instance while extensive block recovery is being performed.

A DEADLOCK example is with LG0[n] waiting on ‘LGWR worker group ordering’. Example from a System State Dump trace file:

PROCESS 18: LG01
SO: 0x7000101f95ad720, type: 4, owner: 0x7000101f84195f8, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f84195f8, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
 Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x58, phase=0x1, =0x0
      wait_id=4947 seq_num=4948 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=1 os=267
      in_wait=1 iflags=0x5a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 817, ser: 1
    Dumping final blocker:
      inst: 1, sid: 817, ser: 1
  There are 730 sessions blocked by this session.
.
.
PROCESS 17: LG00
SO: 0x7000101f85bcc60, type: 4, owner: 0x7000101f93eeb20, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f93eeb20, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
  ksuxds FALSE at location: 0
  service name: SYS$BACKGROUND
  Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x56, phase=0x1, =0x0
      wait_id=1630680 seq_num=57841 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=2 os=268
      in_wait=1 iflags=0x15a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 865, ser: 1
    Dumping final blocker:
      inst: 1, sid: 865, ser: 1
 

The instance may fail to OPEN with errors ORA-600 [kcrfrgv_nextlwn_scn] and/or ORA-600 [krr_process_read_error_2]:

Recovery Session Failed with:

ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krr_process_read_error_2],


Alter database open fails with:

ORA-00600: internal error code, arguments: [kcrfrgv_nextlwn_scn] .....
ORA-600 signalled during: ALTER DATABASE OPEN...

Workaround
Disable the new feature of multiple LGWR worker processes by proactively setting _use_single_log_writer=true.

Setting _use_single_log_writer = true is a safe workaround; it is the behavior before 12c where multiple LGWR worker groups were not available.

ALTER SYSTEM SET "_use_single_log_writer"=TRUE SID='*' SCOPE=SPFILE;
-- Restart the database or all instances of the RAC database

Note that while _use_single_log_writer=true is not set, then error ORA-600 [kcrfrgv_nextlwn_scn] might be produced avoiding the database to OPEN. Once the problem is introduced, _use_single_log_writer=true may not fix it. _use_single_log_writer = true prevents inconsistencies in the redo log to be introduced which causes that error.
If the parameter does not help, because the problem was already introduced when _use_single_log_writer=true had not been proactively set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.
参考:ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] (Doc ID 1957710.1)

Oracle 暂定和恢复功能

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 暂定和恢复功能

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前一直没有注意到oracle有暂定和恢复功能(SUSPEND/RESUME)[从oracle 8i开始有的特性],一下是官方描述:

The Database Suspend/Resume feature provides a mechanism by which all disk I/O 
(datafile, controlfile and file header I/Os) in a database (in all instances) 
can be suspended making it easier to make a copy of the database.  When an 
ALTER SYSTEM SUSPEND command is issued, it will wait for any ongoing instance 
recovery to complete and then set a flag in all running instances to stop all 
new lock and I/O activity.  The command may return before the last I/O is 
issued because the check for the flag might have been before the suspend and 
the I/O might have been issued after the suspend.  So, reads, typically are not
allowed when the database is suspended but may still be active for a period of 
time.  However, this command does ensure that no new I/Os will be issued.  

Once all instances of a database are suspended, a copy of the database can be 
made by making a copy of all the files (i.e. the control file, online logs and 
all data files).  The copy can have uncommitted updates and therefore the only 
way a copy of the database can be used in this scenerio is to do an instance 
recovery and then open it.

The database can be suspended or resumed through an ALTER SYSTEM call.  You can
issue this statement as the user SYSTEM or SYS (the user must have DBA 
privileges).   

The syntax for these two commands is as follows:

    ALTER SYSTEM <options>;

    <options> = SUSPEND | RESUME | <existing options>

The database will remain in the suspended state until the ALTER SYSTEM RESUME 
command is issued.  The database will remain suspended even if the process 
issuing the ALTER SYSTEM SUSPEND command dies or exists.  However, if all 
instances are shutdown and started again, the database is no longer in a 
suspended state.  

The ALTER SYSTEM RESUME command has the effect of blocking the I/O since the 
SUSPEND command.  When the RESUME command is issued, it might cause a burst in 
the I/O, which may take a while to even out.  A message is written to the alert
log everytime the database is suspended or resumed, as shown by the example 
below:

    Mon Nov 29 11:32:22 1999
    Completed: alter database open
    Wed Dec  1 12:56:53 1999
    Starting ORACLE instance (normal)
    Wed Dec  1 22:03:50 1999
    Suspending database following alter system suspend command.
    Wed Dec  1 22:06:14 1999
    Resuming database following alter system resume command.
    Wed Dec  1 22:07:08 1999


The following is an example of using the SUSPEND and RESUME feature:

    SVRMGR> connect system/manager
    Connected.
    SVRMGR> alter system suspend;
    Statement processed.
    SVRMGR> select * from user_source;
    ^X^Cselect * from user_source   -----  (at this stage the statement will 
                                            just hang.  A Ctrl-X Ctrl-C was 
                                            issued to kill the statement)
                  *
    ORA-00604: error occurred at recursive SQL level 1
    ORA-01013: user requested cancel of current operation
    SVRMGR>
    SVRMGR> alter system resume;
    Statement processed.


Considerations and Restrictions:
--------------------------------
- The files in the copy database can not be used as backups of the original 
  database for media recovery.  (If the direct path option is in use at the 
  time, there may be corrupted blocks).

- A new instance cannot be started during the SUSPEND state of the database.  
  If one is started, it will not be included in the SUSPEND process and thus no 
  I/O suspension guarantees are provided in this case.

- Creation of backups or archived logs will not be affected by the 
  ALTER SYSTEM SUSPEND command.

- The two different commands can  be issued from two different instances or 
  processes.

- If the SUSPEND command during execution may fail for some reason yet 
  result in some of the instances being suspended, the command can be issued 
  again since the instances in suspend status will ignore the command.

- Also database queries will hang when the database is in suspend mode

按照描述SUSPEND 操作会挂起所有io,只要涉及到io操作就会挂起,如果操作的所有请求都可以在内存中完成(buffer cache/shared pool等),那这样的操作是可以直接完成的.

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 14 21:51:53 2025

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system suspend;

System altered.

SQL> select database_status from v$instance;

DATABASE_STATUS
-----------------
SUSPENDED

SQL> create table t1 as select * from dba_users;
create table t1 as select * from dba_users
             *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL> create table t_xff as select * from dba_users;
^C
C:\Users\XFF>

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 14 21:53:19 2025

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system resume;

System altered.

SQL> select database_status from v$instance;

DATABASE_STATUS
-----------------
ACTIVE

SQL> create table t_xff as select * from dba_users;

Table created.


SQL>  alter system suspend;

System altered.

SQL> select count(1) from user$;

  COUNT(1)
----------
        94

SQL> select count(1) from t_xff;
^C
C:\Users\XFF>

在某些情况下,可以通过这类操作来挂起数据库,做一些特殊的操作.

迁移awr快照数据到自定义表空间

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:迁移awr快照数据到自定义表空间

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在19c中有些情况,考虑把awr的快照数据存储在非sysaux表空间,可以通过DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS来进行设置

sys@ORA19C 21:57:02> select BANNER_FULL from v$version;

BANNER_FULL
----------------------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.24.0.0.0


Elapsed: 00:00:00.01

PROCEDURE MODIFY_SNAPSHOT_SETTINGS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 RETENTION                      NUMBER                  IN     DEFAULT
 INTERVAL                       NUMBER                  IN     DEFAULT
 TOPNSQL                        NUMBER                  IN     DEFAULT
 DBID                           NUMBER                  IN     DEFAULT
 TABLESPACE_NAME                VARCHAR2                IN     DEFAULT
PROCEDURE MODIFY_SNAPSHOT_SETTINGS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 RETENTION                      NUMBER                  IN     DEFAULT
 INTERVAL                       NUMBER                  IN     DEFAULT
 TOPNSQL                        VARCHAR2                IN
 DBID                           NUMBER                  IN     DEFAULT
 TABLESPACE_NAME                VARCHAR2                IN     DEFAULT

这两个proc,主要是TOPNSQL一个是number类型,一个是varchar2类型
If NUMBER: Top N SQL size. The number of Top SQL to flush for each SQL criteria (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count). The value for this setting will not be affected by the statistics/flush level and will override the system default behavior for the AWR SQL collection. The setting will have a minimum value of 30 and a maximum value of 50,000. Specifying NULL will keep the current setting.
If VARCHAR2: Users are allowed to specify the following values: (DEFAULT, MAXIMUM, N), where N is the number of Top SQL to flush for each SQL criteria. Specifying DEFAULT will revert the system back to the default behavior of Top 30 for statistics level TYPICAL and Top 100 for statistics level ALL. Specifying MAXIMUM will cause the system to capture the complete set of SQL in the cursor cache. Specifying the number N is equivalent to setting the Top N SQL with the NUMBER type. Specifying NULL for this argument will keep the current setting.
进行了简单的测试,确认是部分awr的分区表设置到新表空间中

sys@ORA19C 21:41:51> CREATE TABLESPACE AWRTBS DATAFILE '/data/oradata/ORA19C/awrtbs01.dbf' size 128M autoextend on;

Tablespace created.

Elapsed: 00:00:00.53
sys@ORA19C 21:42:21> exec dbms_workload_repository.modify_snapshot_settings(tablespace_name=> 'AWRTBS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.53

sys@ORA19C 21:53:56> execute dbms_workload_repository.create_snapshot();

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.44
sys@ORA19C 21:53:58> select segment_name,PARTITION_NAME,segment_type from dba_segments where tablespace_name='AWRTBS';

SEGMENT_NAME                   PARTITION_NAME                                               SEGMENT_TYPE
------------------------------ ------------------------------------------------------------ ---------------
WRH$_FILESTATXS                WRH$_FILESTATXS_1232450071_2690                              TABLE PARTITION
WRH$_SQLSTAT                   WRH$_SQLSTAT_1232450071_2690                                 TABLE PARTITION
WRH$_SYSTEM_EVENT              WRH$_SYSTEM_EVENT_1232450071_2690                            TABLE PARTITION
WRH$_WAITSTAT                  WRH$_WAITSTAT_1232450071_2690                                TABLE PARTITION
WRH$_LATCH                     WRH$_LATCH_1232450071_2690                                   TABLE PARTITION
WRH$_LATCH_MISSES_SUMMARY      WRH$_LATCH_MISSES_SUMMARY_1232450071_2690                    TABLE PARTITION
WRH$_DB_CACHE_ADVICE           WRH$_DB_CACHE_ADVICE_1232450071_2690                         TABLE PARTITION
WRH$_ROWCACHE_SUMMARY          WRH$_ROWCACHE_SUMMARY_1232450071_2690                        TABLE PARTITION
WRH$_SGASTAT                   WRH$_SGASTAT_1232450071_2690                                 TABLE PARTITION
WRH$_SYSSTAT                   WRH$_SYSSTAT_1232450071_2690                                 TABLE PARTITION
WRH$_PARAMETER                 WRH$_PARAMETER_1232450071_2690                               TABLE PARTITION
WRH$_SEG_STAT                  WRH$_SEG_STAT_1232450071_2690                                TABLE PARTITION
WRH$_SERVICE_STAT              WRH$_SERVICE_STAT_1232450071_2690                            TABLE PARTITION
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SESSION_HISTORY_1232450071_2690                  TABLE PARTITION
WRH$_SYSMETRIC_HISTORY         WRH$_SYSMETRIC_HISTORY_1232450071_2690                       TABLE PARTITION
WRH$_LATCH_CHILDREN            WRH$_LATCH_CHILDREN_1232450071_0                             TABLE PARTITION
WRH$_LATCH_PARENT              WRH$_LATCH_PARENT_1232450071_0                               TABLE PARTITION
WRH$_DLM_MISC                  WRH$_DLM_MISC_1232450071_0                                   TABLE PARTITION
WRH$_INST_CACHE_TRANSFER       WRH$_INST_CACHE_TRANSFER_1232450071_0                        TABLE PARTITION
WRH$_INTERCONNECT_PINGS        WRH$_INTERCONNECT_PINGS_1232450071_0                         TABLE PARTITION
WRH$_TABLESPACE_STAT           WRH$_TABLESPACE_STAT_1232450071_2690                         TABLE PARTITION
WRH$_OSSTAT                    WRH$_OSSTAT_1232450071_2690                                  TABLE PARTITION
WRH$_SYS_TIME_MODEL            WRH$_SYS_TIME_MODEL_1232450071_2690                          TABLE PARTITION
WRH$_SERVICE_WAIT_CLASS        WRH$_SERVICE_WAIT_CLASS_1232450071_2690                      TABLE PARTITION
WRH$_EVENT_HISTOGRAM           WRH$_EVENT_HISTOGRAM_1232450071_2690                         TABLE PARTITION
WRH$_MVPARAMETER               WRH$_MVPARAMETER_1232450071_2690                             TABLE PARTITION
WRH$_CELL_GLOBAL_SUMMARY       WRH$_CELL_GLOBAL_SUMMARY_1232450071_2690                     TABLE PARTITION
WRH$_CELL_DISK_SUMMARY         WRH$_CELL_DISK_SUMMARY_1232450071_2690                       TABLE PARTITION
WRH$_CELL_GLOBAL               WRH$_CELL_GLOBAL_1232450071_2690                             TABLE PARTITION
WRH$_CELL_IOREASON             WRH$_CELL_IOREASON_1232450071_2690                           TABLE PARTITION
WRH$_CELL_DB                   WRH$_CELL_DB_1232450071_2690                                 TABLE PARTITION
WRH$_CELL_OPEN_ALERTS          WRH$_CELL_OPEN_ALERTS_1232450071_2690                        TABLE PARTITION
WRH$_IM_SEG_STAT               WRH$_IM_SEG_STAT_1232450071_2690                             TABLE PARTITION
WRM$_PDB_IN_SNAP               WRM$_PDB_IN_SNAP_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSMETRIC_HISTORY     WRH$_CON_SYSMETRIC_HISTORY_1232450071_2690                   TABLE PARTITION
WRM$_ACTIVE_PDBS               WRM$_ACTIVE_PDBS_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSSTAT               WRH$_CON_SYSSTAT_1232450071_2690                             TABLE PARTITION
WRH$_CON_SYSTEM_EVENT          WRH$_CON_SYSTEM_EVENT_1232450071_2690                        TABLE PARTITION
WRH$_PROCESS_WAITTIME          WRH$_PROCESS_WAITTIME_1232450071_2690                        TABLE PARTITION
WRH$_ASM_DISK_STAT_SUMMARY     WRH$_ASM_DISK_STAT_SUMMARY_1232450071_2690                   TABLE PARTITION
WRH$_AWR_TEST_1                WRH$_AWR_TEST_1_1232450071_2690                              TABLE PARTITION
WRH$_SESS_NETWORK              WRH$_SESS_NETWORK_1232450071_2690                            TABLE PARTITION
WRH$_CON_SYS_TIME_MODEL        WRH$_CON_SYS_TIME_MODEL_1232450071_2690                      TABLE PARTITION

43 rows selected.

Elapsed: 00:00:00.01
sys@ORA19C 21:54:08> 

第一例Oracle 21c恢复咨询

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:第一例Oracle 21c恢复咨询

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

记录一个Oracle 21c故障的恢复请求(这个是第一个21c的恢复咨询),这个表明21C确实有客户在生产上使用了(不过这个是国外客户,国内的目前还没有遇到)
21c


故障原因是最初的数据文件不一致,数据库无法open,最终经过一系列折腾之后,有数据文件offline的情况下执行了resetlogs,导致部分文件resetlogs scn不一致
wrong-resetlogs

解决oracle数据文件路径有回车故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:解决oracle数据文件路径有回车故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近遇到一个硬件恢复朋友的请求,oracle数据库文件恢复出来了,但是在linux上面启动的时候,有两个文件无法检测到,dbv检测正常.
checkpiont_err
dbv


通过分析是由于文件无法找到原因导致
file-not-found

进一步检查发现原库这两个文件结尾带有回车,但是恢复出来的文件不带回车
huiche

对于这个故障,我在测试环境进行了重现并且给予解决
1. 创建带回车键数据文件

SQL> create tablespace xifenfei datafile '/u01/app/oracle/oradata/xifenfei/xff01.dbf
  2  ' size 128m;

Tablespace created.

SQL> alter tablespace xifenfei add datafile '/u01/app/oracle/oradata/xifenfei/xff02.dbf' size 128M;

Tablespace altered.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/xifenfei/system01.dbf
/u01/app/oracle/oradata/xifenfei/sysaux01.dbf
/u01/app/oracle/oradata/xifenfei/undotbs01.dbf
/u01/app/oracle/oradata/xifenfei/users01.dbf
/u01/app/oracle/oradata/xifenfei/xff01.dbf
/u01/app/oracle/oradata/xifenfei/xff02.dbf

6 rows selected.

2.操作系统层面查看文件(在我的ssh工具中,可以看到带回车键文件和不带回车文件不一样,使用的是crt工具,其他工具是否显示不确定)

[oracle@xifenfei ~]$ cd /u01/app/oracle/oradata/xifenfei/
[oracle@xifenfei xifenfei]$ ls -l xff*
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff01.dbf?
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff02.dbf

3. 操作系统层面重命名数据文件

[oracle@xifenfei xifenfei]$ mv xff01.dbf* xff01.dbf
[oracle@xifenfei xifenfei]$ ls -l xff*
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff01.dbf
-rw-r----- 1 oracle oinstall 134225920 Dec 14 08:05 xff02.dbf

3. 数据库层面重启看文件情况,发现文件不能被正常发现(当然不能,文件被os层面mv了)

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  551165952 bytes
Fixed Size                  2255112 bytes
Variable Size             369100536 bytes
Database Buffers          171966464 bytes
Redo Buffers                7843840 bytes
Database mounted.
SQL> select file#, CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          306775013
         2          306775013
         3          306775013
         4          306775013
         5                  0
         6          306779423

6 rows selected.

RMAN> report schema;

Report of database schema for database with db_unique_name XIFENFEI

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    770      SYSTEM               ***     /u01/app/oracle/oradata/xifenfei/system01.dbf
2    1950     SYSAUX               ***     /u01/app/oracle/oradata/xifenfei/sysaux01.dbf
3    70       UNDOTBS1             ***     /u01/app/oracle/oradata/xifenfei/undotbs01.dbf
4    12       USERS                ***     /u01/app/oracle/oradata/xifenfei/users01.dbf
5    0        XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff01.dbf

6    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff02.dbf

4. 解决控制文件和数据文件实际名称不一致问题

RMAN> catalog datafilecopy '/u01/app/oracle/oradata/xifenfei/xff01.dbf';

using target database control file instead of recovery catalog
cataloged datafile copy
datafile copy file name=/u01/app/oracle/oradata/xifenfei/xff01.dbf RECID=1 STAMP=1187684217

RMAN> switch datafile 5 to copy;

datafile 5 switched to datafile copy "/u01/app/oracle/oradata/xifenfei/xff01.dbf"

RMAN> report schema;

Report of database schema for database with db_unique_name XIFENFEI

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    770      SYSTEM               ***     /u01/app/oracle/oradata/xifenfei/system01.dbf
2    1950     SYSAUX               ***     /u01/app/oracle/oradata/xifenfei/sysaux01.dbf
3    70       UNDOTBS1             ***     /u01/app/oracle/oradata/xifenfei/undotbs01.dbf
4    12       USERS                ***     /u01/app/oracle/oradata/xifenfei/users01.dbf
5    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff01.dbf
6    128      XIFENFEI             ***     /u01/app/oracle/oradata/xifenfei/xff02.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    123      TEMP                 32767       /u01/app/oracle/oradata/xifenfei/temp01.dbf


RMAN> alter database open;

database opened

ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个10g的库应用访问报ORA-12514: TNS: 监听进程不能解析在连接描述符中给出的SERVICE_NAME
ORA-12514


通过分析alert日志,确认是数据库启动报ORA-600 4194错误

Mon Sep 23 16:12:42 2024
SMON: enabling cache recovery
Mon Sep 23 16:12:43 2024
Successfully onlined Undo Tablespace 1.
Mon Sep 23 16:12:43 2024
SMON: enabling tx recovery
Mon Sep 23 16:12:43 2024
Database Characterset is ZHS16GBK
Mon Sep 23 16:12:43 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:31 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:13:32 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:13:32 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [4194], [66], [50], [], [], [], [], []

Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:13:33 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\udump\xifenfei_ora_7832.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:18 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:14:19 2024
DEBUG: Replaying xcb 0xae312888, pmd 0x9058f4d4 for failed op 8
Mon Sep 23 16:14:19 2024
Non-fatal internal error happenned while SMON was doing shrinking of rollback segments.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon Sep 23 16:14:19 2024
Doing block recovery for file 2 block 5547
No block recovery was needed
Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_pmon_6952.trc:
ORA-00600: internal error code, arguments: [4194], [66], [50], [], [], [], [], []

Mon Sep 23 16:15:06 2024
PMON: terminating instance due to error 472
Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_psp0_2104.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_lgwr_3200.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw1_448.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw0_7436.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_mman_1704.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_dbw2_5072.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_ckpt_6628.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_reco_7924.trc:
ORA-00472: PMON  process terminated with error

Mon Sep 23 16:15:07 2024
Errors in file d:\oracle\product\10.2.0\admin\xifenfei\bdump\xifenfei_smon_5880.trc:
ORA-00472: PMON  process terminated with error

Instance terminated by PMON, pid = 6952

这个比较简单一般就是undo异常,对undo设置为人工管理,然后重建undo完成本次恢复任务

数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:数据库启动报ORA-27102 OSD-00026 O/S-Error: (OS 1455)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一个win上运行的11.2.0.4库启动的时候报ORA-27102 OSD-00026 O/S-Error: (OS 1455)错误

SQL> STARTUP
ORA-27102: out of memory
OSD-00026: 附加错误信息
O/S-Error: (OS 1455) 页面文件太小,无法完成操作。

数据库alert日志信息

Fri Sep 13 17:17:39 2024
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24
Number of processor cores in the system is 12
Number of processor sockets in the system is 1
Picked latch-free SCN scheme 3
Error: Failed to allocate SGA granule addr 0000000760000000 size 268435456 
 mode 131073 locality 0
Errors in file D:\app\Administrator\diag\rdbms\xff\xff\trace\xff_ora_77728.trc:
ORA-27102: out of memory
OSD-00026: 附加错误信息
O/S-Error: (OS 1455) 页面文件太小,无法完成操作。
Error: Failed to allocate SGA granule addr 0000000750000000 size 268435456 
 mode 131073 locality 0
Errors in file D:\app\Administrator\diag\rdbms\xff\xff\trace\xff_ora_77728.trc:

看报错信息,第一感觉和内存有关系,可能内存不足无法满足sga分配需求,查看系统空闲内存情况
22


系统明显有足够内存,出现该问题的原因可能和win操作系统本身有关系,由于要快速恢复业务,直接重启系统数据库启动成功

硬件故障导致ORA-01242 ORA-01122等错误

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障导致ORA-01242 ORA-01122等错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户多个节点rac,早上反馈说有两个节点实例异常,需要分析原因,查看其中一个节点的数据库alert日志,发现是由于访问1399号文件异常报ORA-01242 ORA-01122等错误,导致实例crash

Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process

继续分析日志发现集群尝试拉起该实例,遭遇ORA-01186,ORA-01122无法启动成功

ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

这个错误是数据库文件访问异常导致,根据经验,出现这种问题一般是由于底层异常导致,查看系统messages日志,发现有硬件磁盘报错

Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed

客户进一步分析是由于昨天存储坏了一块盘,然后热备盘顶上了,但是不知道什么原因出现了文件访问异常,可能和当时的rebuild过程有关系.由于客户是rac环境,还有部分剩余节点运行正常,对于异常节点直接启动库成功
20240820-182825


节点写入数据报ORA-01187: cannot read from file because it failed verification tests错误
ora-01187

在所有节点通过执行ALTER SYSTEM CHECK DATAFILES,然后所有节点操作正常
check_datafile

Oracle 支持GB18030-2022

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 支持GB18030-2022

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

GB18030 是中国政府标准,定义了在中国软件所需的语言和字符集支持. 它是一种与 ISO 10646/Unicode 相匹配的 Unicode 转换格式,并提供涵盖所有 Unicode 的字符库.它最初于 2000 年发布 (GB18030-2000),随后在 2005 年 (GB18030-2005) 和 2022 年 (GB18030-2022) 发布了两次更新.GB18030-2022 在字符覆盖方面的内容等同于Unicode Version 11.0. GB18030-2022 标准的生效日期为 2023 年 8 月 1 日.Oracle 数据库通过 AL32UTF8 Unicode 数据库字符集支持 GB18030 字符的存储.同时还提供客户端专用字符集 ZHS32GB18030,使应用程序可以在客户端处理 GB18030 编码文本的输入/输出,并根据需要进行字符集转换. 截至 2023 年 3 月,现有 Oracle 数据库版本中 ZHS32GB18030 的实施是基于 GB18030-2005.
Oracle Database 23ai 版本中已实现对最新 GB18030-2022 标准的支持.具体来说,Oracle Database 23ai 支持 GB18030-2022 标准的级别 3 的实现,这是 GB18030 最广泛的支持级别. 与此同时,我们还为现有客户在 Oracle Database 19c 各 RU 版本之上通过的一次性补丁来支持 GB18030-2022.Oracle Database 19c 如果要支持 GB18030-2022 需要下载补丁 (#34994751).它可以应用于 Oracle 19c RU 版本 19.3 或更高版本.它需要应用于数据库服务器和客户端. 此补丁仅启用对 Oracle 数据库的 GB18030-2022 支持. 为了使应用程序完全支持 GB18030-2022,应用程序堆栈中的每个组件都需要能够确保处理 GB18030-2022 中的字符.
ZHS32GB18030字符集是仅限客户端的字符集,因此不支持将 ZHS32GB18030 用作NLS_CHARACTERSET.如果您的数据库使用ZHS32GB18030作为NLS_CHARACTERSET,那么强烈建议您尽快迁移到 AL32UTF8.
GB18030编码定义了完整的 Unicode 映射,这意味着如果在 GB18030客户端上使用设置为ZHS32GB18030的NLS_LANG并使用 AL32UTF8(或 UTF8)NLS_CHARACTERSET数据将相互转换和从 AL32UTF8 转换为 并完全保留.在使用非 Unicode GB18030编码和设置为 ZHS32GB18030 的 NLS_LANG 的 GB18030客户端上插入 GB18030数据时,Oracle 会将非 Unicode GB18030代码转换为AL32UTF8代码,并将其作为 AL32UTF8 代码存储在数据库中.在使用非 Unicode GB18030编码和设置为 ZHS32GB18030 的 NLS_LANG 的 GB18030客户端上选择 GB18030-2000 数据时,Oracle 会将 GB18030数据从 AL32UTF8 代码转换为非 Unicode GB18030代码提供给客户端,因此,如果此客户端使用非 Unicode GB18030代码,则数据库端的存储AL32UTF8对客户端完全透明.
通过以下sql验证你的数据库是否支持GB18030-2022
以下结果表示支持GB18030-2022

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 23.0.0.0.0 - for Oracle Cloud and Engineered Systems on Thu Aug 1 21:02:36 2024
Version 23.5.0.24.07

Copyright (c) 1982, 2024, Oracle.  All rights reserved.


Connected to:
Oracle Database 23ai Enterprise Edition Release 23.0.0.0.0 - for Oracle Cloud and Engineered Systems
Version 23.5.0.24.07

SQL> SELECT DUMP(CONVERT(UNISTR('\FE10\FFFD\E78D'), 'ZHS32GB18030', 'AL16UTF16'), 16) FROM DUAL;

DUMP(CONVERT(UNISTR('\FE10\FFFD\E78D'),'ZHS32GB18030','AL16UTF16'),16)
--------------------------------------------------------------------------------
Typ=1 Len=10: a6,d9,84,31,a4,37,84,31,82,36

以下结果不表示支持GB18030-2022

[oracle@iZbp11c0qyuuo1gr7j98upZ ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Aug 1 20:46:12 2024

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> SELECT DUMP(CONVERT(UNISTR('\FE10\FFFD\E78D'), 'ZHS32GB18030', 'AL16UTF16'), 16) FROM DUAL;

DUMP(CONVERT(UNISTR('\FE10\FFFD\E78D
------------------------------------
Typ=1 Len=8: 84,31,82,36,a3,bf,a6,d9

参考文档:Oracle 数据库 GB18030-2022 支持方针 (Doc ID 2937409.1)

手工对multipath设备进行授权导致asm 磁盘组mount报ORA-15032-ORA-15131

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:手工对multipath设备进行授权导致asm 磁盘组mount报ORA-15032-ORA-15131

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户硬件通过底层重组raid,然后把lun进行到asm的机器上,在mount data_dg磁盘组的时候,报ORA-15032 ORA-15131错误,磁盘组无法正常mount,这种报错不太常见,一般要不直接报某个block无法访问,要不直接报缺少asm disk之类的.
ORA-15131


通过远程上去分析,发现alert日志如下

Wed Jul 31 04:55:17 2024
NOTE: attached to recovery domain 1
NOTE: cache recovered group 1 to fcn 0.1814063801
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Wed Jul 31 04:55:17 2024
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA_DG)
Errors in file /oracle/u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lgwr_8681.trc:
ORA-15025: could not open disk "/dev/mapper/xffdb_data01_new"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /oracle/u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lgwr_8681.trc:
ORA-15025: could not open disk "/dev/mapper/xffdb_data01_new"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: cache failed reading from group=1(DATA_DG) fn=1 blk=3 count=1 from disk= 0 
  (DATA_DG_0000) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
Errors in file /oracle/u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lgwr_8681.trc:
ORA-15025: could not open disk "/dev/mapper/xffdb_data01_new"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-15080: synchronous I/O operation to a disk failed
ERROR: cache failed to read group=1(DATA_DG) fn=1 blk=3 from disk(s): 0(DATA_DG_0000)
ORA-15080: synchronous I/O operation to a disk failed
NOTE: cache initiating offline of disk 0 group DATA_DG
NOTE: process _lgwr_+asm2 (8681) initiating offline of disk 0.3915927124 (DATA_DG_0000) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe9684e54, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 42 for pid 15, osid 8681
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline for disk DATA_DG_0000 in mode 0x7f failed.
Wed Jul 31 04:55:17 2024
NOTE: halting all I/Os to diskgroup 1 (DATA_DG)
NOTE: LGWR caught ORA-15131 while mounting diskgroup 1
ORA-15080: synchronous I/O operation to a disk failed
NOTE: cache initiating offline of disk 0 group DATA_DG
NOTE: process _lgwr_+asm2 (8681) initiating offline of disk 0.3915927124 (DATA_DG_0000) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe9684e54, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 42 for pid 15, osid 8681
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline for disk DATA_DG_0000 in mode 0x7f failed.
Wed Jul 31 04:55:17 2024
NOTE: halting all I/Os to diskgroup 1 (DATA_DG)
NOTE: LGWR caught ORA-15131 while mounting diskgroup 1
ERROR: ORA-15131 signalled during mount of diskgroup DATA_DG
NOTE: cache dismounting (clean) group 1/0xA868BD55 (DATA_DG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 16915, image: oracle@xffdb2 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
Wed Jul 31 04:55:18 2024
List of instances:
 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 9)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 2 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
freeing rdom 1
WARNING: dirty detached from domain 1
WARNING: thread recovery enqueue was not held for domain 1 when doing a dirty detach
NOTE: cache dismounted group 1/0xA868BD55 (DATA_DG)
NOTE: cache ending mount (fail) of group DATA_DG number=1 incarn=0xa868bd55
NOTE: cache deleting context for group DATA_DG 1/0xa868bd55
GMON dismounting group 1 at 43 for pid 29, osid 16915
NOTE: Disk DATA_DG_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_DG_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_DG_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_DG_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_DG_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_DG_0005 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA_DG was not mounted
ORA-15032: not all alterations performed
ORA-15131: block  of file  in diskgroup  could not be read
ERROR: alter diskgroup data_dg mount

基本上可以确认是由于访问/dev/mapper/xffdb_data01_new 磁盘权限不对导致读disk= 0 fn=1 blk=3失败(突然读这个block没有权限,而没有报最初的磁盘头无权限,有点不合常理),进一步分析确认是xffdb_data01_new 权限不对.

xffdb2:/oracle/u01/app/grid/diag/asm/+asm/+ASM2/trace$ls -l /dev/mapper/
total 0
crw-rw---- 1 root root 10, 58 Jul 26 12:24 control
lrwxrwxrwx 1 root root      8 Jul 31 04:21 mpathe -> ../dm-17
lrwxrwxrwx 1 root root      7 Jul 31 04:28 mpathf -> ../dm-7
lrwxrwxrwx 1 root root      8 Jul 31 04:55 xffdb_data01_new -> ../dm-14
lrwxrwxrwx 1 root root      8 Jul 31 04:55 xffdb_data02_new -> ../dm-13
lrwxrwxrwx 1 root root      7 Jul 31 04:55 xffdb_data03 -> ../dm-2
lrwxrwxrwx 1 root root      7 Jul 31 04:55 xffdb_data04 -> ../dm-5
lrwxrwxrwx 1 root root      8 Jul 31 04:55 xffdb_data05_new -> ../dm-12
lrwxrwxrwx 1 root root      7 Jul 31 04:55 xffdb_data06 -> ../dm-6
lrwxrwxrwx 1 root root      8 Jul 31 04:28 xffdb_data07 -> ../dm-11
lrwxrwxrwx 1 root root      7 Jul 31 04:28 xffdb_data08 -> ../dm-9
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_log1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_log2 -> ../dm-3
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_vote2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Jul 31 04:59 xffdb_vote3 -> ../dm-10
lrwxrwxrwx 1 root root      8 Jul 26 12:24 vgdata-lv_data -> ../dm-15
lrwxrwxrwx 1 root root      7 Jul 26 12:24 vg_xffdb2-LogVol00 -> ../dm-1
lrwxrwxrwx 1 root root      7 Jul 26 12:24 vg_xffdb2-LogVol01 -> ../dm-0
lrwxrwxrwx 1 root root      8 Jul 26 12:24 vg_xffdb2-LogVol02 -> ../dm-16
xffdb2:/oracle/u01/app/grid/diag/asm/+asm/+ASM2/trace$ls -l /dev/dm*
brw-rw---- 1 root disk     253,  0 Jul 26 12:24 /dev/dm-0
brw-rw---- 1 root disk     253,  1 Jul 26 12:24 /dev/dm-1
brw-rw---- 1 grid asmadmin 253, 10 Jul 31 05:13 /dev/dm-10
brw-rw---- 1 root disk     253, 11 Jul 31 04:28 /dev/dm-11
brw-rw---- 1 root disk     253, 12 Jul 31 04:55 /dev/dm-12
brw-rw---- 1 grid asmadmin 253, 13 Jul 31 04:55 /dev/dm-13
brw-rw---- 1 grid asmadmin 253, 14 Jul 31 04:55 /dev/dm-14
brw-rw---- 1 root disk     253, 15 Jul 26 12:24 /dev/dm-15
brw-rw---- 1 root disk     253, 16 Jul 26 12:24 /dev/dm-16
brw-rw---- 1 root disk     253, 17 Jul 31 04:21 /dev/dm-17
brw-rw---- 1 grid asmadmin 253,  2 Jul 31 04:55 /dev/dm-2
brw-rw---- 1 grid asmadmin 253,  3 Jul 31 04:59 /dev/dm-3
brw-rw---- 1 grid asmadmin 253,  4 Jul 31 05:13 /dev/dm-4
brw-rw---- 1 grid asmadmin 253,  5 Jul 31 04:55 /dev/dm-5
brw-rw---- 1 grid asmadmin 253,  6 Jul 31 04:55 /dev/dm-6
brw-rw---- 1 root disk     253,  7 Jul 31 04:28 /dev/dm-7
brw-rw---- 1 grid asmadmin 253,  8 Jul 31 05:13 /dev/dm-8
brw-rw---- 1 root disk     253,  9 Jul 31 04:28 /dev/dm-9

再进一步确认xffdb_*_new三个磁盘是硬件恢复之后镜像过来的,然后现场工程师直接人工修改/dev/dm_[12-14]权限,再尝试mount磁盘组,结果发生该错误,通过v$asm_disk再次查询asm disk情况,发现xffdb_*_new的磁盘均不在列表中

GROUP_NUMBER DISK_NUMBER HEADER_STATUS         STATE          PATH
------------ ----------- --------------------- -------------- --------------------------
           0           2 MEMBER                NORMAL         /dev/mapper/xffdb_data03
           0           3 MEMBER                NORMAL         /dev/mapper/xffdb_data06
           0           4 MEMBER                NORMAL         /dev/mapper/xffdb_data04
           3           1 MEMBER                NORMAL         /dev/mapper/xffdb_vote2
           2           0 MEMBER                NORMAL         /dev/mapper/xffdb_log1
           3           2 MEMBER                NORMAL         /dev/mapper/xffdb_vote3
           2           1 MEMBER                NORMAL         /dev/mapper/xffdb_log2

7 rows selected.

进一步查看磁盘权限

xffdb2:/dev/mapper$ls -ltr
total 0
crw-rw---- 1 root root 10, 58 Jul 26 12:24 control
lrwxrwxrwx 1 root root      7 Jul 26 12:24 vg_xffdb2-LogVol01 -> ../dm-0
lrwxrwxrwx 1 root root      8 Jul 26 12:24 vgdata-lv_data -> ../dm-15
lrwxrwxrwx 1 root root      7 Jul 26 12:24 vg_xffdb2-LogVol00 -> ../dm-1
lrwxrwxrwx 1 root root      8 Jul 26 12:24 vg_xffdb2-LogVol02 -> ../dm-16
lrwxrwxrwx 1 root root      8 Jul 31 04:21 mpathe -> ../dm-17
lrwxrwxrwx 1 root root      7 Jul 31 04:28 xffdb_data08 -> ../dm-9
lrwxrwxrwx 1 root root      8 Jul 31 04:28 xffdb_data07 -> ../dm-11
lrwxrwxrwx 1 root root      7 Jul 31 04:28 mpathf -> ../dm-7
lrwxrwxrwx 1 root root      8 Jul 31 04:55 xffdb_data05_new -> ../dm-12
lrwxrwxrwx 1 root root      8 Jul 31 04:59 xffdb_vote3 -> ../dm-10
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_vote2 -> ../dm-8
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_log2 -> ../dm-3
lrwxrwxrwx 1 root root      7 Jul 31 04:59 xffdb_log1 -> ../dm-4
lrwxrwxrwx 1 root root      8 Jul 31 05:15 xffdb_data01_new -> ../dm-14
lrwxrwxrwx 1 root root      8 Jul 31 05:15 xffdb_data02_new -> ../dm-13
lrwxrwxrwx 1 root root      7 Jul 31 05:15 xffdb_data06 -> ../dm-6
lrwxrwxrwx 1 root root      7 Jul 31 05:15 xffdb_data04 -> ../dm-5
lrwxrwxrwx 1 root root      7 Jul 31 05:15 xffdb_data03 -> ../dm-2
xffdb2:/dev/mapper$ls -l /dev/dm*
brw-rw---- 1 root disk     253,  0 Jul 26 12:24 /dev/dm-0
brw-rw---- 1 root disk     253,  1 Jul 26 12:24 /dev/dm-1
brw-rw---- 1 grid asmadmin 253, 10 Jul 31 05:22 /dev/dm-10
brw-rw---- 1 root disk     253, 11 Jul 31 04:28 /dev/dm-11
brw-rw---- 1 root disk     253, 12 Jul 31 04:55 /dev/dm-12
brw-rw---- 1 root disk     253, 13 Jul 31 05:15 /dev/dm-13
brw-rw---- 1 root disk     253, 14 Jul 31 05:15 /dev/dm-14
brw-rw---- 1 root disk     253, 15 Jul 26 12:24 /dev/dm-15
brw-rw---- 1 root disk     253, 16 Jul 26 12:24 /dev/dm-16
brw-rw---- 1 root disk     253, 17 Jul 31 04:21 /dev/dm-17
brw-rw---- 1 grid asmadmin 253,  2 Jul 31 05:15 /dev/dm-2
brw-rw---- 1 grid asmadmin 253,  3 Jul 31 04:59 /dev/dm-3
brw-rw---- 1 grid asmadmin 253,  4 Jul 31 05:22 /dev/dm-4
brw-rw---- 1 grid asmadmin 253,  5 Jul 31 05:15 /dev/dm-5
brw-rw---- 1 grid asmadmin 253,  6 Jul 31 05:15 /dev/dm-6
brw-rw---- 1 root disk     253,  7 Jul 31 04:28 /dev/dm-7
brw-rw---- 1 grid asmadmin 253,  8 Jul 31 05:22 /dev/dm-8
brw-rw---- 1 root disk     253,  9 Jul 31 04:28 /dev/dm-9

发现进一步访问,这三个盘权限全部还原成root:disk,导致grid无法正常访问,到这一部分基本上可以判断恢复过来的多路径下面的三个磁盘,当被访问之时,权限会发生改变,一般发生该问题,是由于这些设备没有被udev进行绑定导致,使用udev对这三个磁盘进行权限和所有组相关信息进行绑定之后,磁盘权限不再变化,v$asm_disk中显示信息也正常

[root@xffdb2 rules.d]# ls -l /dev/dm*
brw-rw---- 1 root disk     253,  0 Jul 31 05:26 /dev/dm-0
brw-rw---- 1 root disk     253,  1 Jul 31 05:26 /dev/dm-1
brw-rw---- 1 grid asmadmin 253, 10 Jul 31 05:26 /dev/dm-10
brw-rw---- 1 root disk     253, 11 Jul 31 05:26 /dev/dm-11
brw-rw---- 1 grid asmadmin 253, 12 Jul 31 05:26 /dev/dm-12
brw-rw---- 1 grid asmadmin 253, 13 Jul 31 05:26 /dev/dm-13
brw-rw---- 1 grid asmadmin 253, 14 Jul 31 05:26 /dev/dm-14
brw-rw---- 1 root disk     253, 15 Jul 31 05:26 /dev/dm-15
brw-rw---- 1 root disk     253, 16 Jul 31 05:26 /dev/dm-16
brw-rw---- 1 root disk     253, 17 Jul 31 05:26 /dev/dm-17
brw-rw---- 1 grid asmadmin 253,  2 Jul 31 05:26 /dev/dm-2
brw-rw---- 1 grid asmadmin 253,  3 Jul 31 05:26 /dev/dm-3
brw-rw---- 1 grid asmadmin 253,  4 Jul 31 05:26 /dev/dm-4
brw-rw---- 1 grid asmadmin 253,  5 Jul 31 05:26 /dev/dm-5
brw-rw---- 1 grid asmadmin 253,  6 Jul 31 05:26 /dev/dm-6
brw-rw---- 1 root disk     253,  7 Jul 31 05:26 /dev/dm-7
brw-rw---- 1 grid asmadmin 253,  8 Jul 31 05:26 /dev/dm-8
brw-rw---- 1 root disk     253,  9 Jul 31 05:26 /dev/dm-9
[root@xffdb2 rules.d]# ls -l /dev/mapper/
total 0
crw-rw---- 1 root root 10, 58 Jul 31 05:26 control
lrwxrwxrwx 1 root root      8 Jul 31 05:26 mpathe -> ../dm-17
lrwxrwxrwx 1 root root      7 Jul 31 05:26 mpathf -> ../dm-7
lrwxrwxrwx 1 root root      8 Jul 31 05:26 xffdb_data01_new -> ../dm-14
lrwxrwxrwx 1 root root      8 Jul 31 05:26 xffdb_data02_new -> ../dm-13
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_data03 -> ../dm-2
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_data04 -> ../dm-5
lrwxrwxrwx 1 root root      8 Jul 31 05:26 xffdb_data05_new -> ../dm-12
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_data06 -> ../dm-6
lrwxrwxrwx 1 root root      8 Jul 31 05:26 xffdb_data07 -> ../dm-11
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_data08 -> ../dm-9
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_log1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_log2 -> ../dm-3
lrwxrwxrwx 1 root root      7 Jul 31 05:26 xffdb_vote2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Jul 31 05:26 xffdb_vote3 -> ../dm-10
lrwxrwxrwx 1 root root      8 Jul 31 05:26 vgdata-lv_data -> ../dm-15
lrwxrwxrwx 1 root root      7 Jul 31 05:26 vg_xffdb2-LogVol00 -> ../dm-1
lrwxrwxrwx 1 root root      7 Jul 31 05:26 vg_xffdb2-LogVol01 -> ../dm-0
lrwxrwxrwx 1 root root      8 Jul 31 05:26 vg_xffdb2-LogVol02 -> ../dm-16
[root@xffdb2 rules.d]# 
SQL> /

GROUP_NUMBER DISK_NUMBER HEADER_STATUS                        STATE                    PATH
------------ ----------- ------------------------------------ ------------------------ -----------------------------
           0           0 MEMBER                               NORMAL                   /dev/mapper/xffdb_data01_new
           0           1 MEMBER                               NORMAL                   /dev/mapper/xffdb_data05_new
           0           2 MEMBER                               NORMAL                   /dev/mapper/xffdb_data03
           0           3 MEMBER                               NORMAL                   /dev/mapper/xffdb_data06
           0           4 MEMBER                               NORMAL                   /dev/mapper/xffdb_data04
           0           5 MEMBER                               NORMAL                   /dev/mapper/xffdb_data02_new
           3           1 MEMBER                               NORMAL                   /dev/mapper/xffdb_vote2
           2           0 MEMBER                               NORMAL                   /dev/mapper/xffdb_log1
           3           2 MEMBER                               NORMAL                   /dev/mapper/xffdb_vote3
           2           1 MEMBER                               NORMAL                   /dev/mapper/xffdb_log2

10 rows selected.

mount磁盘组成功

SQL>  alter diskgroup data_dg mount 
NOTE: cache registered group DATA_DG number=1 incarn=0x4178bd5e
NOTE: cache began mount (first) of group DATA_DG number=1 incarn=0x4178bd5e
NOTE: Assigning number (1,0) to disk (/dev/mapper/xffdb_data01_new)
NOTE: Assigning number (1,4) to disk (/dev/mapper/xffdb_data05_new)
NOTE: Assigning number (1,2) to disk (/dev/mapper/xffdb_data03)
NOTE: Assigning number (1,5) to disk (/dev/mapper/xffdb_data06)
NOTE: Assigning number (1,3) to disk (/dev/mapper/xffdb_data04)
NOTE: Assigning number (1,1) to disk (/dev/mapper/xffdb_data02_new)
Wed Jul 31 05:27:47 2024
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 46 for pid 29, osid 26738
NOTE: cache opening disk 0 of grp 1: DATA_DG_0000 path:/dev/mapper/xffdb_data01_new
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: DATA_DG_0001 path:/dev/mapper/xffdb_data02_new
NOTE: cache opening disk 2 of grp 1: DATA_DG_0002 path:/dev/mapper/xffdb_data03
NOTE: cache opening disk 3 of grp 1: DATA_DG_0003 path:/dev/mapper/xffdb_data04
NOTE: cache opening disk 4 of grp 1: DATA_DG_0004 path:/dev/mapper/xffdb_data05_new
NOTE: cache opening disk 5 of grp 1: DATA_DG_0005 path:/dev/mapper/xffdb_data06
NOTE: cache mounting (first) external redundancy group 1/0x4178BD5E (DATA_DG)
Wed Jul 31 05:27:47 2024
* allocate domain 1, invalid = TRUE 
kjbdomatt send to inst 1
Wed Jul 31 05:27:47 2024
NOTE: attached to recovery domain 1
NOTE: cache recovered group 1 to fcn 0.1814063801
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Wed Jul 31 05:27:47 2024
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA_DG)
NOTE: LGWR found thread 1 closed at ABA 12401.4517
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA_DG)
NOTE: LGWR opening thread 1 at fcn 0.1814063801 ABA 12402.4518
NOTE: cache mounting group 1/0x4178BD5E (DATA_DG) succeeded
NOTE: cache ending mount (success) of group DATA_DG number=1 incarn=0x4178bd5e
Wed Jul 31 05:27:47 2024
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA_DG was mounted
SUCCESS:  alter diskgroup data_dg mount

重要提醒:手工直接对multipath设备权限所有者操作,当该设备被访问之时权限可能恢复成当初默认root:disk,对于这样的设备建议通过udev进行设置权限和所有者等信息