Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2]

Applies to:
Oracle Database – Enterprise Edition – Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Oracle Database Cloud Schema Service – Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) – Version N/A and later
Oracle Cloud Infrastructure – Database Service – Version N/A and later
Oracle Database Cloud Exadata Service – Version N/A and later
IBM AIX on POWER Systems (64-bit)

Description
Oracle 12c introduces multiple LGWR worker processes as a new default feature. On IBM AIX, and potentially on HPUX Itanium 64bit, this may lead to a DEADLOCK / database hang, ORA-742 “Log read detects lost write”, ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN, or ORA-600 [krr_process_read_error_2] during recovery.
The database may become unusable and fail to OPEN.

Occurrence
This issue is specific to RDBMS version 12c (12.1.0.1 or 12.1.0.2) where the new default feature of using multiple LGWRs is introduced.
It affects databases on IBM AIX and potentially on HPUX Itanium 64bit.

Symptoms
ORACLE on IBM AIX or HPUX Itanium 64bit with RDBMS Version 12c.

DEADLOCK or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery caused by bug 21915719.

PMON may terminate the instance while extensive block recovery is being performed.

One example of the DEADLOCK is LG0[n] processes waiting on each other in 'LGWR worker group ordering'. Example from a System State Dump trace file:

PROCESS 18: LG01
SO: 0x7000101f95ad720, type: 4, owner: 0x7000101f84195f8, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f84195f8, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
 Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x58, phase=0x1, =0x0
      wait_id=4947 seq_num=4948 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=1 os=267
      in_wait=1 iflags=0x5a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 817, ser: 1
    Dumping final blocker:
      inst: 1, sid: 817, ser: 1
  There are 730 sessions blocked by this session.
.
.
PROCESS 17: LG00
SO: 0x7000101f85bcc60, type: 4, owner: 0x7000101f93eeb20, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
   proc=0x7000101f93eeb20, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
  (session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
  ksuxds FALSE at location: 0
  service name: SYS$BACKGROUND
  Current Wait Stack:
   0: waiting for 'LGWR worker group ordering'
      lwn_id=0x56, phase=0x1, =0x0
      wait_id=1630680 seq_num=57841 snap_id=1
      wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
      wait times: max=infinite, heur=13 min 21 sec
      wait counts: calls=2 os=268
      in_wait=1 iflags=0x15a0
  There is at least one session blocking this session.
    Dumping 1 direct blocker(s):
      inst: 1, sid: 865, ser: 1
    Dumping final blocker:
      inst: 1, sid: 865, ser: 1
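
The pattern in the excerpt above (two LGnn workers, each waiting on 'LGWR worker group ordering' and each naming the other as its blocker) can be searched for in a dump with a small filter. The following is only a sketch: the function name lg_ordering_waits is hypothetical, and it assumes the trace uses exactly the line layout shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: for every LGnn process in a system state dump that is
# waiting on 'LGWR worker group ordering', print its sid and the sid of its
# first direct blocker. Assumes the line layout of the excerpt above.
lg_ordering_waits() {
    awk '
        # Any new PROCESS header ends the previous process section.
        /^PROCESS /                   { proc = ""; waiting = 0 }
        # Remember the process name only for LGnn workers.
        /^PROCESS [0-9]+: LG[0-9]+/   { proc = $3; sid = "" }
        # Session line: pick the value that follows the "sid:" token.
        proc != "" && /\(session\) sid:/ {
            for (i = 1; i < NF; i++) if ($i == "sid:") sid = $(i + 1)
        }
        # The dot stands in for the quote character around the event name.
        proc != "" && /waiting for .LGWR worker group ordering./ { waiting = 1 }
        # First blocker line after the wait: report once, then stop tracking.
        proc != "" && waiting && /inst: [0-9]+, sid: [0-9]+/ {
            blocker = $4; sub(/,/, "", blocker)
            print proc, "sid=" sid, "blocked_by_sid=" blocker
            proc = ""
        }
    ' "$@"
}
```

Run as lg_ordering_waits <trace file> (or with the dump on standard input). For the excerpt above it would report LG01 sid=865 blocked_by_sid=817 and LG00 sid=817 blocked_by_sid=865, that is, the two workers blocking each other.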
 

The instance may fail to OPEN with errors ORA-600 [kcrfrgv_nextlwn_scn] and/or ORA-600 [krr_process_read_error_2]:

Recovery Session Failed with:

ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krr_process_read_error_2],


Alter database open fails with:

ORA-00600: internal error code, arguments: [kcrfrgv_nextlwn_scn] .....
ORA-600 signalled during: ALTER DATABASE OPEN...

Workaround
Disable the new feature of multiple LGWR worker processes by proactively setting _use_single_log_writer=true.

Setting _use_single_log_writer = true is a safe workaround; it is the behavior before 12c where multiple LGWR worker groups were not available.

ALTER SYSTEM SET "_use_single_log_writer"=TRUE SID='*' SCOPE=SPFILE;
-- Restart the database or all instances of the RAC database

Note that as long as _use_single_log_writer=true is not set, error ORA-600 [kcrfrgv_nextlwn_scn] may be raised and prevent the database from OPENing. Once the problem has been introduced, setting _use_single_log_writer=true may not fix it: the parameter prevents the redo log inconsistencies that cause the error from being introduced in the first place.
If the parameter does not help because the problem was already introduced before _use_single_log_writer=true was set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.
Reference: ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] (Doc ID 1957710.1)

_use_single_log_writer and _max_outstanding_log_writes

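Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]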

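SCALABLE LGWR is an exciting feature introduced in 12cR1. In OLTP environments, LGWR redo writes are often the main performance bottleneck. If LGWR could write redo to the log files with multiple processes (LGnn), the way DBWR does (DBW0 to DBWn), it could greatly improve OLTP concurrency and raise the number of transactions the system processes per unit of time. The tests here were run on version 12.2 to determine how the _use_single_log_writer and _max_outstanding_log_writes parameters affect the SCALABLE LGWR feature.
Database version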

SQL> select * from v$version;
BANNER                                                                                         CON_ID
------------------------------------------------------------------------------------------ ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production                        0
PL/SQL Release 12.2.0.0.3 - Production                                                              0
CORE    12.2.0.0.3      Production                                                                  0
TNS for Linux: Version 12.2.0.0.3 - Production                                                      0
NLSRTL Version 12.2.0.0.3 - Production                                                              0

Default values of _use_single_log_writer and _max_outstanding_log_writes

SQL> /
Enter value for param: _use_single_log_writer
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_use_single_log_writer%')
NAME                                                 VALUE                    DESCRIPTION
---------------------------------------------------- ------------------------ -------------------------------------------
_use_single_log_writer                               ADAPTIVE                 Use a single process for redo log writing
SQL> /
Enter value for param: _max_outstanding_log_writes
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_max_outstanding_log_writes%')
NAME                                                 VALUE                    DESCRIPTION
---------------------------------------------------- ------------------------ ----------------------------------------------
_max_outstanding_log_writes                          2                        Maximum number of outstanding redo log writes
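
The transcript above shows only line 6 of the query being re-run with "/". A commonly used form of such a hidden-parameter lookup joins x$ksppi and x$ksppcv; the sketch below generates that SQL for a given parameter name. It is an assumption about the general shape of the query, not the author's exact script, and the function name hidden_param_sql is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a hidden-parameter lookup for the name pattern
# given in $1. The x$ksppi / x$ksppcv join is a common idiom and is assumed
# here; the original script behind the transcript is not shown in the post.
hidden_param_sql() {
    # Unquoted here-document: \$ keeps a literal dollar sign, $1 is expanded.
    cat <<EOF
select a.ksppinm name, b.ksppstvl value, a.ksppdesc description
  from x\$ksppi a, x\$ksppcv b
 where a.indx = b.indx
   and upper(a.ksppinm) like upper('%$1%');
EOF
}
```

The generated statement could then be passed to SQL*Plus in a SYSDBA session, for example hidden_param_sql _use_single_log_writer | sqlplus -s / as sysdba. The x$ views are visible only to SYS.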

Number of lg processes
Here we can see one lgwr process and two lg processes, which matches the _max_outstanding_log_writes setting.

[oracle@ora1221 ~]$ ps -ef|grep ora_lg
oracle    49790      1  0 10:32 ?        00:00:00 ora_lgwr_orcl12c2
oracle    49794      1  0 10:32 ?        00:00:00 ora_lg00_orcl12c2
oracle    49798      1  0 10:32 ?        00:00:00 ora_lg01_orcl12c2
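
Counting the workers by eye works for a handful of processes. As a rough sketch, the count can also be taken from the ps output directly. The helper name count_lg_workers is hypothetical, and it assumes the ora_lgNN_<SID> naming seen in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: count LGnn worker processes in `ps -ef` style output.
# The last field ($NF) is the process name. ora_lgNN_<SID> entries are the
# workers; ora_lgwr_<SID> and the "grep lg" line itself do not match.
count_lg_workers() {
    awk '$NF ~ /^ora_lg[0-9][0-9]_/ { n++ } END { print n + 0 }' "$@"
}
```

Used as ps -ef | count_lg_workers, this prints 2 for the listing above, since the ora_lgwr_orcl12c2 line is not counted.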

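Modifying the _max_outstanding_log_writes parameter
After changing _max_outstanding_log_writes to 4, the number of lg processes also becomes 4, which shows that the _max_outstanding_log_writes parameter determines the number of lg processes.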

SQL> alter system set "_max_outstanding_log_writes"=4 ;
alter system set "_max_outstanding_log_writes"=4
                 *
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
SQL> alter system set "_max_outstanding_log_writes"=4 scope=spfile;
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
[oracle@ora1221 ~]$ ps -ef|grep lg
oracle    72339      1  0 13:45 ?        00:00:00 ora_lgwr_orcl12c2
oracle    72343      1  0 13:45 ?        00:00:00 ora_lg00_orcl12c2
oracle    72347      1  0 13:45 ?        00:00:00 ora_lg01_orcl12c2
oracle    72351      1  0 13:45 ?        00:00:00 ora_lg02_orcl12c2
oracle    72359      1  0 13:45 ?        00:00:00 ora_lg03_orcl12c2

Modifying the _use_single_log_writer parameter
Testing changes to _use_single_log_writer confirms that once it is set to true, the database reverts to the pre-12c way of managing LGWR.

[oracle@ora1221 ~]$ ss
SQL*Plus: Release 12.2.0.0.3 Production on Thu Aug 6 13:45:33 2015
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
SQL> alter system set "_use_single_log_writer"=1 ;
alter system set "_use_single_log_writer"=1
                 *
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
SQL>  alter system set "_use_single_log_writer"=1 scope=spfile;
 alter system set "_use_single_log_writer"=1 scope=spfile
*
ERROR at line 1:
ORA-00096: invalid value 1 for parameter _use_single_log_writer, must be from
among ADAPTIVE, FALSE, TRUE
SQL> alter system set "_use_single_log_writer"=TRUE scope=spfile;
System altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
[oracle@ora1221 ~]$ ps -ef|grep lg
oracle    72702  71510  0 13:46 pts/0    00:00:00 grep lg
[oracle@ora1221 ~]$ ss
SQL*Plus: Release 12.2.0.0.3 Production on Thu Aug 6 13:46:50 2015
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2516582400 bytes
Fixed Size                  8260048 bytes
Variable Size             671090224 bytes
Database Buffers         1828716544 bytes
Redo Buffers                8515584 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production
[oracle@ora1221 ~]$ ps -ef|grep lg
oracle    72754      1  0 13:46 ?        00:00:00 ora_lgwr_orcl12c2
oracle    73008  71510  0 13:47 pts/0    00:00:00 grep lg

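From this we can conclude that _use_single_log_writer determines whether SCALABLE LGWR (multiple lg worker processes) is enabled, and _max_outstanding_log_writes determines the number of lg processes.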
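
The session above shows two constraints on _use_single_log_writer: it can only be changed with scope=spfile (ORA-02095 otherwise), and it accepts only ADAPTIVE, FALSE or TRUE (ORA-00096 otherwise). A small wrapper can check the value before any statement reaches the database. This is a minimal sketch: the function name single_lgwr_stmt is hypothetical, and it only prints the statement rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the ALTER SYSTEM statement for
# _use_single_log_writer after validating the value against the set
# reported by ORA-00096 in the session above (ADAPTIVE, FALSE, TRUE).
single_lgwr_stmt() {
    # Normalise to upper case so "true" and "TRUE" are treated alike.
    value=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$value" in
        ADAPTIVE|FALSE|TRUE)
            # SCOPE=SPFILE because the parameter cannot be changed in memory.
            printf 'ALTER SYSTEM SET "_use_single_log_writer"=%s SID=%s SCOPE=SPFILE;\n' \
                "$value" "'*'"
            ;;
        *)
            echo "invalid value '$1': must be ADAPTIVE, FALSE or TRUE" >&2
            return 1
            ;;
    esac
}
```

For example, single_lgwr_stmt true prints the same statement as the workaround given earlier, while single_lgwr_stmt 1 is rejected with a non-zero status. The instance still has to be restarted for the change to take effect.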