Using alter system set events crashed the database

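Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]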

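While a large volume of data was being imported, the database showed high I/O wait. The new storage could not be attached directly, so the plan was to mount it over NFS and place the redo logs there to relieve the I/O pressure.
Database version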

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

Mount options (as shown by the mount command)

10.240.10.1 /top/data4/nfs   /back1            nfs3
Aug 29 13:40 cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600

Attempt to create the redo log

SQL> alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m;
alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m
*
ERROR at line 1:
ORA-00301: error in adding log file '/back1/newxff/redo13.log' - file cannot be
created
ORA-27054: NFS file system where the file is created or resides is not mounted
with correct options
Additional information: 6

Following the MOS note
ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS (Doc ID 387700.1), event 10298 was set:

SQL> Alter system set events '10298 trace name context forever,level 32';
System altered.
Mon Sep  5 10:10:18 2016
Thread 1 advanced to log sequence 109 (LGWR switch)
  Current log# 1 seq# 109 mem# 0: +DATA/xff/onlinelog/group_1.257.921671023
Mon Sep  5 10:12:19 2016
OS Pid: 160710 executed alter system set events '10298 trace name context forever,level 32'

The redo log is created successfully

SQL> alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m;
Database altered.
Mon Sep  5 10:18:13 2016
alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m
Mon Sep  5 10:18:43 2016
Completed: alter database add logfile group 13 ('/back1/newxff/redo13.log') size 2048m

The database crashes

Mon Sep  5 10:19:06 2016
Errors in file /opt/oracle/admin/xff/bdump/xff1_lgwr_246566.trc:
ORA-00313: open failed for members of log group 13 of thread 1
ORA-00312: online log 13 thread 1: '/back1/newxff/redo13.log'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 6
Mon Sep  5 10:19:06 2016
Errors in file /opt/oracle/admin/xff/bdump/xff1_lgwr_246566.trc:
ORA-00313: open failed for members of log group 13 of thread 1
ORA-00312: online log 13 thread 1: '/back1/newxff/redo13.log'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 6
Mon Sep  5 10:19:06 2016
LGWR: terminating instance due to error 313
Mon Sep  5 10:19:06 2016
System state dump is made for local instance
System State dumped to trace file /opt/oracle/admin/xff/bdump/xff1_diag_299654.trc

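The errors make it clear that the instance went down for the same reason the redo log could not be created at first: ORA-27054. But why did creating the redo log succeed while using it failed?
The key is that the command used was alter system set events. It takes effect for the current session and for sessions created afterwards; it does not affect background processes that already exist. That explains the behavior: I created the redo log in the same SQL*Plus session that ran the events command, so the creation succeeded. LGWR, however, has been running since instance startup, so the newly set event had no effect on it. When LGWR tried to open the new redo log, it failed with ORA-27054 and terminated the instance. To make the event take effect in processes that already exist, either use oradebug to set it in each process (in this case several background processes access the redo logs, not only LGWR), or set it through the event= initialization parameter and restart the database so that it takes effect.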
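
The two options can be sketched roughly as follows. This is only an illustration under assumptions, not the exact procedure used in this case: the pid 160000 is a made-up placeholder, and only LGWR is shown, although other background processes that open the redo logs would need the same treatment. The sid='*' clause assumes the event should apply to every instance, and setting the event parameter replaces any value it already has in the spfile.

```sql
-- Sketch only; run in SQL*Plus as SYSDBA on the affected instance.

-- Option 1: set the event inside an already running background process.
-- Look up the OS pid of LGWR.
select p.spid
  from v$process p, v$bgprocess b
 where p.addr = b.paddr
   and b.name = 'LGWR';

-- Attach oradebug to that process and set event 10298 in it.
-- 160000 is a placeholder; substitute the spid returned by the query above.
oradebug setospid 160000
oradebug event 10298 trace name context forever, level 32

-- Option 2: record the event in the spfile so that every process
-- picks it up at the next startup. The event parameter is static,
-- so scope=spfile and an instance restart are required.
alter system set event='10298 trace name context forever, level 32' scope=spfile sid='*';
```

With either option, the redo log on NFS should be added only after the event is active in the processes that will open it, since creating the file from the current session was never the step that failed.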