新删除data guard归档日志shell脚本

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:新删除data guard归档日志shell脚本

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

以前写过删除dataguard归档日志的方法(删除data guard归档日志),但是以前的方法确实不够灵活也不够简便,现在提供最新的一次在客户现场部署的dg删除归档日志的shell脚本

#!/bin/sh
source ~/.bash_profile
grep "Media Recovery Log" $ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_${ORACLE_SID}.log|
\ awk '{print $4}'|sed -e 's/^/rm /' >/tmp/rmarchlog.sh
chmod +x /tmp/rmarchlog.sh
/tmp/rmarchlog.sh
cd $ORACLE_BASE/admin/$ORACLE_SID/bdump
cat alert_${ORACLE_SID}.log >>alert_${ORACLE_SID}.log.bak
echo ''>alert_${ORACLE_SID}.log
rm -f /tmp/rmarchlog.sh
$ORACLE_HOME/bin/rman target / <<XIFENFEI
crosscheck archivelog all;
delete expired archivelog all;
YES
exit;
XIFENFEI

根据alert日志中dg应用日志的信息”Media Recovery Log”信息来删除掉相关的归档日志,可以保证应用过的归档日志都被删除,而没有应用的归档日志都保留.

shell监控dataguard备库是否正常应用日志

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:shell监控dataguard备库是否正常应用日志

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一直在思索怎么去监控dg比较方便,又能够做到比较适用.想到了几种方法:
1.使用主备库两边的alert日志,但是这样的方法需要配置ssh,用来一个节点获取另外一个节点的alert日志
2.通过查询v$archived_log或者其他相关视图,然后主备库进行比较,但是这个需要访问另外一个库,需要另外库的登录信息
3.通过查询备库的v$archived_log视图,粗略评估dg是否工作正常.
这里我选择了3,dg的监控大部分时候是为了让人及时的发现日志应用异常,然后人工干预处理,从而减少修改gap或者重建dg的概率.而这个额监控可以在很大程度上发现dg应用归档日志异常,从而确定dg是否工作正常,如果发现工作异常,及时处理,可以减少很多工作量,甚至拯救你的数据.

#!/bin/bash
source ~/.bash_profile
#check time(M)
export CHECK_M=120
export RESULT_FILE=/tmp/dg_switch_check.log
$ORACLE_HOME/bin/sqlplus -silent "/ as sysdba" <<XFF>/tmp/check_dg.log
set pagesize 0 feedback off verify off heading off echo off
 select ceil((sysdate-next_time)*24*60) "M"
 from v\$archived_log
 where applied='YES' AND SEQUENCE#=(SELECT MAX(SEQUENCE#)
 FROM V\$ARCHIVED_LOG WHERE applied='YES');
exit;
XFF
GET_M=`cat /tmp/check_dg.log`
rm /tmp/check_dg.log
if [ ${CHECK_M} -lt ${GET_M} ];
then
    echo "check dataguard time:`date`">$RESULT_FILE
    echo "The last time application archivelog happened in $GET_M minutes ago">>$RESULT_FILE
else
 echo ''>$RESULT_FILE
fi

针对这样的脚本,根据你的dg归档切换的频率,设置监控dg的最近一次日志应用与当前时间差,然后判断dg是否工作正常.根据监控程序的特点,可以通过判断结果集文件,然后邮件/短信或者其他方式处理.

因v$archived_log视图记录异常导致dg MRP进程异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:因v$archived_log视图记录异常导致dg MRP进程异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

版本信息

操作系统Linux 4.8 x86
数据库版本ORACLE 9.2.0.4

alert日志报错
MRP进程出现异常报ORA-00310/ORA-00334错误

…………
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
MRP0: Background Media Recovery terminated with error 310
Thu Nov  8 07:44:39 2012
Errors in file /u01/oracle/admin/lunar/bdump/lunar_mrp0_25625.trc:
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
Recovery interrupted.
MRP0: Background Media Recovery process shutdown

trace文件

*** SESSION ID:(17.13) 2012-11-08 07:24:12.986
Background Managed Standby Recovery process started
*** 2012-11-08 07:24:18.023
Managed Recovery: Active posted.
*** 2012-11-08 07:41:03.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_64.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_65.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_66.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_67.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_68.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_69.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_70.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_71.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_72.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_73.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_74.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
*** 2012-11-08 07:41:39.083
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
*** 2012-11-08 07:44:39.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
Background Media Recovery terminated with error 310
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 21990Kb in 1221.38s => 0.02 Mb/sec
Longest record: 1Kb, moves: 0/92129 (0%)
Change moves: 34869/213735 (16%), moved: 2Mb
----------------------------------------------
*** 2012-11-08 07:44:39.404
Managed Recovery: Not Active posted.
Background Media Recovery process shutdown
*** 2012-11-08 07:44:39.406

猜想数据库恢复需要sequence为79的归档日志,但是该归档日志对应的为文件为redo02.log,而该redo02的seq为85所以使得MRP进程异常

使用rman尝试恢复

RMAN> recover database ;
Starting recover at 08-NOV-12
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=8 devtype=DISK
starting media recovery
unable to find archive log
archive log thread=1 sequence=79
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/08/2012 08:12:48
RMAN-06054: media recovery requesting unknown log: thread 1 scn 12286829427051

测试证明rman也无法正常的恢复该异常问题

验证猜想

SQL> select name from v$archived_log where SEQUENCE#=79;
NAME
-------------------------------------------------------------------
/u01/oracle/oradata/lunar/arch/1_79.dbf
/u01/oracle/oradata/xifenfei/redo02.log
/u01/oracle/oradata/lunar/arch/1_79.dbf
SQL> select dest_id,name from v$archived_log where SEQUENCE#=79;
   DEST_ID  NAME
----------  ---------------------------------------------------
         2  /u01/oracle/oradata/lunar/arch/1_79.dbf
         1  /u01/oracle/oradata/xifenfei/redo02.log
         1  /u01/oracle/oradata/lunar/arch/1_79.dbf
SQL> select sequence#,group# from v$log;
 SEQUENCE#     GROUP#
---------- ----------
        86          1
        85          2
        87          3
SQL> select member from v$logfile where group#=2;
MEMBER
---------------------------------------------------------------
/u01/oracle/oradata/xifenfei/redo02.log

通过查询上面相关视图,证实了猜想是因为redo log被注册进入了v$archived_log导致该故障,解决该问题的思路是把redo log file从备库控制文件的v$archived_log视图中拿掉.具体方法是:
1.如果主库正常,那直接生成standby controlfile来实现
2.如果主库也是相同情况,那么先重建主库控制文件,然后重建standby controlfile来实现(该方法需要维护窗口)

解决问题思路

--查询主库,确定主库正常
SQL> select name from v$archived_log where SEQUENCE#=79;
NAME
------------------------------------------------------------
/u01/oracle/oradata/xifenfei/archive/1_79.dbf
lunar
--重新创建standby controlfile
###############################################################
注:如果主库和备库的数据文件路径不完全一致,
建议通过设置db_file_name_convert来直接实现备库数据文件路径的转换,
而不建议通过alter database rename file来实现重命名
###############################################################
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS
   2 '/u01/oracle/oradata/lunar/control01.ctl' reuse;
Database altered.
--重新启动备库
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
SQL> alter database mount standby database;
Database altered.
--开启日志应用
SQL> alter database recover managed standby database disconnect from session;
Database altered.
--alert日志
Thu Nov  8 08:28:16 2012
Completed: alter database recover managed standby database di
Thu Nov  8 08:28:20 2012
Restarting dead background process QMN0
QMN0 started with pid=13
Thu Nov  8 08:29:45 2012
Fetching gap sequence for thread 1, gap sequence 79-87
Trying FAL server: xifenfei
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_79.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_80.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_81.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_82.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_83.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_84.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_85.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_86.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_87.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_88.dbf
Media Recovery Waiting for thread 1 seq# 89

ORA-00600[3689]引起MRP进程异常

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-00600[3689]引起MRP进程异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天检查数据库发现一数据库因为添加数据文件导致ORA-00600[3689]错误,进入引起MRP进程异常,从而使得数据库可以正常接收归档日志,但是不能在备库应用归档日志
数据库版本

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Solaris: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 – Production

检查日志

Fri Oct 19 12:00:22 2012
RFS[1]: No standby redo logfiles created
RFS[1]: Archived Log: '/baceldba/archive/1_30374_730819916.dbf'
Fri Oct 19 12:00:28 2012
Media Recovery Log /baceldba/archive/1_30374_730819916.dbf
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Fri Oct 19 12:00:50 2012
Recovery created file /data/oracle/oradata/vodapp/NETTVCLPS02.dbf
Successfully added datafile 32 to media recovery
Datafile #32: '/data/oracle/oradata/vodapp/NETTVCLPS02.dbf '
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Fri Oct 19 12:01:15 2012
Recovery created file /data/oracle/oradata/vodapp/UCICMS01_DATA04.dbf
Successfully added datafile 33 to media recovery
Datafile #33: '/data/oracle/oradata/vodapp/UCICMS01_DATA04.dbf'
Fri Oct 19 12:01:16 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Errors with log /baceldba/archive/1_30374_730819916.dbf
MRP0: Background Media Recovery terminated with error 600  <--mrp进程异常
Fri Oct 19 12:01:16 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Fri Oct 19 12:01:18 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Fri Oct 19 12:01:18 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []

通过观察日志发现,数据库的mrp0进程因为ora-600的错误导致异常终止,从而使得dg备库无法正常使用归档日志

分析trace文件得出

*** 2012-10-19 12:01:16.327
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+744         CALL     ksedst()             000000840 ?
                                                   FFFFFFFF7FFFA08C ?
                                                   000000000 ?
                                                   FFFFFFFF7FFF6B80 ?
                                                   FFFFFFFF7FFF58E8 ?
                                                   FFFFFFFF7FFF62E8 ?
kgeriv()+220         PTR_CALL 0000000000000000     000106000 ? 106323304 ?
                                                   106323000 ? 000106323 ?
                                                   000106000 ? 106323304 ?
kgesiv()+112         CALL     kgeriv()             10631DCD0 ? 000000000 ?
                                                   000000E69 ? 000000001 ?
                                                   FFFFFFFF7FFFA5E8 ?
                                                   000001430 ?
ksesic1()+96         CALL     kgesiv()             10631DCD0 ? 10646AD00 ?
                                                   000000E69 ? 000000001 ?
                                                   FFFFFFFF7FFFA5E8 ?
                                                   000001420 ?
kcvadc1_10gr2()+265  CALL     ksesic1()            000000E69 ? 00010631D ?
2                                                  1063232F8 ? 000106000 ?
                                                   106323000 ? 000106323 ?
kcoapl()+5664        PTR_CALL 0000000000000000     FFFFFFFF7A5570F8 ?
                                                   FFFFFFFF7FFFB320 ?
                                                   FFFFFFFF76667E18 ?
                                                   000000000 ? 000000021 ?
                                                   000000020 ?
kcramr()+5352        CALL     kcoapl()             FFFFFFFF76667E00 ?
                                                   000000003 ? 1055136B0 ?
                                                   105513340 ? 00000001B ?
                                                   000000017 ?
krddmr()+2688        CALL     kcramr()             000000000 ?
                                                   FFFFFFFF7A55FF78 ?
                                                   000002000 ? 000106000 ?
                                                   0000001F6 ? 464897868 ?
krsmdp()+2332        CALL     krddmr()             105502000 ?
                                                   FFFFFFFF7A5570F8 ?
                                                   000000001 ? 10646C760 ?
                                                   464897868 ? 000000000 ?
ksbrdp()+896         PTR_CALL 0000000000000000     000380000 ? 000380000 ?
                                                   105517000 ? 000106328 ?
                                                   000106000 ? 00000FFFF ?
opirip()+824         CALL     ksbrdp()             000106320 ? 380007734 ?
                                                   000380000 ? 00038000E ?
                                                   000106000 ? 1005DCF80 ?
opidrv()+1200        CALL     opirip()             10632A000 ? 000106000 ?
                                                   000106332 ? 380007000 ?
                                                   00010632A ? 1063D73A0 ?
sou2o()+80           CALL     opidrv()             10632D460 ? 000000001 ?
                                                   000000032 ? 000000000 ?
                                                   000000032 ? 000106000 ?
opimai_real()+268    CALL     sou2o()              FFFFFFFF7FFFF968 ?
                                                   000000032 ? 000000004 ?
                                                   FFFFFFFF7FFFF990 ?
                                                   105C1F000 ? 000105C1F ?
main()+152           CALL     opimai_real()        000000003 ?
                                                   FFFFFFFF7FFFFA68 ?
                                                   000000000 ? 000000000 ?
                                                   0023F9764 ? 000014400 ?
_start()+380         CALL     main()               000000001 ?
                                                   FFFFFFFF7FFFFB78 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFFA70 ?
                                                   FFFFFFFF7FFFFB80 ?
                                                   FFFFFFFF7CF00200 ?
--------------------- Binary Stack Dump ---------------------

查询mos发现
Bug 5230162 : ORA-600[3689][42] ON PHYSICAL STANDBY PREVENT MEDIA RECOVERY描述相符

处理建议
1.10.2.0.1数据库不稳定,bug较多,建议升级数据库到10.2.0.4或者10.2.0.5
2.当前standby_file_management为true,建议修改为false,每次手工两边增加数据文件
3.主库添加数据文件后,观察备库的mrp进程是否正常,如果异常
1)尝试启动mrp进程,如果因为新增加的数据文件导致异常,那先使用rman或者其他方法拷贝主库新建立数据文件到备库,然后启动mrp进程
2)如果因为某种原因导致归档日志出现gap,可以从备份中找出归档日志,如果备份中没有,那可以使用基于scn的备份方式来修复该故障。
3)如果scn相差较大,建议直接重建备库

DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:DATAGUARD中MAXIMUM AVAILABILITY+LGWR SYNC导致主库不能启动

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

qq群里面的一朋友的的DG因为备库已经下架,主库重启的时候不能正常启动,帮忙处理结果如下
版本相关信息

Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
ORACLE_HOME = /export/home/oracle/product/9.2.0
System name:    SunOS
Node name:      eTermSrv2
Release:        5.10
Version:        Generic
Machine:        sun4u
Instance name: abcd

数据库不能启动日志

Fri Aug 10 12:37:56 2012
ALTER DATABASE OPEN
Fri Aug 10 12:37:56 2012
LGWR: Primary database is in CLUSTER CONSISTENT mode
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LNS0 started with pid=16
Fri Aug 10 12:37:59 2012
ORA-1013 signalled during: ALTER DATABASE OPEN...
Fri Aug 10 12:41:45 2012
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
LGWR: Continuing...
Fri Aug 10 12:41:45 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Creating archive destination LOG_ARCHIVE_DEST_2: 'ora9i'
LGWR: Error 12535 creating archivelog file 'ora9i'
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-12535: TNS:operation timed out
LGWR: Completed archiving  log 1 thread 1 sequence 6808
Thread 1 advanced to log sequence 6808
Fri Aug 10 12:45:32 2012
Errors in file /export/home/oracle/admin/abcd/bdump/abcd_lgwr_11504.trc:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'
LGWR: terminating instance due to error 1157
Instance terminated by LGWR, pid = 11504

通过这里大概看出数据库原dg配置是MAXIMUM AVAILABILITY,因为备机下架,导致ora9i的tns不能访问,从而出现一些列错误,其中使得lgwr异常,因为oracle的某种内部机制,导致dbwr不能访问数据文件(这里体现出来是system01.dbf不能访问,但实际上应该是所有所有数据文件均不能访问,因为system01.dbf位于第一,所以报出该错误.)

*** SESSION ID:(3.1) 2012-08-10 12:37:56.847
Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
*** 2012-08-10 12:41:45.614
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:41:45.615
LGWR: Error 12535 verifying archivelog destination LOG_ARCHIVE_DEST_2
Continuing...
ORA-12535: TNS:operation timed out
*** 2012-08-10 12:45:32.514
Error 12535 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
Error 12535 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'ora9i'
*** 2012-08-10 12:45:32.514
LGWR: Error 12535 creating archivelog file 'ora9i'
*** 2012-08-10 12:45:32.514
kcrrfail: dest:2 err:12535 force:0
ORA-12535: TNS:operation timed out
error 1157 detected in background process
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/export/home/oracle/oradata/abcd/system01.dbf'

通过trace文件,更加清楚的说明,可能是因为lgwr异常导致dbwr访问数据文件出现问题.

问题分析/解决汇总

SQL> show parameter log_archive_dest_state_1;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_1             string      enable
SQL> show parameter log_archive_dest_state_2;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      enable
SQL> show parameter log_archive_dest_1;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1                   string      location=/export/home/oracle/o
                                                 radata/abcd/archive
SQL> show parameter log_archive_dest_2;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=ora9i LGWR SYNC AFFIRM
SQL>select  protection_mode,database_role from v$database;
PROTECTION_MODE      DATABASE_ROLE
-------------------- ----------------
MAXIMUM AVAILABILITY PRIMARY
SQL>   show parameter succ;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_min_succeed_dest         integer     1

通过上面的sql查询结果和alert日志与trace文件结合分析,大体结论是:
数据库的dg的保护模式为:MAXIMUM AVAILABILITY
数据库的日志传输方式是:LGWR SYNC AFFIRM
现在因为备机下架,主库LGWR不能通过tns访问备库,从而导致数据库的dbwr进程访问异常,是的数据库不能正常被open
这里的故障有一些巧合:MAXIMUM AVAILABILITY+LGWR SYNC AFFIRM+9.2.0.1+SunOS
我不清楚是不是ORACLE的bug导致,但是这个问题解决起来比较简单,只需要修改log_archive_dest_state_2=defer,使得log_archive_dest_2参数不生效,让lgwr不再访问备机

深入理解LOG_ARCHIVE_DEST_n与STANDBY_ARCHIVE_DEST

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:深入理解LOG_ARCHIVE_DEST_n与STANDBY_ARCHIVE_DEST

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一、案例引入
朋友的dg配置如下,问我为什么归档日志都放在了use_db_recovery_file_dest

主库
log_archive_dest_1='location=/U01/app/oracle/oradata/bfodb/arch,valid_for=(ONLINE_LOGFILE,ALL_ROLES)'
log_archive_dest_2='service=tnsname,ARCH SYNC NOAFFIRM delay=0 OPTIONAL max_failure=0 max_connections=1   reopen=300  register net_timeout=180  valid_for=(online_logfile,primary_role)'
standby_archive_dest='location=use_db_recovery_file_dest'
备库
log_archive_dest_1='location=/U01/app/oracle/oradata/bfodb/arch,valid_for=(ONLINE_LOGFILE,ALL_ROLES)'
log_archive_dest_2='location=location=use_db_recovery_file_dest,valid_for=(STANDBY_LOGFILE,STANDBY_ROLE)'
standby_archive_dest='location=use_db_recovery_file_dest'

我很惯性的回答,直接传输过来的日志放到LOG_ARCHIVE_DEST_n下面,fal_*过来的归档放置在standby_archive_dest中,也没有过多的思考为什么,因为我们的库都是这样的规则,我已经认为是一种准则了。这个规则也整合符合了他们的要求(都在use_db_recovery_file_dest)中,没有仔细的去看他们的配置,当朋友说到valid_for的属性的时候,我感觉有点不对头了。
1)他们的log_archive_dest_1配置的是ONLINE_LOGFILE,这个是备库,所以肯定不会放到这里
2)难道是通过log_archive_dest_2使得直接传输过来的日志放置到了use_db_recovery_file_dest中?让朋友查询v$standby_log,发现他们的库没有使用standby redo logfile,也就是说,log_archive_dest_2不可能用来传输日志了,现在剩下来可以传输日志的,只有standby_archive_dest了。
3)问题解决了,都传输到use_db_recovery_file_dest,因为LOG_ARCHIVE_DEST_n都不能用(只是从排除法证明)

二、问题深入分析
standby_archive_dest和log_archive_dest_*到底有什么关系,在什么情况下传输到对应的目录中?

如果备库利用standby redo log在备库端自动归档,那么归档日志将会被放置到LOG_ARCHIVE_DEST_n
如果备库是利用主库的arch进程传输过来的归档,那么将会被放置到STANDBY_ARCHIVE_DEST
补充说明:
1)主库的LOG_ARCHIVE_DEST_n='service'默认的arch传输方式,primary会远程将archived log传输到standby_archive_dest下
2)fal_*是通过arch传输过来的,所以使用的是STANDBY_ARCHIVE_DEST目录,如果不存在这个目录,就使用LOG_ARCHIVE_DEST_n
3)If both parameters are specified, the STANDBY_ARCHIVE_DEST initialization parameter
  overrides the directory location specified with the LOG_ARCHIVE_DEST_n parameter.
如果STANDBY_ARCHIVE_DEST和LOG_ARCHIVE_DEST_n两个参数指定,STANDBY_ARCHIVE_DEST初始化覆盖LOG_ARCHIVE_DEST_n目录指定位置参数。
4)If none of the initialization parameters have been specified, then archived redo log files are stored in the default location for the STANDBY_ARCHIVE_DEST initialization parameter.
如果STANDBY_ARCHIVE_DEST和LOG_ARCHIVE_DEST_n都没有配置,将会把归档日志放到STANDBY_ARCHIVE_DEST的默认目录
5)如果STANDBY_ARCHIVE_DEST未配置,而配置了LOG_ARCHIVE_DEST_n,那么产生的归档将放置到LOG_ARCHIVE_DEST_n
6)建议:STANDBY_ARCHIVE_DEST和LOG_ARCHIVE_DEST_n=‘location’的配置相同
7)在11g,已经不建议使用STANDBY_ARCHIVE_DEST,也就是说建议配置dg的时候尽量使用standby redo logfile

Data Guard出现gap sequence修复

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Data Guard出现gap sequence修复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

一、出现gap sequence现象
备库

Fetching gap sequence in thread 1, gap sequence 710-716
Tue May 31 15:02:38 2011
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 710-716
 DBID 3240478808 branch 746916894
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------

主库

Tue May 31 13:50:47 2011
FAL[server]: Fail to queue the whole FAL gap
 GAP - thread 1 sequence 710-716
 DBID 3240478808 branch 746916894

二、修复操作
1、查询备库的scn

SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    1154337
--在出现意外datafile header scn不一致的时候,需要根据提示归档日志,找出最小scn

2、确定主库是否添加数据文件

SQL> select FILE#,name from v$datafile where CREATION_CHANGE#> =1154337;
no rows selected

确定主库在这个scn之后是否有添加数据文件,如果添加文件,需要手工在备库添加

3、备库停止日志应用

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

4、主库增量备份并传输到备库上
主库进行增量备份

RMAN> BACKUP INCREMENTAL FROM SCN  1154337 DATABASE
FORMAT '/home/oracle/xff_%U' tag 'XIFENFEI';
[oracle@localhost ~]$ scp xff* 192.168.1.30:/home/oracle/rman

说明:主库之前必须要做过rman的全备(没有全备的库,基于scn的增量备份也能够成功)

5、备库上进行恢复

RMAN> CATALOG START WITH '/home/oracle/rman';
RMAN> RECOVER DATABASE NOREDO;

说明:CATALOG START WITH是10g及其以后版本中才存在功能,没有该功能可以采用catalog或者复制主库的控制文件,rman备份放置和主库备份时相同目录实现。

6、主库上创建standby controlfile文件并传输到备库

RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY
FORMAT '/home/oracle/xff_ctl.bck';
[oracle@localhost ~]$ scp xff_ctl.bck 192.168.1.30:/home/oracle/rman

创建standby controlfile两步可以需要根据实际情况考虑,大多数情况下不需要

7、备库恢复控制文件

RMAN> shutdown;
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/rman/xff_ctl.bck';
RMAN> alter database mount;

8、清空备库日志组

SQL> ALTER DATABASE CLEAR LOGFILE GROUP 1;
注:如果采用了standby log模式,不需要清空,如果清空会出现
SQL> ALTER DATABASE CLEAR LOGFILE GROUP 1;
ALTER DATABASE CLEAR LOGFILE GROUP 1
*
ERROR at line 1:
ORA-19527: physical standby redo log must be renamed
ORA-00312: online log 1 thread 1: '/u01/oradata/xienfei/redo01.log'

说明:如果没有采用standby log模式,有几组需要清空几组

9、备库重设flashback

SQL> ALTER DATABASE FLASHBACK OFF;
SQL> ALTER DATABASE FLASHBACK ON;

10、备库重新接收并应用日志

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

三、修复成功标志
1、sql中操作
在主库中执行alter system switch logfile;
分别主备库中执行select max(sequence#) from v$archived_log;如果一致标示修复成功

2、通过alert文件
主库

PING[ARC0]: Error 3113 when pinging standby xff.
Tue May 31 14:11:51 2011
Thread 1 advanced to log sequence 719
  Current log# 3 seq# 719 mem# 0: /u01/oradata/xienfei/redo03.log
Tue May 31 14:20:05 2011
Thread 1 advanced to log sequence 720
  Current log# 1 seq# 720 mem# 0: /u01/oradata/xienfei/redo01.log
Tue May 31 14:20:16 2011
ARC0: Standby redo logfile selected for thread 1 sequence 719 for destination LOG_ARCHIVE_DEST_2

备库

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Tue May 31 15:30:37 2011
Attempt to start background Managed Standby Recovery process (xff)
MRP0 started with pid=18, OS id=14704
Tue May 31 15:30:37 2011
MRP0: Background Managed Standby Recovery process started (xff)
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 2 processes
Media Recovery Log /u01/archive/1_718_746916894.arc
Tue May 31 15:30:43 2011
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Tue May 31 15:30:52 2011
RFS[1]: Successfully opened standby log 4: '/u01/oradata/xienfei/s_redo1.log'
Media Recovery Log /u01/archive/1_719_746916894.arc
Media Recovery Waiting for thread 1 sequence 720

删除data guard归档日志

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:删除data guard归档日志

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

Oracle Data Guard 是通过归档文件来进行数据同步的。 主库的归档文件,我们可以在RMAN 备份的时候进行删除,但是备库的归档文件无法自动删除,需要写脚本来定时删除。

#!/bin/bash
export ORACLE_HOME=/opt/oracle/product/10.2.0/db_1
export ORACLE_SID=zjch
export ARCHIVE_DIR=/opt/oracle/oradata/zjch/archive
export LOG_FILE=/home/oracle/logs/del_archive.log
echo "开始删除归档日志:`date`……">>$LOG_FILE
if [ `whoami` != 'oracle' ]
then
echo "Error: You must be oracle to execute.">>$LOG_FILE
exit 99
fi
del_seq=`ls -tr $ARCHIVE_DIR/|grep -v stdarch | head -1|cut -f2 -d_`
$ORACLE_HOME/bin/sqlplus -silent "/ as sysdba" <<XFF>tmp.log
set pagesize 0 feedback off verify off heading off echo off
select max(sequence#) from v\$ARCHIVED_LOG where APPLIED='YES';
exit;
XFF
max_sn=`cat tmp.log`
rm tmp.log
max_sn=$(( $max_sn - 20 ))
#我这里是保留最近的20个归档文件,这个具体情况自己决定
while [[ ${del_seq} -lt ${max_sn} ]]
do
  echo "${ARCHIVE_DIR}/1_${del_seq}_681145105.dbf">>$LOG_FILE
  rm ${ARCHIVE_DIR}/1_${del_seq}_681145105.dbf
  #这里是我定义归档文件的格式,具体根据自己的归档文件格式来匹配,关键是匹配日志的sequence no。
  del_seq=$(( $del_seq + 1 ))
done
echo "删除归档日志结束:`date`……">>$LOG_FILE
#清楚controlfile中信息
$ORACLE_HOME/bin/rman target / <<XIFENFEI>>$LOG_FILE
crosscheck archivelog all;
delete expired archivelog all;
YES
exit;
XIFENFEI
echo "………………………………………………………………………………………………………………………">>$LOG_FILE

重点是在读取现在存在的归档日志文件中sequence最小值时,
使用了ls -tr $ARCHIVE_DIR/|grep -v stdarch | head -1|cut -f2 -d_

Oracle 9I Data Guard配置

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:Oracle 9I Data Guard配置

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

1、修改参数文件
primary:
*.standby_archive_dest=’/opt/ora9/oradata/txzldb/stdarch/’
*.log_archive_dest_2=’service=txzldb.Standby optional reopen=60′ –配置tns
*.utl_file_dir=’/opt/ora9′
*.remote_archive_enable=TRUE
*.standby_file_management=AUTO
*.fal_server=’txzldb.Standby’
*.fal_client=’txzldb.Primary’
*.log_archive_dest_state_2=defer
standby:
*.standby_archive_dest=’/opt/ora9/oradata/txzldb/stdarch/’
*.log_archive_dest_2=’service=txzldb.Primary optional reopen=60′ –配置tns
*.utl_file_dir=’/opt/ora9′
*.remote_archive_enable=TRUE
*.standby_file_management=AUTO
*.fal_server=’txzldb.Primary’
*.fal_client=’txzldb.Standby’
*.log_archive_dest_state_2=defer
2、备份主库
backup database format=’/home/oracle/full_backup_%U_%s.bak’;
sql “alter system archive log current”;
–backup filesperset 10 archivelog all format=’/home/oracle/archivlog_%U_%s.bak’;(可选)
alter database create standby controlfile as ‘/home/oracle/control01.ctl’;
3、拷贝密码文件控制文件
scp /opt/oracle/product/9.2.0/dbs/orapwora9i
192.168.3.2:/opt/oracle/product/9.2.0/dbs/
scp control01.ctl
192.168.3.2:/opt/oracle/oradata/ora9i/control01.ctl
4、恢复备机
拷贝备份至备机与主机相同的目录
sqlplus “/ as sysdba”
  startup nomount pfile=”
  alter database mount standby database;
  exit
  rman target / nocatalog
  restore database
–restore archivelog all;(可选)
–recover automatic standby database;(可选)
5、启动恢复状态
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
create spfile from pfile=”;
alter system set log_archive_dest_state_2=enable scope=both;
6、添加临时表空间
recover managed standby database cancel;
alter database open read only;
alter tablespace temp add tempfile ‘/opt/oracle/oradata/ora9i/temp01.dbf’
SIZE 412M AUTOEXTEND ON NEXT 640K MAXSIZE UNLIMITED;
shutdown immediate;
startup nomount pfile=”
alter database mount standby database;
recover managed standby database disconnect from session;

使用rman备份做Data Guard

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:使用rman备份做Data Guard

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前两天公司的一套oracle dg因为维护人员的疏忽到值出现gap,无法恢复,只能重做,因为该库每天都有rman的增量备份,所以选择最近的一次全部进行重做dg
1、主库
ALTER DATABASE CREATE STANDBY CONTROLFILE AS ‘/tmp/control01.ctl’;
alter system archive log current;
2、修改pfile文件(因为我是重做,所以不需要做)
11g data guard(PHYSICAL STANDBY)配置
3、拷贝控制文件/rman文件到备库的相同位置
备份全备的内容(/opt/ORBS/data /tmp/control01.ctl)
4、复制数据库做dg
启动到nomount状态
rman target sys/srtsysdb@zjch_Action auxiliary /
duplicate target database for standby dorecover nofilenamecheck;
出现以下错误
RMAN-06025: no backup of log thread 1 seq 675 lowscn 1096519 found to restore
RMAN-06025: no backup of log thread 1 seq 674 lowscn 1095666 found to restore
RMAN-06025: no backup of log thread 1 seq 673 lowscn 1095663 found to restore
RMAN-06025: no backup of log thread 1 seq 672 lowscn 1095656 found to restore
RMAN-06025: no backup of log thread 1 seq 671 lowscn 1095081 found to restore
RMAN-06025: no backup of log thread 1 seq 670 lowscn 1092545 found to restore
RMAN-06025: no backup of log thread 1 seq 669 lowscn 1090351 found to restore
RMAN-06025: no backup of log thread 1 seq 668 lowscn 1088837 found to restore
RMAN-06025: no backup of log thread 1 seq 667 lowscn 1088834 found to restore
RMAN-06025: no backup of log thread 1 seq 666 lowscn 1088721 found to restore
原因:因为控制文件和备份文件中有时间间隔,所以rman要恢复,但是归档日志没有传输过来,所以不能恢复。
–到主库的归档目录查看这些日志是否存在
ls -l(如果存在,下面的操作恢复日志的操作可以忽略)
–查看这些日志是否存在
list backup of archivelog from sequence 666 until sequence 675;
–如果存在rman备份中,使用rman进行恢复
restore archivelog from sequence 666 until sequence 675;
如果存在相关联机日志就不需要添加
–添加redo log
alter database add logfile group 1 (‘/opt/oracle/oradata/zjch/redo01.log’) size 50m;
alter database add logfile group 2 (‘/opt/oracle/oradata/zjch/redo02.log’) size 50m;
alter database add logfile group 3 (‘/opt/oracle/oradata/zjch/redo03.log’) size 50m;
–添加sandby log
ALTER DATABASE ADD STANDBY LOGFILE (‘/opt/oracle/oradata/zjch/std_redo04.log’) SIZE 50M reuse;
ALTER DATABASE ADD STANDBY LOGFILE (‘/opt/oracle/oradata/zjch/std_redo05.log’) SIZE 50M reuse;
ALTER DATABASE ADD STANDBY LOGFILE (‘/opt/oracle/oradata/zjch/std_redo06.log’) SIZE 50M reuse;
ALTER DATABASE ADD STANDBY LOGFILE (‘/opt/oracle/oradata/zjch/std_redo07.log’) SIZE 50M reuse;
5、开启日志应用
–开启实时应用
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
–开启应用(根据配置和需求)
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
到这一步会如果主机端有所需的归档日志,会自动传输过来,并进行恢复,不用人工干预。
–查看alter文件,查询日志传输是否正常
–查看日志是否传输完成
select thread#, low_sequence#, high_sequence# from v$archive_gap;
–主库切换日志
alter system switch logfile;
–查询最大sequence
select max(sequence#) from v$archived_log;
–备库查看alert文件
tail -100 alert.log
Media Recovery Log /opt/oracle/oradata/zjch/archive/stdarch/1_46544_681145105.dbf
Media Recovery Log /opt/oracle/oradata/zjch/archive/stdarch/1_46545_681145105.dbf 有类似内容
–查询最大sequence
select max(sequence#) from v$archived_log;
–主库建立一个测试表
create table SCOTT.t_test(id number,name varchar2(10));
insert into SCOTT.t_test values(1,’aaaa’);
commit;
alter system switch logfile;
–查询最大sequence
select max(sequence#) from v$archived_log;
–备库检查
–查询最大sequence
select max(sequence#) from v$archived_log;
–取消日志应用
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE cancel;
–只读模式打开数据库
alter database open read only;
–查询刚刚建的表和插入数据是否通报不成功
select * from SCOTT.t_test;
–重启备库,设置为日志应用模式
shutdown immediate;
startup mount;
–开启实时应用
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
–开启应用(根据配置和需求)
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
–主库删除测试表
drop table SCOTT.t_test;