shell监控dataguard备库是否正常应用日志

一直在思索怎么去监控dg比较方便,又能够做到比较适用.想到了几种方法:
1.使用主备库两边的alert日志,但是这样的方法需要配置ssh,用来一个节点获取另外一个节点的alert日志
2.通过查询v$archived_log或者其他相关视图,然后主备库进行比较,但是这个需要访问另外一个库,需要另外库的登录信息
3.通过查询备库的v$archived_log视图,粗略评估dg是否工作正常.
这里我选择了3,dg的监控大部分时候是为了让人及时的发现日志应用异常,然后人工干预处理,从而减少修改gap或者重建dg的概率.而这个额监控可以在很大程度上发现dg应用归档日志异常,从而确定dg是否工作正常,如果发现工作异常,及时处理,可以减少很多工作量,甚至拯救你的数据.

#!/bin/bash
source ~/.bash_profile
#check time(M)
export CHECK_M=120
export RESULT_FILE=/tmp/dg_switch_check.log
$ORACLE_HOME/bin/sqlplus -silent "/ as sysdba" <<XFF>/tmp/check_dg.log
set pagesize 0 feedback off verify off heading off echo off
 select ceil((sysdate-next_time)*24*60) "M"
 from v\$archived_log
 where applied='YES' AND SEQUENCE#=(SELECT MAX(SEQUENCE#)
 FROM V\$ARCHIVED_LOG WHERE applied='YES');
exit;
XFF
GET_M=`cat /tmp/check_dg.log`
rm /tmp/check_dg.log
if [ ${CHECK_M} -lt ${GET_M} ];
then
    echo "check dataguard time:`date`">$RESULT_FILE
    echo "The last time application archivelog happened in $GET_M minutes ago">>$RESULT_FILE
else
 echo ''>$RESULT_FILE
fi

针对这样的脚本,根据你的dg归档切换的频率,设置监控dg的最近一次日志应用与当前时间差,然后判断dg是否工作正常.根据监控程序的特点,可以通过判断结果集文件,然后邮件/短信或者其他方式处理.

因v$archived_log视图记录异常导致dg MRP进程异常

版本信息

操作系统Linux 4.8 x86
数据库版本ORACLE 9.2.0.4

alert日志报错
MRP进程出现异常报ORA-00310/ORA-00334错误

…………
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
MRP0: Background Media Recovery terminated with error 310
Thu Nov  8 07:44:39 2012
Errors in file /u01/oracle/admin/lunar/bdump/lunar_mrp0_25625.trc:
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
Recovery interrupted.
MRP0: Background Media Recovery process shutdown

trace文件

*** SESSION ID:(17.13) 2012-11-08 07:24:12.986
Background Managed Standby Recovery process started
*** 2012-11-08 07:24:18.023
Managed Recovery: Active posted.
*** 2012-11-08 07:41:03.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_64.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_65.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_66.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_67.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_68.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_69.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_70.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_71.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_72.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_73.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_74.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
*** 2012-11-08 07:41:39.083
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
*** 2012-11-08 07:44:39.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
Background Media Recovery terminated with error 310
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 21990Kb in 1221.38s => 0.02 Mb/sec
Longest record: 1Kb, moves: 0/92129 (0%)
Change moves: 34869/213735 (16%), moved: 2Mb
----------------------------------------------
*** 2012-11-08 07:44:39.404
Managed Recovery: Not Active posted.
Background Media Recovery process shutdown
*** 2012-11-08 07:44:39.406

猜想数据库恢复需要sequence为79的归档日志,但是该归档日志对应的为文件为redo02.log,而该redo02的seq为85所以使得MRP进程异常

使用rman尝试恢复

RMAN> recover database ;
Starting recover at 08-NOV-12
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=8 devtype=DISK
starting media recovery
unable to find archive log
archive log thread=1 sequence=79
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/08/2012 08:12:48
RMAN-06054: media recovery requesting unknown log: thread 1 scn 12286829427051

测试证明rman也无法正常的恢复该异常问题

验证猜想

SQL> select name from v$archived_log where SEQUENCE#=79;
NAME
-------------------------------------------------------------------
/u01/oracle/oradata/lunar/arch/1_79.dbf
/u01/oracle/oradata/xifenfei/redo02.log
/u01/oracle/oradata/lunar/arch/1_79.dbf
SQL> select dest_id,name from v$archived_log where SEQUENCE#=79;
   DEST_ID  NAME
----------  ---------------------------------------------------
         2  /u01/oracle/oradata/lunar/arch/1_79.dbf
         1  /u01/oracle/oradata/xifenfei/redo02.log
         1  /u01/oracle/oradata/lunar/arch/1_79.dbf
SQL> select sequence#,group# from v$log;
 SEQUENCE#     GROUP#
---------- ----------
        86          1
        85          2
        87          3
SQL> select member from v$logfile where group#=2;
MEMBER
---------------------------------------------------------------
/u01/oracle/oradata/xifenfei/redo02.log

通过查询上面相关视图,证实了猜想是因为redo log被注册进入了v$archived_log导致该故障,解决该问题的思路是把redo log file从备库控制文件的v$archived_log视图中拿掉.具体方法是:
1.如果主库正常,那直接生成standby controlfile来实现
2.如果主库也是相同情况,那么先重建主库控制文件,然后重建standby controlfile来实现(该方法需要维护窗口)

解决问题思路

--查询主库,确定主库正常
SQL> select name from v$archived_log where SEQUENCE#=79;
NAME
------------------------------------------------------------
/u01/oracle/oradata/xifenfei/archive/1_79.dbf
lunar
--重新创建standby controlfile
###############################################################
注:如果主库和备库的数据文件路径不完全一致,
建议通过设置db_file_name_convert来直接实现备库数据文件路径的转换,
而不建议通过alter database rename file来实现重命名
###############################################################
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS
   2 '/u01/oracle/oradata/lunar/control01.ctl' reuse;
Database altered.
--重新启动备库
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes
SQL> alter database mount standby database;
Database altered.
--开启日志应用
SQL> alter database recover managed standby database disconnect from session;
Database altered.
--alert日志
Thu Nov  8 08:28:16 2012
Completed: alter database recover managed standby database di
Thu Nov  8 08:28:20 2012
Restarting dead background process QMN0
QMN0 started with pid=13
Thu Nov  8 08:29:45 2012
Fetching gap sequence for thread 1, gap sequence 79-87
Trying FAL server: xifenfei
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_79.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_80.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_81.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_82.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_83.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_84.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_85.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_86.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_87.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_88.dbf
Media Recovery Waiting for thread 1 seq# 89

使用dbms_pumpdata执行expdp操作

使用dbms_pumpdata执行expdp操作

set serverout on
declare
  h1 number; -- Datapump handle
  dir_name varchar2(30); -- Directory Name
  job_status VARCHAR2(30);
begin
  dir_name := 'DATA_PUMP_DIR';
  h1 := dbms_datapump.open(
  operation =>'EXPORT', --是export还是impport
--导出表配置
  job_mode =>'TABLE',  --job_mode可以为SCHEMA/TABLE等
--导出用户配置
  job_mode =>'SCHEMA',
  remote_link => NULL, --是否使用dblink导出(就是NETWORK_LINK)
  job_name =>'TABLE_XFF' --job_name expdpjob的名称
  );
  dbms_datapump.add_file(handle =>h1,
                         filename => 'XIFENFEI.DMP',
                         directory => dir_name,
                         filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE,
                         reusefile => 1); -- value of 1 instructs to overwrite existing file
  dbms_datapump.add_file(handle =>h1,
                         filename => 'XIFENFEI.LOG',
                         directory => dir_name,
                         filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE,
                         reusefile => 1);
--导出表配置
  dbms_datapump.metadata_filter(handle =>h1,
                         name => 'TABLE_FILTER',
                         value => 'CHF.T_XIFENFEI');
--导出用户配置
  dbms_datapump.metadata_filter (handle => dp_handle,
                                 name => 'SCHEMA_EXPR',
                                 value => 'IN (''CHF'')');
-- Start the job.
  dbms_datapump.start_job(h1);
  dbms_datapump.wait_for_job (handle => dp_handle,
                              job_state => job_status);
  dbms_output.put_line ('DataPump Export - '||to_char(sysdate,'DD/MM/YYYY HH24:MI:SS')||' Status '||job_status);
  begin
     dbms_datapump.detach(handle => h1);
   end;
end;
/

awrload导入awr数据出现两种常见错误说明

ORA-06502/ORA-06512错误

SQL> @?/rdbms/admin/awrload.sql
~~~~~~~~~~
AWR LOAD
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will load the AWR data from a dump file. The   ~
~  script will prompt users for the following information:    ~
~     (1) name of directory object                            ~
~     (2) name of dump file                                   ~
~     (3) staging schema name to load AWR data into           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~
Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_PUMP_DIR                  /u01/oracle/admin/ora11g/dpdump/
ORACLE_OCM_CONFIG_DIR          /u01/oracle/oracle/product/11.2.0/db_1/ccr/state
XMLDIR                         /u01/oracle/oracle/product/11.2.0/db_1/rdbms/xml
Choose a Directory Name from the list above (case-sensitive).
Enter value for directory_name: DATA_PUMP_DIR
Using the dump directory: DATA_PUMP_DIR
Specify the Name of the Dump File to Load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please specify the prefix of the dump file (.dmp) to load:
Enter value for file_name: awrdat_70441_70885-vpos-primary
Loading from the file name: awrdat_70441_70885-vpos-primary.dmp
begin
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at line 2

因为导出来的awr数据库的编码是AL32UTF8,而现在的编码是ZHS16GBK,所以解决方法是设置NLS_LANG为合适编码

[oracle@xifenfei ~]$ env|grep NLS_LANG
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
[oracle@xifenfei ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

ORA-31640

SQL> @?/rdbms/admin/awrload.sql
~~~~~~~~~~
AWR LOAD
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will load the AWR data from a dump file. The   ~
~  script will prompt users for the following information:    ~
~     (1) name of directory object                            ~
~     (2) name of dump file                                   ~
~     (3) staging schema name to load AWR data into           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~
Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_PUMP_DIR                  /u01/oracle/admin/ora11g/dpdump/
ORACLE_OCM_CONFIG_DIR          /u01/oracle/oracle/product/11.2.0/db_1/ccr/state
XMLDIR                         /u01/oracle/oracle/product/11.2.0/db_1/rdbms/xml
Choose a Directory Name from the list above (case-sensitive).
Enter value for directory_name: DATA_PUMP_DIR
Using the dump directory: DATA_PUMP_DIR
Specify the Name of the Dump File to Load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please specify the prefix of the dump file (.dmp) to load:
Enter value for file_name: awrdat_70441_70885-vpos-primary  <--注意输入
Loading from the file name: awrdat_70441_70885-vpos-primary.dmp
…………
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  Loading the AWR data from the following
|  directory/file:
|   /u01/oracle/admin/ora11g/dpdump/
|   awrdat_70441_70885-vpos-primar.dmp    <--提示的文件名
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|  *** AWR Load Started ...
|
|  This operation will take a few moments. The
|  progress of the AWR load operation can be
|  monitored in the following directory/file:
|   /u01/oracle/admin/ora11g/dpdump/
|   awrdat_70441_70885-vpos-primar.log
|
DBMS_DATAPUMP.ADD_FILE(dump file)
ORA-39001: invalid argument value
Exception encountered in AWR_LOAD
begin
*
ERROR at line 1:
ORA-20115: datapump import encountered error:
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31640: unable to open dump file
"/u01/oracle/admin/ora11g/dpdump/awrdat_70441_70885-vpos-primar.dmp" for read
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 1717
ORA-06512: at line 3
begin
*
ERROR at line 1:
ORA-20106: AWR tables do not exist for the 'AWR_STAGE' user
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 2920
ORA-00942: table or view does not exist
ORA-06512: at line 3
... Dropping AWR_STAGE user
End of AWR Load

这里我们可以看到,dmp文件名为31个字符,而在提示文件名的时候是30个字符,从而出现了dmp文件不存在而导致的相关错误,解决方案重命名dmp文件,名称不超过30个字符

windows中使用dd程序

很多习惯了Linux和Uinx的人都困惑windows中没有dd这样的好命令,经过不懈的努力终于发现了win dd
帮助命令

E:\>dd --help
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
dd [bs=SIZE] [count=BLOCKS] [if=FILE] [of=FILE] [seek=BLOCKS] [skip=BLOCKS]
[--size] [--list] [--progress]
SIZE and BLOCKS may have one of the following suffix:
 k = 1024
 M = 1048576
 G = 1073741824
default block size (bs) is 512 bytes
skip specifies the starting offset of the input file (if)
seek specifies the starting offset of the output file (of)

磁盘列表

E:\>dd --list
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
Win32 Available Volume Information
\\.\Volume{7cf2f59c-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume1
  fixed media
  Mounted on \\.\c:
\\.\Volume{9ba58399-a85a-11e1-b19f-005056c00008}\
  link to \\?\Device\HarddiskVolume5
  removeable media
  Mounted on \\.\h:
\\.\Volume{7cf2f59d-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume2
  fixed media
  Mounted on \\.\d:
\\.\Volume{7cf2f59e-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume3
  fixed media
  Mounted on \\.\e:
\\.\Volume{7cf2f59f-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume4
  fixed media
  Mounted on \\.\f:
\\.\Volume{d0f53c6a-6272-11e1-8729-806e6f6e6963}\
  link to \\?\Device\CdRom0
  CD-ROM
  Mounted on \\.\g:
NT Block Device Objects
\\?\Device\CdRom0
  size is 2147483647 bytes
Virtual input devices
 /dev/zero   (null data)
 /dev/random (pseudo-random data)
 -           (standard input)
Virtual output devices
 -           (standard output)
 /dev/null   (discard the data)

拷贝磁盘头(asm修复常见)

E:\>dd if=\\.\Volume{9ba58399-a85a-11e1-b19f-005056c00008}
of=f:\usb.img bs=1M count=10
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
10+0 records in
10+0 records out
E:\>dir f:\usb.img
 驱动器 F 中的卷没有标签。
 卷的序列号是 000F-FFCB
 f:\ 的目录
2012/11/13  23:39        10,485,760 usb.img
               1 个文件     10,485,760 字节
               0 个目录 30,501,912,576 可用字节

拷贝文件

E:\>dd if=f:\usb.img   of=f:\usb_new.img bs=1M count=5
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
5+0 records in
5+0 records out
E:\>dir f:\usb*
 驱动器 F 中的卷没有标签。
 卷的序列号是 000F-FFCB
 f:\ 的目录
2012/11/13  23:39        10,485,760 usb.img
2012/11/13  23:46         5,242,880 usb_new.img
               2 个文件     15,728,640 字节
               0 个目录 30,496,669,696 可用字节

ORA-00600[qmxtriCheckAndRewriteQb0]

数据库报ORA-00600[qmxtriCheckAndRewriteQb0]

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/10.2.0
System name:	AIX
Node name:	abc
Release:	3
Version:	5
Machine:	00C58A644C00
Instance name: XFF2
Redo thread mounted by this instance: 2
Oracle process number: 434
Unix process pid: 492340, image: oracle@abc
*** ACTION NAME:() 2012-11-12 08:46:47.132
*** MODULE NAME:() 2012-11-12 08:46:47.132
*** SERVICE NAME:(ORCL) 2012-11-12 08:46:47.132
*** CLIENT ID:() 2012-11-12 08:46:47.132
*** SESSION ID:(870.58602) 2012-11-12 08:46:47.132
*** 2012-11-12 08:46:47.132
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [qmxtriCheckAndRewriteQb0], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT EXTRACTVALUE(配置,'//SYSTEM[@XTH="'||:B1 ||'"]/FILE') ,
WHERE EXTRACTVALUE(配置,'//SYSTEM[@XTH="'||:B1 ||'"]/BM')=:B2  AND ROWNUM<2
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
70000021d535f70        25  procedure ZLTOOLS.ZL_MBRUNLOG_INSERT
7000002b6819368         1  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 000000000 ?
ksedmp+0290          bl       ksedst               104A2C690 ?
ksfdmp+0018          bl       03F26C3C
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              7000002F735A838 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0FFFFBFFF ?
IPRA.$qmxtriCheckAn  bl       03F25970
dRewriteQb_rec+0194
IPRA.$qmxtriCheckAn  bl       IPRA.$qmxtriCheckAn  1000881EC ? 000000000 ?
dRewriteQb_rec+006c           dRewriteQb_rec       000000000 ?
IPRA.$qmxtriCheckAn  bl       IPRA.$qmxtriCheckAn  FFFFFFFFFFF07E0 ? 000000033 ?
dRewriteQb_rec+006c           dRewriteQb_rec       1056037F8 ?
qmxtriCheckAndRewri  bcl      dmqlKMlod+00c0       000000000 ? 110421CB0 ?
teQb+0094                                          FFFFFFFFFFE87C0 ?
qmxtrxq+0210         bl       03F252EC
qmxtrxop+00a4        bl       qmxtrxq              FFFFFFFFFFF25B8 ?
                                                   700000282F66DD0 ? 110195E98 ?
koksspend+02b0       bl       qmxtrxop             100346AB4 ?
kkmdrvend+01a8       bl       koksspend            000000001 ? 104B3A8A8 ?
                                                   000000000 ?
kkmdrv+004c          bl       kkmdrvend            FFFFFFFFFFE8BE0 ?
                                                   883843401048F2F8 ?
opiSem+13c0          bl       kkmdrv               000000000 ? 000000000 ?
                                                   000000000 ? 11022AC50 ?
opiDeferredSem+0234  bl       opiSem               FFFFFFFFFFE9CE0 ?
                                                   7000001E327CCE0 ? 000000111 ?
                                                   100000001 ?
opitca+01e8          bl       opiDeferredSem
kksFullTypeCheck+00  bl       03F25230
1c
rpiswu2+034c         bl       _ptrgl
kksSetBindType+0d28  bl       rpiswu2              70000030850C178 ?
                                                   3300000033 ?
                                                   FFFFFFFFFFF0570 ?
                                                   FFFFFFFFFFF0578 ?
                                                   7000002F6F0C700 ?
                                                   33104027D8 ?
                                                   FFFFFFFFFFF1F48 ? 000000000 ?
kksfbc+1054          bl       kksSetBindType       70000030F58F400 ? 1107CB418 ?
                                                   70000001003B800 ?
                                                   10200003000 ? 110000FF8 ?
                                                   7000000100ECAB8 ?
                                                   FFFFFFFFFFF1480 ?
                                                   481A408400003000 ?
opiexe+098c          bl       01F960BC
opipls+185c          bl       opiexe               FFFFFFFFFFF3900 ?
                                                   FFFFFFFFFFF39E8 ?
                                                   FFFFFFFFFFF38A0 ?
opiodr+0ae0          bl       _ptrgl
rpidrus+01bc         bl       opiodr               66FFFF54B0 ? 608736A20 ?
                                                   FFFFFFFFFFF67C0 ?
                                                   1510195E98 ?
skgmstack+00c8       bl       _ptrgl
rpidru+0088          bl       skgmstack            102320840 ? 000000000 ?
                                                   000000002 ? 000000000 ?
                                                   FFFFFFFFFFF5F88 ?
rpiswu2+034c         bl       _ptrgl
rpidrv+095c          bl       rpiswu2              70000030850C178 ? 110469C28 ?
                                                   11044AA58 ? 000000000 ?
                                                   FFFFFFFFFFF5D60 ?
                                                   3300000000 ? 000000000 ?
                                                   000000000 ?
psddr0+02bc          bl       03F266D4
psdnal+01d0          bl       psddr0               1500000000 ? 6600000000 ?
                                                   FFFFFFFFFFF67C0 ?
                                                   30100BACC8 ?
pevm_EXECC+01f8      bl       _ptrgl
pfrinstr_EXECC+0070  bl       pevm_EXECC           10147B2A4 ? 000000000 ?
                                                   700000262828B72 ?
pfrrun_no_tool+005c  bl       _ptrgl
pfrrun+1014          bl       pfrrun_no_tool       FFFFFFFFFFF6B20 ?
                                                   7000002B6819368 ? 3100ECBB0 ?
plsql_run+06b4       bl       pfrrun               1107D84A8 ?
peicnt+0224          bl       plsql_run            1107D84A8 ? 10001102676F8 ?
                                                   000000000 ?
kkxexe+0250          bl       peicnt               FFFFFFFFFFF7E38 ? 1107D84A8 ?
opiexe+2ef8          bl       kkxexe               11047E1C8 ?
kpoal8+0edc          bl       opiexe               FFFFFFFFFFFB454 ?
                                                   FFFFFFFFFFFB1A8 ?
                                                   FFFFFFFFFFF9628 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F96AC8
opiino+0990          bl       opitsk               0FFFFD490 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F95914
sou2o+0090           bl       opidrv               3C02D99B7C ? 4A076D928 ?
                                                   FFFFFFFFFFFF390 ?
opimai_real+01bc     bl       01F93294
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

通过这个trace的部分信息可以得到:
1.操作系统版本AIX x64(5.3)
2.数据库版本10.2.0.4
3.sql语句调用EXTRACTVALUE函数
4.Call Stack Trace信息

查询MOS[ID 467350.1]发现匹配信息

Cause
Bug 6030982 ORA-600 [QMXTRICHECKANDREWRITEQB0] WITH QUERY USING EXTRACTVALUE FUNCTION
Solution
This bug is going to be fixed in furture 10.2.0.5.0 and 11g
At the mean time , user can workaround by
set
event = "19027 trace name context forever, level 1"
within init.ora or spfile file then bounce database.
or
SQL> alter session set events ='19027 trace name context forever, level 1';
SQL> Alter system flush shared_pool;
-- Execute affected query

通过mos可以确定:
1.是因为数据库执行EXTRACTVALUE函数遇到该bug
2.在11g和10.2.0.5中修复该bug
3.可以通过设置event = “19027 trace name context forever, level 1″来临时解决该问题

个人处理建议
1.如果数据库方便升级,那建议升级处理
2.如果数据库不便立马升级,建议在业务低估时设置session event 19027,然后 flush shared_pool,执行报错sql,如果问题解决,在合适时间设置system event来临时屏蔽该问题.

数据文件的CREATION_TIME来源和算法

对ORACLE比较熟悉的人都知道v$datafile.CREATION_TIME和v$datafile_header.CREATION_TIME这两个列都是表示数据文件的创建时间,而根据我们的经验可以知道几点:
1.当v$datafile.CREATION_TIME与v$datafile_header.CREATION_TIME不一致时数据库不能正常启动
2.v$datafile.CREATION_TIME的值来源于v$datafile_header.CREATION_TIME
3.而v$datafile_header.CREATION_TIME的值来源于数据文件头的块中的信息

现在就出现一个问题,数据块中的kcvfhcrt是一个16进制的数,如何实现在v$datafile和v$datafile_header中转为为了数据文件创建的日期
数据文件中存储创建数据文件日期内容

ub4 kcvfhcrt                             @108      0x2c67319c

v$datafile.CREATION_TIME值

SQL> select to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') xifenfei
   2 from v$datafile where file#=1;
XIFENFEI
-------------------
2011-03-05 05:26:52

如何通过kcvfhcrt值推算出来CREATION_TIME或者通过CREATION_TIME推断出来kcvfhcrt的值规则:
熟悉数据库SCN计数原理的人都知道,我们现在使用的数据库是从1988/01/01 00:00:00开始记录SCN,也就是说我们的数据库的使用最早时间只能是从1988年元旦凌晨开始,那么也就是说数据库记录的创建时间可以采用这个时间点为起点,然后每增加一秒,数据库的kcvfhcrt就增加1,但是ORACLE为了计算简便,每个月按照31天计算
通过时间推算出来kcvfhcrt值

--数据库记录时间起点
1988/01/01 00:00:00
--当前数据文件创建日志
2011/03/05 05:26:52
--两者相差时间
23年02月04日05时26分52秒
--计算相差秒
23*12*31*24*60*60+2*31*24*60*60+4*24*60*60+5*60*60+26*60+52=744960412
--kcvfhcrt值转换
2c67319c(16进制)=744960412(10进制)

通过kcvfhcrt计算CREATION_TIME值

SQL> select to_number('2c67319c','xxxxxxxxxxx') from dual;
TO_NUMBER('2C67319C','XXXXXXXXXXX')
-----------------------------------
                          744960412
SQL> select 744960412/(12*31*24*60*60) from dual;
744960412/(12*31*24*60*60)
--------------------------
                23.1780295
SQL> select mod(744960412,(12*31*24*60*60)) from dual;
MOD(744960412,(12*31*24*60*60))
-------------------------------
                        5722012
SQL> select 5722012/(31*24*60*60) from dual;
5722012/(31*24*60*60)
---------------------
           2.13635454
SQL> select mod(5722012,(31*24*60*60)) from dual;
MOD(5722012,(31*24*60*60))
--------------------------
                    365212
SQL> select 365212/(24*60*60) from dual;
365212/(24*60*60)
-----------------
       4.22699074
SQL> select mod(365212,(24*60*60)) from dual;
MOD(365212,(24*60*60))
----------------------
                 19612
SQL> select 19612/(60*60) from dual;
19612/(60*60)
-------------
   5.44777778
SQL> select mod(19612,(60*60)) from dual;
MOD(19612,(60*60))
------------------
              1612
SQL> select 1612/60 from dual;
   1612/60
----------
26.8666667
SQL> select mod(1612,60) from dual;
MOD(1612,60)
------------
          52
从这里可以得出23年2月4天5时26分52秒,与1988年01月01日00时00分00秒相加得到
2011年03月05日 5:26:52
SQL> select to_char(CREATION_TIME,'yyyy-mm-dd hh24:mi:ss') from v$datafile where file#=1;
TO_CHAR(CREATION_TI
-------------------
2011-03-05 05:26:52

bbed打开丢失部分system数据文件库

在某种情况下,数据库system表空间可能有多个数据文件,而意外的丢失了其中某个(不能为第一个),然后通过bbed来模拟一个数据文件来open库
system增加数据文件

SQL> alter tablespace system add datafile '/u01/oracle/oradata/ora11g/system02.dbf' size 10m;
Tablespace altered.
--创建表,为了使得数据库发生类此生产环境的部分操作,使得system表空间可能发生改变
SQL> create table t_xifenfei tablespace system
  2  as
  3  select * from dba_tables;
Table created.

删除system中某个文件(system02.dbf)

[oracle@xifenfei ora11g]$ mv system02.dbf system02.dbf_bak

尝试启动数据库

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area  313860096 bytes
Fixed Size                  1344652 bytes
Variable Size             251661172 bytes
Database Buffers           54525952 bytes
Redo Buffers                6328320 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: '/u01/oracle/oradata/ora11g/system02.dbf'

错误思路offline system数据文件

SQL> alter database datafile 8 offline;
Database altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01147: SYSTEM tablespace file 8 is offline
ORA-01110: data file 8: '/u01/oracle/oradata/ora11g/system02.dbf'
SQL> alter database datafile 8 online;
alter database datafile 8 online
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
ORA-01110: data file 8: '/u01/oracle/oradata/ora11g/system02.dbf'

使用system表空间其他数据文件来模拟丢失数据文件

[oracle@xifenfei ora11g]$ cp system01.dbf system02.dbf

通过dul获取file$相关信息

FILE#
RELFILE#
CRSCNWRP
CRSCNBAS

bbed修改下面参数值

--rdba
 ub4 rdba_kcbh                         @4        0x02000001
--绝对文件号
 ub2 kccfhfno                          @52       0x0008
--scn
 ub4 kscnbas                           @100      0xc01a3581
 ub2 kscnwrp                           @104      0x0b2c
--相对文件号
 ub4 kcvfhrfn                             @368      0x00000008
--文件大小(不修改,为了重建欺骗数据库重建控制文件)
kccfhfsz
--文件创建时间(重建控制文件来实现控制文件和数据文件头一致)
kcvfhcrt

重建控制文件
1.因为复制来自同一个表空间下面的数据文件,数据文件大小和原数据文件一样, 所以不要修改kccfhfsz大小,不然会出现

CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS       ARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01200: actual file size of 90880 is smaller than correct size of 10485760 blocks
ORA-01110: data file 8: '/u01/oracle/oradata/ora11g/system02.dbf'

2. 数据文件创建时间是通过kcvfhcrt参数值来控制的,而这个值是通过1988年01月01日00时00分00秒开始计时,按照每月31天计算的累计值,按照这个规则可以推断出来kcvfhcrt.因为数据库在启动的时候会验证控制文件中这个值和数据文件头的该值是否一致,所以如果你不修改kcvfhcrt,可以选择重建控制文件来完成.

再次open数据库

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.

操作到这里,库已经可以正常的被open,如果通过这种方面屏蔽掉的异常的system数据文件中数据字典的部分表信息时,可能数据库依然不能被正常逻辑导出(例如dba_segments,dba_extents的基表等),需要进一步特殊处理,如果不能自行解决相关问题,需要恢复支持,请联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

解决Statspack报告时Snap Id为"#####"

生成Statspack报告时候发现Snap Id为”#####”

Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id    Inst Num DB Name      Instance     Host
----------- -------- ------------ ------------ ------------
  631690435        1 VODAPP       vodapp       T5440-1
Enter value for dbid: 631690435
Using 631690435 for database Id
Enter value for inst_num: 1
Using 1 for instance number
Completed Snapshots
                           Snap                    Snap
Instance     DB Name         Id   Snap Started    Level Comment
------------ ------------ ----- ----------------- ----- ----------------------
vodapp       VODAPP       ##### 16 Oct 2012 17:00     5
                          ##### 16 Oct 2012 18:00     5
                          ##### 16 Oct 2012 19:00     5
                          ##### 16 Oct 2012 20:00     5
                          ##### 16 Oct 2012 21:00     5
                          ##### 16 Oct 2012 22:00     5
                          ##### 16 Oct 2012 23:00     5
                          …………
Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap:

因为没有办法定位到Snap Id,所以暂时无法准确的输入对应值,当然可以通过如下sql查询相应的Snap Id

select SNAP_ID,to_char(SNAP_TIME,'yyyy-mm-dd hh24:mi:ss') SNAP_TIME
from STATS$SNAPSHOT order by SNAP_ID;

虽然可以通过这个方面来曲线解决这个问题,但是还有比较完善一点的解决方法,我们阅读spcreate.sql相关脚本,修改相关程序来实现

--通过spcreate.sql发现
spcreate.sql调用@@sprepins
--编辑sprepins.sql
column instart_fmt noprint;
column inst_name   format a12  heading 'Instance';
column db_name     format a12  heading 'DB Name';
column snap_id     format 9990 heading 'Snap|Id';
-->(修改为)column snap_id     format 999990 heading 'Snap|Id';
column snapdat     format a17  heading 'Snap Started' just c;
column lvl         format 99   heading 'Snap|Level';
column commnt      format a22  heading 'Comment';

再次生成sp报告

SQL> @?/rdbms/admin/sprepins.sql
Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id    Inst Num DB Name      Instance     Host
----------- -------- ------------ ------------ ------------
  631690435        1 VODAPP       vodapp       T5440-1
Enter value for dbid: 631690435
Using 631690435 for database Id
Enter value for inst_num: 1
Using 1 for instance number
Completed Snapshots
                             Snap                    Snap
Instance     DB Name           Id   Snap Started    Level Comment
------------ ------------ ------- ----------------- ----- ----------------------
vodapp       VODAPP         58916 16 Oct 2012 17:00     5
                            58917 16 Oct 2012 18:00     5
                            58918 16 Oct 2012 19:00     5
                            58919 16 Oct 2012 20:00     5
                            58920 16 Oct 2012 21:00     5
                            58921 16 Oct 2012 22:00     5
                            58922 16 Oct 2012 23:00     5
                            58923 17 Oct 2012 00:00     5

ORA-00600[3689]引起MRP进程异常

今天检查数据库发现一数据库因为添加数据文件导致ORA-00600[3689]错误,进入引起MRP进程异常,从而使得数据库可以正常接收归档日志,但是不能在备库应用归档日志
数据库版本

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Solaris: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 – Production

检查日志

Fri Oct 19 12:00:22 2012
RFS[1]: No standby redo logfiles created
RFS[1]: Archived Log: '/baceldba/archive/1_30374_730819916.dbf'
Fri Oct 19 12:00:28 2012
Media Recovery Log /baceldba/archive/1_30374_730819916.dbf
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Fri Oct 19 12:00:50 2012
Recovery created file /data/oracle/oradata/vodapp/NETTVCLPS02.dbf
Successfully added datafile 32 to media recovery
Datafile #32: '/data/oracle/oradata/vodapp/NETTVCLPS02.dbf '
WARNING: File being created with same name as in Primary
Existing file may be overwritten
Fri Oct 19 12:01:15 2012
Recovery created file /data/oracle/oradata/vodapp/UCICMS01_DATA04.dbf
Successfully added datafile 33 to media recovery
Datafile #33: '/data/oracle/oradata/vodapp/UCICMS01_DATA04.dbf'
Fri Oct 19 12:01:16 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Errors with log /baceldba/archive/1_30374_730819916.dbf
MRP0: Background Media Recovery terminated with error 600  <--mrp进程异常
Fri Oct 19 12:01:16 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Fri Oct 19 12:01:18 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Fri Oct 19 12:01:18 2012
Errors in file /data/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []

通过观察日志发现,数据库的mrp0进程因为ora-600的错误导致异常终止,从而使得dg备库无法正常使用归档日志

分析trace文件得出

*** 2012-10-19 12:01:16.327
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+744         CALL     ksedst()             000000840 ?
                                                   FFFFFFFF7FFFA08C ?
                                                   000000000 ?
                                                   FFFFFFFF7FFF6B80 ?
                                                   FFFFFFFF7FFF58E8 ?
                                                   FFFFFFFF7FFF62E8 ?
kgeriv()+220         PTR_CALL 0000000000000000     000106000 ? 106323304 ?
                                                   106323000 ? 000106323 ?
                                                   000106000 ? 106323304 ?
kgesiv()+112         CALL     kgeriv()             10631DCD0 ? 000000000 ?
                                                   000000E69 ? 000000001 ?
                                                   FFFFFFFF7FFFA5E8 ?
                                                   000001430 ?
ksesic1()+96         CALL     kgesiv()             10631DCD0 ? 10646AD00 ?
                                                   000000E69 ? 000000001 ?
                                                   FFFFFFFF7FFFA5E8 ?
                                                   000001420 ?
kcvadc1_10gr2()+265  CALL     ksesic1()            000000E69 ? 00010631D ?
2                                                  1063232F8 ? 000106000 ?
                                                   106323000 ? 000106323 ?
kcoapl()+5664        PTR_CALL 0000000000000000     FFFFFFFF7A5570F8 ?
                                                   FFFFFFFF7FFFB320 ?
                                                   FFFFFFFF76667E18 ?
                                                   000000000 ? 000000021 ?
                                                   000000020 ?
kcramr()+5352        CALL     kcoapl()             FFFFFFFF76667E00 ?
                                                   000000003 ? 1055136B0 ?
                                                   105513340 ? 00000001B ?
                                                   000000017 ?
krddmr()+2688        CALL     kcramr()             000000000 ?
                                                   FFFFFFFF7A55FF78 ?
                                                   000002000 ? 000106000 ?
                                                   0000001F6 ? 464897868 ?
krsmdp()+2332        CALL     krddmr()             105502000 ?
                                                   FFFFFFFF7A5570F8 ?
                                                   000000001 ? 10646C760 ?
                                                   464897868 ? 000000000 ?
ksbrdp()+896         PTR_CALL 0000000000000000     000380000 ? 000380000 ?
                                                   105517000 ? 000106328 ?
                                                   000106000 ? 00000FFFF ?
opirip()+824         CALL     ksbrdp()             000106320 ? 380007734 ?
                                                   000380000 ? 00038000E ?
                                                   000106000 ? 1005DCF80 ?
opidrv()+1200        CALL     opirip()             10632A000 ? 000106000 ?
                                                   000106332 ? 380007000 ?
                                                   00010632A ? 1063D73A0 ?
sou2o()+80           CALL     opidrv()             10632D460 ? 000000001 ?
                                                   000000032 ? 000000000 ?
                                                   000000032 ? 000106000 ?
opimai_real()+268    CALL     sou2o()              FFFFFFFF7FFFF968 ?
                                                   000000032 ? 000000004 ?
                                                   FFFFFFFF7FFFF990 ?
                                                   105C1F000 ? 000105C1F ?
main()+152           CALL     opimai_real()        000000003 ?
                                                   FFFFFFFF7FFFFA68 ?
                                                   000000000 ? 000000000 ?
                                                   0023F9764 ? 000014400 ?
_start()+380         CALL     main()               000000001 ?
                                                   FFFFFFFF7FFFFB78 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFFA70 ?
                                                   FFFFFFFF7FFFFB80 ?
                                                   FFFFFFFF7CF00200 ?
--------------------- Binary Stack Dump ---------------------

查询mos发现
Bug 5230162 : ORA-600[3689][42] ON PHYSICAL STANDBY PREVENT MEDIA RECOVERY描述相符

处理建议
1.10.2.0.1数据库不稳定,bug较多,建议升级数据库到10.2.0.4或者10.2.0.5
2.当前standby_file_management为true,建议修改为false,每次手工两边增加数据文件
3.主库添加数据文件后,观察备库的mrp进程是否正常,如果异常
1)尝试启动mrp进程,如果因为新增加的数据文件导致异常,那先使用rman或者其他方法拷贝主库新建立数据文件到备库,然后启动mrp进程
2)如果因为某种原因导致归档日志出现gap,可以从备份中找出归档日志,如果备份中没有,那可以使用基于scn的备份方式来修复该故障。
3)如果scn相差较大,建议直接重建备库