spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库出现如下错误ORA-00600[kmgs_parameter_update_timeout_1]

Thu Jun 21 17:42:45 BEIST 2012
alter tablespace TS_TAB_WG_SYSMGR_01 add datafile '/dev/rvgoradata3_1_01'
Thu Jun 21 17:42:58 BEIST 2012
Completed: alter tablespace TS_TAB_WG_SYSMGR_01 add datafile '/dev/rvgoradata3_1_01'
Thu Jun 21 17:45:31 BEIST 2012
System State dumped to trace file /oracle/app/oracle/admin/bomc3/bdump/bomc3_mmon_19530138.trc
Thu Jun 21 17:45:42 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_mmon_19530138.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [1565], [], [], [], [], [], []
ORA-01565: error in identifying file '/dev/rvgoradata3_1_01'
ORA-27086: unable to lock file - already in use
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 8
Additional information: 18874484
Thu Jun 21 17:45:49 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:52 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:53 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:53 BEIST 2012
DBW0: terminating instance due to error 471
Instance terminated by DBW0, pid = 18874484

通过这个错误可以看出大概:TS_TAB_WG_SYSMGR_01增加数据文件/dev/rvgoradata3_1_01成功后,然后mmon启动收集统计信息,读取spfile文件信息出错.最后dbw进程读取spfile文件出错,使得dbwn进程终止,从而数据库abort掉.通过这些信息,初步怀疑是增加数据文件的时候,错误的把spfile文件的裸设备作为一个新数据文件增加到数据库中,导致spfile被覆盖,从而出现mmon和dbwn访问spfile出错.

找出证据
如果spfile使用裸设备而且文件名是dev/rvgoradata3_1_01,那很可能是通过init_SID.ora中的spfile项实现,查找该文件内容果然发现

[zwq_acc1:/home/xifenfei]cat initbomc3.ora
spfile='/dev/rvgoradata3_1_01'

通过这些可以确定是用户增加数据文件时,错误的把spfile文件当中新的控制问及爱你增加到相关表空间中导致该问题.

解决办法
1.如果有备份spfile文件,使用备份spfile文件
2.如果有pfile文件,使用pfile创建spfile
3.如果上面两个都没有,那么使用alert中相关信息创建pfile文件然后创建spfile

11G RAC库 ORA-00600[ktubko_1]错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:11G RAC库 ORA-00600[ktubko_1]错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库版本信息

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "www.xifenfei.com" from dual;
www.xifenfei.com
-------------------
2012-06-12 14:44:53

数据库启动报ORA-00600[ktubko_1]错误

Database Characterset is ZHS16GBK
Errors in file /u01/diag/rdbms/xff/XFF1/trace/XFF1_smon_17248.trc  (incident=21754):
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/diag/rdbms/xff/XFF1/incident/incdir_21754/XFF1_smon_17248_i21754.trc
Tue Jun 12 10:37:10 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance XFF1 (pid = 19) - Error 600 encountered while recovering transaction (4, 2) on object 5887.
Errors in file /u01/diag/rdbms/xff/XFF1/trace/XFF1_smon_17248.trc:
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []

查看trace文件

Incorrect next uba in kturCurrBackoutOneChg while backing out xid: 0x0004.002.0000022b uba: 0x00c02068.00af.3b
Undo record:
ktubu redo: slt: 2 rci: 58 opc: 10.22 objn: 5887 objd: 5887 tsn: 1
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo:  No
             0x00000000
index undo for leaf key operations
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0003.003.0000030a uba: 0x00c0200c.010a.15
                      flg: C---    lkc:  0     scn: 0x0000.00118c55
Dump kdilk : itl=3, kdxlkflg=0x1 sdc=0 indexid=0x800df2 block=0x00800df3
(kdxlre): restore leaf row (clear leaf delete flags)
key :(5):  02 c1 0d 01 80
keydata/bitmap: (6):  00 81 0f 41 00 01
Undo block: tsn 0x2 rdba: 0xc02068
Dump of buffer cache at level 4 for tsn=2 rdba=12591208
BH (0x33ff7264) file#: 3 rdba: 0x00c02068 (3/8296) class: 24 ba: 0x33f24000
*** 2012-06-12 10:36:40.265
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 0 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f
  hash: [0x3e78a49c,0x3e78a49c] lru: [0x33ff73ec,0x33ff723c]
  obj-flags: object_ckpt_list
  ckptq:[0x3e05cd68,0x33ff7f0c]fileq:[0x3e05cda4,0x3e05cda4]objq:[0x3b9094a4,0x3b9094a4]objaq:[0x33ff7acc,0x3b909494]
  st: XCURRENT md: NULL fpin: 'ktuwh23: ktubko' tch: 1
  flags: buffer_dirty redo_since_read
  LRBA: [0x15.26.0] LSCN: [0x0.11acb6] HSCN: [0x0.11acb6] HSUB: [1]
Data block dump: tsn: 0x1 rdba: 0x800df3
Dump of buffer cache at level 3 for tsn=1 rdba=8392179
BH (0x33ff70b4) file#: 2 rdba: 0x00800df3 (2/3571) class: 1 ba: 0x33f20000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 0 obj: 5887 objn: 5887 tsn: 1 afn: 2 hint: f
  hash: [0x3e7d85e4,0x3e7d85e4] lru: [0x33ff723c,0x3e05c760]
  ckptq: [NULL] fileq: [NULL] objq: [0x3b9092a8,0x3b9092a8] objaq: [0x3b9092a0,0x3b9092a0]
  st: XCURRENT md: NULL fpin: 'kdiwh27: kdiulk' tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800df3 (2/3571)
  scn: 0x0000.00118c67 seq: 0x01 flg: 0x06 tail: 0x8c670601
  frmt: 0x02 chkval: 0x5d04 type: 0x06=trans data
Block header dump:  0x00800df3
 Object id on Block? Y
 seg/obj: 0x16ff  csc: 0x00.118c60  itc: 3  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x800df0 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0006.01d.00000319  0x00c00332.009b.15  --U-    1  fsc 0x000f.00118c67
0x03   0x0003.003.0000030a  0x00c0200c.010a.15  C---    0  scn 0x0000.00118c55
kcra_dump_redo_internal: skipped for critical process
Dumping redo for undo$
kcra_dump_redo_internal: skipped for critical process
*** 2012-06-12 10:36:43.906
Incident 21754 created, dump file: /u01/diag/rdbms/xff/XFF1/incident/incdir_21754/XFF1_smon_17248_i21754.trc
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
ORACLE Instance XFF1 (pid = 19) - Error 600 encountered while recovering transaction (4, 2) on object 5887.
*** 2012-06-12 10:37:10.646
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BFFD84FC ? 2 ?
ksedst1()+77         call     skdstdst()           BFFD84FC ? 0 ? 1 ? 8592A48 ?
                                                   85928C6 ? 8592A48 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2704  call     ksedst()             0 ? 0 ? 3FFE17CC ? 3E7C5344 ?
                                                   1 ? 1 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 0 ?
kturRecoverTxn()+52  call     ksedmp()             3 ? B6B3FEA4 ? 2C ?
471                                                B6B3FE38 ? 1 ? B6B3FE58 ?
kturRecoverUndoSegm  call     kturRecoverTxn()     BFFD9078 ? 2 ? 1 ? 0 ? 11 ?
ent()+1091                                         4 ? 1 ?
kturRecoverActiveTx  call     kturRecoverUndoSegm  4 ? 0 ? 1 ? 0 ? FFFF ? 11 ?
ns()+931                      ent()                4 ?
ktprbeg()+281        call     kturRecoverActiveTx  10 ? 0 ?
                              ns()
ktmmon()+13050       call     ktprbeg()            0 ? 1 ? 0 ? B6B5B72C ? 0 ?
                                                   0 ?
ktmSmonMain()+174    call     ktmmon()             20018C2C ? B6B46D2C ?
                                                   114E7B00 ? 0 ? 0 ? B6B59F50 ?
ksbrdp()+826         call     00000000             20018C2C ? 432A884E ? 0 ? 0 ?
                                                   0 ? 0 ?
opirip()+559         call     ksbrdp()             0 ? 0 ? 0 ? 0 ? 0 ? 0 ?
opidrv()+515         call     opirip()             32 ? 4 ? BFFDB20C ?
sou2o()+80           call     opidrv()             32 ? 4 ? BFFDB20C ?
opimai_real()+230    call     sou2o()              BFFDB1F0 ? 32 ? 4 ?
                                                   BFFDB20C ?
ssthrdmain()+212     call     00000000             3 ? BFFDB338 ? 0 ? 4318AF14 ?
                                                   BFFDB2F4 ? 4317E670 ?
main()+147           call     ssthrdmain()         3 ? BFFDB338 ?
__libc_start_main()  call     00000000             1 ? BFFDB434 ? BFFDB43C ?
+220                                               4317E828 ? 0 ? 1 ?
_start()+33          call     __libc_start_main()  856F1C4 ? 1 ? BFFDB434 ?
                                                   BCF1440 ? BCF1430 ?
                                                   43170790 ?
--------------------- Binary Stack Dump ---------------------

通过上面的trace可以看出是2/3571中包含了事务,但是和3/8296[4号回滚段]回滚中的信息不相符,从而出现了在数据库启动回滚的时候出现该错误.查询mos[ID 1318986.1]发现这个是数据库的Bug 10205230比较相似,虽说在11.2.0.2中修复而且在asm中不受该影响,我这里库是11.2.0.3的asm rac照样出现该bug.

解决方法
通过alert日志提示object可以找到object_id=5887.当然也可以通过trace中的rdba来确定

SQL> col OBJECT_NAME for a30
SQL> select object_name,object_type,owner from dba_objects where object_id=5887;
OBJECT_NAME                    OBJECT_TYPE         OWNER
------------------------------ ------------------- ------------------------------
WRI$_ADV_MESSAGE_GROUPS_PK     INDEX               SYS
SQL> alter index sys.WRI$_ADV_MESSAGE_GROUPS_PK rebuild online;
Index altered.

补充说明:如果损坏对象是表,需要使用DBMS_REPAIR跳过坏块,然后重建表

重启数据库观察
数据库已经正常,开始报undo回滚段错误的记录已经不再存在,数据库恢复正常

Tue Jun 12 13:50:43 2012
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Tue Jun 12 13:51:11 2012
No Resource Manager plan active
Tue Jun 12 13:52:01 2012
Starting background process GTX0
Tue Jun 12 13:52:01 2012
GTX0 started with pid=29, OS id=14234
Starting background process RCBG
Tue Jun 12 13:52:04 2012
RCBG started with pid=41, OS id=14238
replication_dependency_tracking turned off (no async multimaster replication found)
Tue Jun 12 13:54:01 2012
Starting background process QMNC
Tue Jun 12 13:54:01 2012
QMNC started with pid=42, OS id=14279
Tue Jun 12 13:57:26 2012
Completed: ALTER DATABASE OPEN

因使用OEM引起ORA-00600[12761]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:因使用OEM引起ORA-00600[12761]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

alert日志报ORA-00600[12761]错

Sun Jun 10 13:52:56 2012
Errors in file e:\oracle\product\10.2.0\admin\interlib\udump\interlib_ora_19840.trc:
ORA-04030: 在尝试分配 82444 字节 (pga heap,control file i/o buffer) 时进程内存不足
ORA-00600: 内部错误代码, 参数: [12761], [], [], [], [], [], [], []
ORA-00604: 递归 SQL 级别 2 出现错误
ORA-04030: 在尝试分配 123404 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足

数据库版本信息

Sun Jun 10 13:52:56 2012
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows NT Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 2 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:1263M/4095M, Ph+PgF:2716M/5976M, VA:19M/2047M
Instance name: interlib

trace信息

*** 2012-06-10 13:52:56.763
ksedmp: internal or fatal error
ORA-00600: 内部错误代码, 参数: [12761], [], [], [], [], [], [], []
ORA-00604: 递归 SQL 级别 2 出现错误
ORA-04030: 在尝试分配 123404 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足
Current SQL statement for this session:
BEGIN EM_PING.RECORD_BATCH_HEARTBEAT(:1, :2, :3); END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0BC35C44         1  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
00404686             CALLrel  00404694             0 1
0040307E             CALLrel  00404660             0
0043AB6A             CALLrel  00402CFC             3
603A816A             CALLreg  00000000             6190E2E0 3
603A8550             CALLrel  603A80D8             6190E2E0 5E340020 31D9 0
                                                   5ED6CDF8
031B7197             CALLrel  025FA21E             6190E2E0 5E340020 31D9 2 0
02C92859             CALLrel  02C92360             5ED6D2B4 5ED6D3CC 2 61BA71E4
                                                   0 5ED6CEFA
60BAD7C6             CALL???  00000000             5ED6D2B4 5ED6D3CC 3 61BA71E4
                                                   0 5ED6CEFA
60C41C40             CALLrel  60BAD758             5D9356A0 F1 3 5E344888
60C3C780             CALL???  00000000             5D9356A0 951F190 5D9356DC
60C3D1BD             CALLrel  60C3C748             5D9356A0 95A97F0 5D9356DC
60BB0392             CALLrel  60C3CEB0             5D9356A0
60B89393             CALLrel  60BB00B0             5D9356A0 1 0
02600CD9             CALLrel  0260F22C
0140AF2C             CALLrel  02600B3C             4E8EC08
013CBFEC             CALLrel  01409984             49 3 5ED6DB14
0085174B             CALLreg  00000000             5E 17 5ED6F6F8
60FEFF8D             CALLreg  00000000             5E 17 5ED6F6F8 0
00850A69             CALL???  00000000
0122134B             CALLrel  00850670             0 0
0085174B             CALLreg  00000000             3C 4 5ED6FC90
00420E53             CALLrel  00851300             3C 4 5ED6FC90 0
00421645             CALLrel  00420B20             3C 4 5ED6FC90
0040116C             CALLrel  00421618             5ED6FC84 3C 4 5ED6FC90
0040105C             CALLrel  004010FC             2 5ED6FCBC
00401900             CALLrel  00401000
7C82482C             CALLreg  00000000
--------------------- Binary Stack Dump ---------------------
--会话信息
    (session) sid: 525 trans: 00000000, creator: 7AE024D8, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-001B-00000004, short-term DID: 0000-0000-00000000
              txn branch: 00000000
              oct: 47, prv: 0, sql: 7A0F0A38, psql: 7A0A2430, user: 51/SYSMAN
    O/S info: user: , term: , ospid: 1234, machine: tushuguan01
              program: OMS
    client info: tushuguan01_Management_Service
    application name: OEM.SystemPool, hash value=2960518376

通过这里我们可以得到几个信息
1.数据库先发生了ORA-00600[12761],然后引发了ORA-04030
2.引发ORA-00600[12761]错误的原因是因为OEM的某种操作导致
3.未知因某种原因导致Call Stack Trace信息不完善,无法准确评估bug情况
4.查询数据库当前最大使用使用pga为250M,数据库配置pga为500M,原则上讲不是pga消耗完导致4030错误,可能是这个会话在执行某个基表的查询时候的hash-jion运算时pga不足导致.
5.查询dba_users发现EM_PING不是数据库用户,查询dba_source发现RECORD_BATCH_HEARTBEAT不是plsql名称,从这里可以看出OEM调用程序有一定特殊性

对于该问题的解决方案
1.因为OEM功能不太使用,建议直接关闭该进程,并设置为开机不自动启动
2.因为信息不完善,无法确定具体bug,但目前数据库版本为10.2.0.1,强烈建议升级到新版本

记录一次ORA-00600[2252]故障解决

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次ORA-00600[2252]故障解决

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库alert日志报ORA-00600[2252]

Wed Jun 06 08:56:02 2012
Thread 1 cannot allocate new log, sequence 552
Checkpoint not complete
  Current log# 1 seq# 551 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
Thread 1 advanced to log sequence 552
  Current log# 2 seq# 552 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
Sun Jun 06 09:39:19 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_3344.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076554712], [], [], [], [], []
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_3344.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076554712], [], [], [], [], []
Sun Jun 06 10:19:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_4904.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555573], [], [], [], [], []
Sun Jun 06 10:20:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5984.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555594], [], [], [], [], []
Sun Jun 06 10:21:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_4204.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555614], [], [], [], [], []
Sun Jun 06 10:22:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5896.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555634], [], [], [], [], []
Sun Jun 06 10:23:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_4612.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555654], [], [], [], [], []
Sun Jun 06 10:24:49 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_4696.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555676], [], [], [], [], []
Sun Jun 06 10:25:50 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5568.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555696], [], [], [], [], []
Sun Jun 06 10:26:50 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl_m000_5776.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076555716], [], [], [], [], []
--启动数据库
Mon Jun 07 09:18:39 2010
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Mon Jun 07 09:18:49 2010
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
  processes                = 150
  __shared_pool_size       = 100663296
  __large_pool_size        = 12582912
  __java_pool_size         = 4194304
  __streams_pool_size      = 0
  spfile                   = D:\ORACLE\PRODUCT\10.2.0\DB_1\DBS\SPFILEORCL.ORA
  nls_language             = SIMPLIFIED CHINESE
  nls_territory            = CHINA
  sga_target               = 452984832
  control_files            = D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CONTROL01.CTL
  db_block_size            = 8192
  __db_cache_size          = 327155712
  compatible               = 10.2.0.1.0
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = D:\oracle\product\10.2.0/flash_recovery_area
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                =
  dispatchers              = (protocol=TCP)
  shared_servers           = 1
  job_queue_processes      = 10
  audit_file_dest          = D:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\ADUMP
  background_dump_dest     = D:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\BDUMP
  user_dump_dest           = D:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\UDUMP
  core_dump_dest           = D:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\CDUMP
  db_name                  = orcl
  open_cursors             = 300
  pga_aggregate_target     = 149946368
PSP0 started with pid=3, OS id=3028
MMAN started with pid=4, OS id=3528
PMON started with pid=2, OS id=2772
DBW0 started with pid=5, OS id=816
CKPT started with pid=7, OS id=3372
SMON started with pid=8, OS id=2584
RECO started with pid=9, OS id=3976
CJQ0 started with pid=10, OS id=1912
MMON started with pid=11, OS id=624
Mon Jun 07 09:19:00 2010
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=2696
Mon Jun 07 09:19:00 2010
starting up 1 shared server(s) ...
LGWR started with pid=6, OS id=3128
Mon Jun 07 09:19:04 2010
alter database mount exclusive
Mon Jun 07 09:19:09 2010
Setting recovery target incarnation to 2
Mon Jun 07 09:19:10 2010
Successful mount of redo thread 1, with mount id 1248834568
Mon Jun 07 09:19:10 2010
Database mounted in Exclusive Mode
Completed: alter database mount exclusive
Mon Jun 07 09:19:10 2010
alter database open
Mon Jun 07 09:19:15 2010
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Mon Jun 07 09:19:18 2010
Started redo scan
Mon Jun 07 09:19:19 2010
Completed redo scan
 13 redo blocks read, 7 data blocks need recovery
Mon Jun 07 09:19:20 2010
Started redo application at
 Thread 1: logseq 552, block 28631
Mon Jun 07 09:19:20 2010
Recovery of Online Redo Log: Thread 1 Group 2 Seq 552 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
Mon Jun 07 09:19:20 2010
Completed redo application
Mon Jun 07 09:19:20 2010
Completed crash recovery at
 Thread 1: logseq 552, block 28644, scn 12176013920948
 7 data blocks read, 7 data blocks written, 13 redo blocks read
Mon Jun 07 09:19:28 2010
Errors in file d:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_2688.trc:
ORA-00600: 内部错误代码, 参数: [2252], [2834], [4076604085], [], [], [], [], []

通过这些日志可以看出数据库一直在报ORA-00600[2252],在后来因某种原因数据库异常重启后启动不了.仔细观察可以发现系统显示的时间是2010年6月7日和当前时间相差了整整两年.
At any point in time, the Oracle Database calculates a “not to exceed” limit for the number of SCNs a database can have used, based on the number of seconds elapsed since 1988, multiplied by 16,384. This is known as the database’s current maximum SCN limit. Doing this ensures that Oracle Databases will ration SCNs over time, allowing over 500 years of data processing for any Oracle Database.

错误原因
根据错误提示计算scn(2834为现在系统的SCN WRAP,4076604085就是BASE)=2834*2^32+4076604085=12176013920949
根据数据库日志显示系统时间计算最大scn值:

SQL>select to_number( ((to_date('20100607 09:19:28','yyyymmdd hh24:mi:ss')-
  2 to_date('19880101','yyyymmdd'))*24*3600*16*1024),'999999999999999999') max_scn from dual;
   MAX_SCN
--------------
11598377254912

通过这里的计算可以知道数据库当前的SCN大于系统时间点上允许的最大时间的SCN,从而出现ORA-00600[2252]错误.

解决方法
知道了数据库报该错误的原因,那么解决该问题很简单,修改系统时间到正确的时间点即可

SQL> select to_number(((sysdate-to_date('19880101','yyyymmdd'))*24*3600*16*1024),
   2 '999999999999999999') max_scn from dual;
   MAX_SCN
--------------
12634899464192

该SCN大于数据库当前SCN所有数据库不会报ORA-00600[2252]错误可以正常启动.

记录另一起ORA-00600[13013]处理

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录另一起ORA-00600[13013]处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

发现ORA-00600[13013]错误
During the execution of an UPDATE statement, after several attempts (Arg [a] passcount) we are unable to get a stable set of rows that conform to the WHERE clause.

Fri Jun  1 03:00:33 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_12104.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Sat Jun  2 03:01:05 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_1052.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Sun Jun  3 15:00:50 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_13876.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12603219], [17], []
Mon Jun  4 03:01:05 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_7704.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Tue Jun  5 03:00:35 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_27983.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Wed Jun  6 03:01:07 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_19204.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Thu Jun  7 03:00:37 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_m000_7273.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12605556], [17], []

以前处理过一次ORA-600[13013],里面包含了各参数含义,这次也按照常规方法处理,分析如下:
1.通过trace文件找出对应表

*** 2012-06-01 03:00:33.325
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [13013], [5001], [8943], [12596577], [25], [12596578], [17], []
Current SQL statement for this session:
UPDATE WRH$_SQL_BIND_METADATA SET snap_id = :lah_snap_id  WHERE dbid = :dbid    AND (SQL_ID)
IN (SELECT STR1_KEWRATTR FROM X$KEWRATTRSTALE)

2.通过ORA-600[13013]中表示rdba参数找出表

SQL> select  DBMS_UTILITY.data_block_address_file (12596577) "file#",
  2  DBMS_UTILITY.data_block_address_block (12596577) "block#"
  3  from dual;
     file#     block#
---------- ----------
         3      13665
SQL> select * from dba_extents where 13665 between block_id and block_id + blocks and file_id=3;
OWNER      SEGMENT_NAME           SEGMENT_TYPE     EXTENT_ID    FILE_ID   BLOCK_ID    BLOCKS
---------- --------------------  --------------- --------------- ---------- -------  ------
SYS        SYS_LOB0000008933C00  LOBSEGMENT           7          3      13657         8
SYS        WRH$_SQL_BIND_METADA  TABLE                1          3      13665         8

检查对象WRH$_SQL_BIND_METADA是否有坏块或者表和index不一致

SQL> analyze table SYS.WRH$_SQL_BIND_METADATA validate structure cascade online;
Table analyzed.

这里分析WRH$_SQL_BIND_METADA表正常,但是通过上面的查询证明WRH$_SQL_BIND_METADA的第一个extent的第一个数据块上可能出现问题,使得analyze未检查(自己猜猜,未做深入验证).针对这个问题,直接备份WRH$_SQL_BIND_METADATA表,truncate掉该表,然后重新插入数据(注意操作时间避开awr插入数据时间段)

create table SQL_BIND_METADATA_BAK
AS
SELECT * FROM SYS.WRH$_SQL_BIND_METADATA;
TRUNCATE TABLE SYS.WRH$_SQL_BIND_METADATA;
INSERT INTO SYS.WRH$_SQL_BIND_METADATA
SELECT * FROM SQL_BIND_METADATA_BAK;
DROP TABBLE SQL_BIND_METADATA_BAK PURGE;

ORA-600[6749] 发生在 SYSMAN.MGMT_METRICS_RAW表

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-600[6749] 发生在 SYSMAN.MGMT_METRICS_RAW表

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库alert日志长时间出现ORA-00600[6749]错误
日志报错如下

Fri Jun  1 12:01:30 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_396.trc:
ORA-00600: internal error code, arguments: [6749], [3], [12596882], [49], [], [], [], []
Fri Jun  1 12:01:34 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_396.trc:
Fri Jun  1 13:01:06 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_13226.trc:
ORA-00600: internal error code, arguments: [6749], [3], [12596882], [49], [], [], [], []
Fri Jun  1 13:01:10 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_13226.trc:
Fri Jun  1 14:01:46 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_26691.trc:
ORA-00600: internal error code, arguments: [6749], [3], [12596882], [49], [], [], [], []
Fri Jun  1 14:01:51 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_26691.trc:
Fri Jun  1 15:01:21 2012
Errors in file /opt/oracle/admin/oraapp/bdump/oraapp_j000_7119.trc:
ORA-00600: internal error code, arguments: [6749], [3], [12596882], [49], [], [], [], []

查看trace日志

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/oracle/product/10/oraapp
System name:	Linux
Node name:	oracle2
Release:	2.6.18-92.el5
Version:	#1 SMP Tue Apr 29 13:16:15 EDT 2008
Machine:	x86_64
Instance name: oraapp
Redo thread mounted by this instance: 1
Oracle process number: 44
Unix process pid: 26691, image: oracle@oracle2 (J000)
*** ACTION NAME:(target 5) 2012-06-01 14:01:00.298
*** MODULE NAME:(Oracle Enterprise Manager.rollup) 2012-06-01 14:01:00.298
*** SERVICE NAME:(SYS$USERS) 2012-06-01 14:01:00.298
*** SESSION ID:(406.24103) 2012-06-01 14:01:00.298
Dumping current redo log in thread 1
DUMP OF REDO FROM FILE '/opt/oracle/oradata/oraapp/systable/redo03.log'
 Opcodes 11.*
 DBAs (file#, block#):
      (3, 13970)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169869568=0xa200100
	Db ID=1462349529=0x5729aed9, Db Name='ORAAPP'
	Activation ID=1462334681=0x572974d9
	Control Seq=2614156=0x27e38c, File size=245760=0x3c000
	File Number=3, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003963, SCN 0x0000129fc9df-0xffffffffffff"

猜测ORA-600[6749]部分参数

SQL> select  DBMS_UTILITY.data_block_address_file (12596882) "file#",
  2  DBMS_UTILITY.data_block_address_block (12596882) "block#"
  3  from dual;
     file#     block#
---------- ----------
         3      13970

ORA-600[6749][a][b]{c}
这里证明c表示rdba

根据dba查询对象

SQL> select * from dba_extents where 13970 between block_id and block_id + blocks and file_id=3;
OWNER      SEGMENT_NAME                     SEGMENT_TYPE
---------- ------------------------------- -------------------
SYSMAN        SYS_IOT_OVER_10448                 TABLE
SQL> select owner,iot_name from dba_tables where table_name = 'SYS_IOT_OVER_10448';
OWNER                          IOT_NAME
------------------------------ ------------------------------
SYSMAN                         MGMT_METRICS_RAW
SQL>  ANALYZE TABLE SYSMAN.MGMT_METRICS_RAW  VALIDATE STRUCTURE CASCADE;
Table analyzed.

按照常理ORA-00600[6749]错误是因为坏块或者表和索引数据不一致导致,通过ANALYZE可以检查出来.这里显示正常,那可能是其他原因导致,查询MOS果然发现是ORA-600 [6749] Occurring on SYSMAN.MGMT_METRICS_RAW [ID 467439.1]

解决方法

The following workaround may resolve the problem temporarily:
1. Ensure you have a good backup before proceeding.
2. Create a copy of the SYSMAN.MGMT_METRICS_RAW table:
SQL> create table SYSMAN.MGMT_METRICS_RAW_COPY
as select * from SYSMAN.MGMT_METRICS_RAW;
3. Truncate the table:
SQL> truncate table SYSMAN.MGMT_METRICS_RAW;
May need  to disable trigger: "sysman.raw_metrics_after_insert" before proceeding.
Re-enable after the insert.
4. Re-insert the rows:
SQL> insert into SYSMAN.MGMT_METRICS_RAW
select * from SYSMAN.MGMT_METRICS_RAW_COPY;
SQL> commit;
5. Drop the copy table:
SQL> drop table SYSMAN.MGMT_METRICS_RAW_COPY;

动态修改PGA_AGGREGATE_TARGET 导致ORA-600[723]

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:动态修改PGA_AGGREGATE_TARGET 导致ORA-600[723]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在以前分析过ORA-600[729](SGA内存泄露),这次遇到ORA-600[723](PGA内存泄露)
操作系统数据库信息

ORACLE V9.2.0.3.0 - Production vsnsta=0
vsnsql=12 vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: dsdata

alert报错ORA-600[723]

Tue Jun 05 12:16:35 2012
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 274
Tue Jun 05 12:16:40 2012
alter database close normal
Tue Jun 05 12:16:40 2012
SMON: disabling tx recovery
SMON: disabling cache recovery
Tue Jun 05 12:16:40 2012
Shutting down archive processes
Archiving is disabled
Tue Jun 05 12:16:40 2012
ARCH shutting down
Tue Jun 05 12:16:40 2012
ARCH shutting down
ARC1: Archival stopped
Tue Jun 05 12:16:40 2012
ARC0: Archival stopped
Tue Jun 05 12:16:40 2012
Thread 1 closed at log sequence 406
Successful close of redo thread 1.
Tue Jun 05 12:16:41 2012
Completed: alter database close normal
Tue Jun 05 12:16:41 2012
alter database dismount
Completed: alter database dismount
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Tue Jun 05 12:16:43 2012
Errors in file d:\oracle\admin\dsdata\udump\dsdata_ora_504.trc:
ORA-00600: internal error code, arguments: [723], [20664], [20664], [memory leak], [], [], [], []

通过alert日志可以知道,数据库shutdown immediate的时候报ORA-600[723]

分析trace文件

…………
EXTENT 147 addr=062ACCBC
  Chunk  62accc4 sz=     1252    free      "               "
  Chunk  62ad1a8 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  62ad9b4 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  62ae1c0 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  62ae9cc sz=     2060    freeable  "qesmmaLogInitia"
…………
EXTENT 153 addr=04232414
  Chunk  423241c sz=     4476    perm      "perm           "  alo=2868
  Chunk  4233598 sz=    18516    free      "               "
  Chunk  4237dec sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  42385f8 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  4238e04 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  4239610 sz=     2060    freeable  "qesmmaLogInitia"
  Chunk  4239e1c sz=     2060    freeable  "qesmmaLogInitia"
…………
--查询发现没有释放的内容都是在qesmmaLogInitia部分
*** 2012-06-05 12:16:43.000
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [723], [20664], [20664], [memory leak], [], [], [], []
Current SQL information unavailable - no SGA.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
_ksedmp+147          CALLrel  _ksedst+0
_ksfdmp.108+e        CALLrel  _ksedmp+0            3
_kgeriv+89           CALLreg  00000000             217190 3
_kgesiv+4e           CALLrel  _kgeriv+0            217190 0 2D3 3 418FC2C
_ksesic3+3b          CALLrel  _kgesiv+0            217190 0 2D3 3 418FC2C 2D3 3
                                                   418FC2C
__VInfreq__ksmdpg+e  CALLrel  _ksesic3+0           2D3 0 50B8 0 50B8 1 B 26A3F28
f
_opidcl+1db          CALLrel  _ksmdpg+0
_opidrv+3bf          CALLrel  _opidcl+0            21D328 0
_sou2o+19            CALLrel  _opidrv+0
_opimai+150          CALLrel  _sou2o+0             418FE20 32 0 0
_BackgroundThreadSt  CALLrel  _opimai+0
art@4+164
77E6482C             CALLreg  00000000
--------------------- Binary Stack Dump ---------------------

通过查询MOS发现[ID 242260.1]上的Stack Trace比较匹配.上面说到通过sql来直接修改pga_aggregate_target导致,查找alert日志,果然发现:

Mon May 21 15:18:33 2012
ALTER SYSTEM SET pga_aggregate_target='1048576000' SCOPE=MEMORY;
Mon May 21 15:18:33 2012
ALTER SYSTEM SET pga_aggregate_target='1048576000' SCOPE=SPFILE;

现在基本上可以确定引起整个ORA-600[723]的原因是:用户直接修改pga_aggregate_target参数,然后关闭数据库引起Bug:2975617导致

处理建议
Don’t alter the pga_aggregate_target dynamically Change it in init.ora file
针对本库,再次开启数据库应该处于正常状态(spfile已经修改),无需继续关注该问题.

使用dbms_metadata.get_ddl出现ORA-31605错误

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:使用dbms_metadata.get_ddl出现ORA-31605错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

使用dbms_metadata.get_ddl出现ORA-31605错误

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "www.xifenfei.com" from dual;
www.xifenfei.com
-------------------
2012-05-26 23:10:22
SQL> select dbms_metadata.get_ddl('TABLE','XFF_IOT','CHF1') from dual;
ERROR:
ORA-06502: PL/SQL: numeric or value error
ORA-31605: the following was returned from LpxXSLResetAllVars in routine kuxslResetParams:
LPX-1: NULL pointer
ORA-06512: at "SYS.UTL_XML", line 0
ORA-06512: at "SYS.DBMS_METADATA_INT", line 3320
ORA-06512: at "SYS.DBMS_METADATA_INT", line 4148
ORA-06512: at "SYS.DBMS_METADATA", line 458
ORA-06512: at "SYS.DBMS_METADATA", line 615
ORA-06512: at "SYS.DBMS_METADATA", line 1221
ORA-06512: at line 1
no rows selected

错误原因
dbms_metadata.get_ddl需要调用Oracle dictionary table “sys.metastylesheet.”中的XSL stylesheets,但是由于某种原因,使得调用失败,出现上述错误.因为该错误可能有:
1.XSL stylesheets没有安装
2.使用alter database 修改数据库字符集(本库是因为昨天修改字符集导致)

解决办法(sys用户执行)
1.在10g及其以上版本中(不带参数)

SQL> exec dbms_metadata_util.load_stylesheets;
PL/SQL procedure successfully completed.

2.在9i版本中(带dir参数)

SQL> exec dbms_metadata_util.load_stylesheets('/u01/oracle/9.2.0/db_1/rdbms/xml/xsl');
PL/SQL procedure successfully completed.
SQL> select dbms_metadata.get_ddl('TABLE','XFF_IOT','CHF1') from dual;
DBMS_METADATA.GET_DDL('TABLE','XFF_IOT','CHF1')
--------------------------------------------------------------------------------
  CREATE TABLE "CHF1"."XFF_IOT"
   (    "ID" NUMBER,
        "NAME" VARCHAR2(30),
         CONSTRAINT "CHF_IOT_ID#_PK" PRIMARY KEY ("ID") ENABLE
   ) ORGANIZATION INDEX NOCOMPRESS PCTFREE 10 INITRANS 2 MAXTRANS 255 LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "SYSTEM"
 PCTTHRESHOLD 50
DBMS_METADATA.GET_DDL('TABLE','XFF_IOT','CHF1')
--------------------------------------------------------------------------------

记录一次rman备份ORA-19502/ORA-27063错误原因分析

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次rman备份ORA-19502/ORA-27063错误原因分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

rman备份出现ORA-19502/ORA-27063错误

RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14> 15> 16> 17> 18> 19> 20>
allocated channel: t11
channel t11: sid=824 instance=ncdb1 devtype=DISK
allocated channel: t12
channel t12: sid=838 instance=ncdb1 devtype=DISK
allocated channel: t13
channel t13: sid=809 instance=ncdb1 devtype=DISK
allocated channel: t14
channel t14: sid=886 instance=ncdb1 devtype=DISK
allocated channel: t15
channel t15: sid=620 instance=ncdb1 devtype=DISK
allocated channel: t16
channel t16: sid=599 instance=ncdb1 devtype=DISK
allocated channel: t17
channel t17: sid=482 instance=ncdb1 devtype=DISK
allocated channel: t18
channel t18: sid=506 instance=ncdb1 devtype=DISK
一共开通8个通道
channel t12: starting full datafile backupset
channel t12: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t12: starting piece 1 at 14-MAY-12
--通道12备份数据文件
channel t17: starting full datafile backupset
channel t17: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t17: starting piece 1 at 14-MAY-12
--通道17备份数据文件
channel t15: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mpnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t15: backup set complete, elapsed time: 06:07:59
channel t11: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mlnb04jk_1_1 tag=TAG20120514T204954 comment=NONE
channel t11: backup set complete, elapsed time: 06:17:25
channel t16: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mqnb04jm_1_1 tag=TAG20120514T204954 comment=NONE
channel t16: backup set complete, elapsed time: 06:34:49
channel t14: finished piece 1 at 15-MAY-12
piece handle=/rman/db_monb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t14: backup set complete, elapsed time: 06:40:05
channel t18: finished piece 1 at 15-MAY-12
piece handle=/rman/db_msnb04jn_1_1 tag=TAG20120514T204954 comment=NONE
channel t18: backup set complete, elapsed time: 06:43:38
channel t13: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mnnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t13: backup set complete, elapsed time: 07:40:56
--这里可以看出rman的备份完成了通道11/13/14/15/16/18,也就是说目前为止通道12/17未完成.
RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
--通道12报错(硬盘空间不足)
channel t11: starting full datafile backupset
channel t11: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t11: starting piece 1 at 15-MAY-12
--在通道12报错后,通道11已经完成了上次备份,所以启动备份通道12出错的数据文件
RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel
--通道17也因为磁盘空间报错
channel t13: starting full datafile backupset
channel t13: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t13: starting piece 1 at 15-MAY-12
--通道13也尝试备份通道17失败的数据文件
RMAN-03009: failure of backup command on t11 channel at 05/15/2012 04:39:59
ORA-19504: failed to create file "/rman/db_mtnb104u_1_1"
ORA-27044: unable to write the header block of file
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: 3
Addition
--因为当前没有空闲空间,通道11终止,
--这个时候rman异常终止,导致后续的通道13终止记录未打印到日志

阅读完rman日志,很好理解因为存放rman备份的磁盘空间不足导致了一系列错误

检查磁盘剩余空间

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.75    3%     6548     1% /
/dev/hd2          10.00      4.55   55%    84383     8% /usr
/dev/hd9var        5.00      4.04   20%     6290     1% /var
/dev/hd3           5.00      3.87   23%     1551     1% /tmp
/dev/hd1          10.00      9.91    1%      382     1% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       5.00      4.89    3%     3502     1% /opt
/dev/archalv      99.00     82.98   17%       96     1% /archa
/dev/fslv01       40.00     19.49   52%    72324     2% /ora10
/dev/fslv00     1800.00    467.25   75%       10     1% /rman

这下让人迷糊了,磁盘空间还剩余467.25G,怎么会报错呢?

分析原因

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel

这两个通道在写入rman备份到磁盘中的时候,在05/15/2012 04:39:58发现磁盘空间不足,两个通道分别准备写入30480897/30753665号块的时候出错,那么当时这两个通道分别写入的数据块数为30480896/30753664,写入文件大小为(30480896+30753664)*8192/1024/1024/1024=467.1826171875G.这里可以看出磁盘剩余空间467.25G,其实当时已经写入了467.1826171875G,继续写入的时候出错.然后rman为了保证备份的正确性,自动删除了当时已经备份的467.1826171875G错误的备份文件.从而在备份结束后看到磁盘空间还有大量剩余而rman包空间不足的现象.

ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

alert中发现ORA-07445错误
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []错误,导致数据库down掉

Mon May 14 14:34:34 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p001_1280.trc:
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
Mon May 14 14:34:35 2012
Trace dumping is performing id=[cdmp_20120514143435]
Mon May 14 14:35:10 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc  (incident=164712):
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_164712\orcl_smon_1072_i164712.trc

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 4 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:4892M/8189M, Ph+PgF:5638M/9795M, VA:925M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 12
Windows thread id: 1072, image: ORACLE.EXE (SMON)
--以上信息得出操作系统和数据库版本2003 sp2+oracle11g(11.1.0.6 32位)
Dump continued from file: d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION]
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
========= Dump for incident 164712 (ORA 7445 [PC:0x7FFF65D0]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
--这里的ORA-07445 [ACCESS_VIOLATION][UNABLE_TO_READ]根据经验结合这里的32位的环境,
--怀疑是sga使用的内存太多,ORACLE数据库不能读SGA相关内存导致
--在trace中找出相关参数配置.
[0004]: processes=300
[0004]: sessions=335
[0004]: __shared_pool_size=1124073472
[0004]: __large_pool_size=8388608
[0004]: __java_pool_size=16777216
[0004]: __streams_pool_size=251658240
[0004]: streams_pool_size=251658240
[0004]: sga_target=0
[0004]: __sga_target=1887436800
[0004]: memory_target=3145728000
[0004]: memory_max_target=4722786304
[0004]: db_block_size=8192
[0004]: __db_cache_size=478150656
[0004]: __shared_io_pool_size=0
[0004]: compatible=11.1.0.0.0
[0004]: log_buffer=8851456
[0004]: __pga_aggregate_target=780140544
--这里可以看到sga_target分配了内存为1887436800=1.7578125G
--pga_aggregate_target分配了780140544=0.7265625G
--两者内存之和大于2G,超过了32位ORACLE默认限制

查询MOS发现[1341681.1]
该错误原因

This is a resource issue (memory in particular). 32-bit windows systems,
are limited to 2GB of addressable memory so if you are on this platform
it's likely you are simply exceeding the capabilities of the 32bit operating system.

解决建议

First recommendation :
If you have not already done so, add the /3GB switch to your boot.ini file and reboot the server. The
boot.ini will be located in the root directory on the drive where windows is installed. The switch, /3GB,
is placed at the end of the line that executes the WinNT loading process.
This will allow applications such as oracle access to 3Gb or memory instead of 2Gb.
Example:
[operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT Server Version 4.00" /3GB
 Second recommendation :
You do not want to increase memory target. If anything, this should be decreased.
You are limited to under 2GB of addressable memory on 32bit windows (the limit is actually about 1.85GB).
This is for both SGA and PGA memory for all instances; you have to reduce the SGA size for the instance.
The recommendation is to reduce sga_target, memory_target, and memory_max_target.