关闭数据库出现ORA-00379错误

关闭数据库出现ORA-00379错误

SQL> shutdown immediate
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区

查看内存分配

SQL> show parameter sga;
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 412M
sga_target                           big integer 0
SQL> select * from v$sgainfo;
NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      1333676 No
Redo Buffers                        6078464 No
Buffer Cache Size                 104857600 Yes
Shared Pool Size                  142606336 Yes
Large Pool Size                     4194304 Yes
Java Pool Size                     12582912 Yes
Streams Pool Size                         0 Yes
Shared IO Pool Size                       0 Yes
Granule Size                        4194304 No
Maximum SGA Size                  431038464 No
Startup overhead in Shared Pool    46137344 No
Free SGA Memory Available         159383552
--spfile中分配情况
orcl.__db_cache_size=104857600
orcl.__java_pool_size=12582912
orcl.__large_pool_size=4194304
orcl.__pga_aggregate_target=104857600
orcl.__sga_target=281018368
orcl.__shared_io_pool_size=0
orcl.__shared_pool_size=142606336
orcl.__streams_pool_size=0
--初始化参数
*.sga_max_size=0
*.sga_target=536870912
*.memory_max_target=536870912
*.memory_target=536870912

alert日志

Mon Jul 02 11:30:19 2012
DIA0 started with pid=8, OS id=1520
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc  (incident=10883):
ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng)
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc  (incident=10884):
ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket)
ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng)
ORA-4030 : opidrv aborting process DIA0 ospid (1348_1520)
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc  (incident=12013):
ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket)
ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket)
ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng)
Process debug not enabled via parameter _debug_enable
Mon Jul 02 11:33:19 2012
Trace dumping is performing id=[cdmp_20120702113319]
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc:
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
Mon Jul 02 11:33:49 2012
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc:
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
Mon Jul 02 11:34:38 2012
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc:
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
Mon Jul 02 11:37:05 2012
Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2400.trc:
ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区
ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区
ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区
Tue Jul 03 09:58:06 2012
WARNING: sga_target 432013312 cannot be more than memory_target (432013312) -
pga_aggregate_target (104857600/0) or untunable pga 104857600, 73783296

通过这里可以看出,系统的data buffe和pga都有内存不足的报错.

解决问题
问题的原因是由于内存分配不多,导致sga组件被消耗完,现在数据库不能正常关闭,修改了相关的内存参数的配置[避免该bug采用asmm内存管理]也无法生效,现在需要做的任务是重启数据库.导致数据库不能被关闭的原因是因为data buffer中的脏数据不能写入新数据.查询MOS发现是Bug 7702085.正常关闭库解决办法手工刷sga组件,然后升级数据库到11.2.0.1 (Base Release)/11.1.0.7.3 (Patch Set Update)/11.1.0.7 Patch 25 on Windows Platforms

SQL>  alter system flush BUFFER_CACHE;
System altered.
SQL> alter system flush  SHARED_POOL;
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

使用copy实现long类型转移表空间

在一次8.1.6的数据库恢复过程中,发现一个表空间的数据文件损坏,在转移该表空间相关表时,遇到让人郁闷的long类型.不能使用ctas和move来实现转移,最后通过古老的copy来实现该项工作.
模拟LONG类型表

SQL> create table chf.t_long (id number,name long) tablespace ts_xifenfei;
Table created.
SQL> insert into chf.t_long  select object_id,object_name from dba_objects where rownum<10;
9 rows created.
SQL> commit;
Commit complete.
SQL> desc chf.t_long
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------------
 ID                                                 NUMBER
 NAME                                               LONG

测试ctas和move

SQL> create table chf.t_long_bak
  2  as
  3  select * from chf.t_long;
select * from chf.t_long
       *
ERROR at line 3:
ORA-00997: illegal use of LONG datatype
SQL> alter table chf.t_long move tablespace users;
alter table chf.t_long move tablespace users
*
ERROR at line 1:
ORA-00997: illegal use of LONG datatype

使用copy实现LONG表跟换表空间

SQL> SET LONG 1000
SQL> select dbms_metadata.get_ddl('TABLE','T_LONG','CHF') from dual;
DBMS_METADATA.GET_DDL('TABLE','T_LONG','CHF')
--------------------------------------------------------------------------------
  CREATE TABLE "CHF"."T_LONG"
   (    "ID" NUMBER,
        "NAME" LONG
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "TS_XIFENFEI"
SQL>   CREATE TABLE "CHF"."T_LONG_BAK"
  2     (    "ID" NUMBER,
  3          "NAME" LONG
  4     ) SEGMENT CREATION IMMEDIATE
  5    PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  6   NOCOMPRESS LOGGING
  7    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  8    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  9    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 10    TABLESPACE "USERS";
Table created.
SQL> copy from chf/xifenfei@ora11g_d INSERT chf.t_long_bak using select * from chf.t_long;
Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
   9 rows selected from chf@ora11g_d.
   9 rows inserted into CHF.T_LONG_BAK.
   9 rows committed into CHF.T_LONG_BAK at DEFAULT HOST connection.
SQL> alter table t_long rename to t_long_old;
Table altered.
SQL> alter table t_long_bak rename to t_long;
Table altered.
SQL> select tablespace_name,table_name from dba_tables where table_name like 'T_LONG%';
TABLESPACE_NAME TABLE_NAME
--------------- ---------------
TS_XIFENFEI     T_LONG_OLD
USERS           T_LONG
SQL> DROP TABLE T_LONG_OLD PURGE;
Table dropped.

iscsiadm主要操作命令

当前包含磁盘

[root@xifenfei ~]# fdisk -l
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux
Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux

查看iscsi运行情况

[root@xifenfei ~]# rpm -aq|grep iscsi
iscsi-initiator-utils-6.2.0.872-10.0.1.el5
[root@xifenfei ~]#  chkconfig --list |grep iscsi
iscsi           0:off   1:off   2:on    3:on    4:on    5:on    6:off
iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@xifenfei ~]# ps -ef|grep iscs
root      2753     2  0 Jun21 ?        00:00:00 [iscsi_eh]
root     15793     1  0 09:08 ?        00:00:00 brcm_iscsiuio
root     15800     1  0 09:08 ?        00:00:00 iscsid
root     15802     1  0 09:08 ?        00:00:00 iscsid
root     19533 15269  0 10:11 pts/1    00:00:00 grep iscs

配置iscsi存储

[root@xifenfei ~]# iscsiadm -m discovery -t sendtargets -p 192.168.1.254:3260
192.168.1.254:3260,1 iqn.2006-01.com.openfiler:tsn.32b32087937b
[root@xifenfei ~]# iscsiadm -m node –T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260 -l
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful.
[root@xifenfei ~]# iscsiadm -m node –T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260
>--op update -n node.startup -v automatic

当前包含磁盘

[root@xifenfei ~]# fdisk -l
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux
Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux
Disk /dev/sde: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes
Disk /dev/sde doesn't contain a valid partition table
Disk /dev/sdf: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes
Disk /dev/sdf doesn't contain a valid partition table
Disk /dev/sdg: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes
Disk /dev/sdg doesn't contain a valid partition table

卸载iscsi存储

[root@xifenfei ~]# iscsiadm -m node --logoutall=all
Logging out of session [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260]
Logout of [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful.
[root@xifenfei ~]# iscsiadm -m node --op delete --targetname iqn.2006-01.com.openfiler:tsn.32b32087937b

当前包含磁盘

[root@xifenfei ~]# fdisk -l
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux
Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux

iscsi操作总结

增加iscsi存储
(1)发现iscsi存储:iscsiadm -m discovery -t st -p ISCSI_IP
(2)查看iscsi发现记录:iscsiadm -m node
(3)登录iscsi存储:iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l
(4)开机自动: iscsiadm -m node –T LUN_NAME -p ISCSI_IP --op update -n node.startup -v automatic
删除iscsi存储
(1)登出iscsi存储 iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u
(2)对出iscsi所有登录 iscsiadm -m node --logoutall=all
(3)删除iscsi发现记录:iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP
登入需验证码的节点
(1)开启认证
iscsiadm -m node -T LUN_NAME -o update --name node.session.auth.authmethod --value=CHAP
*.使用-o同--op
(2)添加用户
iscsiadm -m node -T LUN_NAME --op update --name node.session.auth.username --value=[用户名]
(3)添加密码
iscsiadm –m node –T LUN_NAME –op update –name node.session.auth.password –value=[密码]

gv$视图不能查询所有节点信息

今天遇到诡异的事情,AIX 5.3 ORACLE 9I RAC的gv$视图只能查询到本地的记录,而不是所有节点.但是所有节点均运行正常,除gv$视图之外未发现其他异常.
异常时节点1信息

SQL> show parameter clu;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string      192.168.6.24
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         1 zwq_crm1   OPEN
SQL> show parameter par;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
log_parallelism                      integer     1
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_execution_message_size      integer     2152
parallel_instance_group              string
parallel_max_servers                 integer     5
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     TRUE
parallel_server_instances            integer     2
parallel_threads_per_cpu             integer     2
partition_view_enabled               boolean     FALSE
recovery_parallelism                 integer     0
SQL> !ps -ef|grep p0
  oracrm 1929258       1   0 09:49:19      -  0:01 ora_p005_crm1
  oracrm  745844       1   0   Jun 26      -  0:20 ora_p004_crm1
  oraeye 2421272 3948648   0 21:53:49  pts/0  0:00 grep p0
  oracrm 3060406       1   0   Jun 26      -  0:20 ora_p002_crm1
  oracrm 3170868       1   0   Jun 20      -  2:13 ora_p000_crm1
  oracrm  787414       1   0   Jun 26      -  0:20 ora_p001_crm1
  oracrm 2552690       1   0   Jun 26      -  0:20 ora_p003_crm1

1.节点最大允许5个并发进程,现在已经启动并发进程到p005(6个)
2.gv$视图只能查询一个节点信息

异常时节点2信息
问题所有情况和1节点完全相似

SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         2 zwq_crm2   OPEN
SQL> show parameter par;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
log_parallelism                      integer     1
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_execution_message_size      integer     2152
parallel_instance_group              string
parallel_max_servers                 integer     5
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     TRUE
parallel_server_instances            integer     2
parallel_threads_per_cpu             integer     2
partition_view_enabled               boolean     FALSE
recovery_parallelism                 integer     0
SQL> !ps -ef|grep p0
  oracrm 1867938       1   0 15:17:25      -  0:00 ora_p004_crm2
  oracrm 2633748       1   0 09:49:19      -  0:01 ora_p005_crm2
  oraeye 3059876 1007714   0 21:54:01  pts/0  0:00 grep p0
  oracrm  323884       1 120   Jun 20      - 10692:47 ora_p000_crm2
  oracrm 1839818       1   0 06:16:32      -  0:00 ora_p003_crm2
  oracrm  459660       1 107   Jun 26      - 1857:00 ora_p001_crm2
  oracrm 2351894       1   0 16:52:52      -  0:00 ora_p002_crm2

在异常2节点上做10046

SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 4
Statement processed.
SQL> select * from gv$version;
   INST_ID BANNER
---------- ----------------------------------------------------------------
         2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         2 PL/SQL Release 9.2.0.8.0 - Production
         2 CORE 9.2.0.8.0       Production
         2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         2 NLSRTL Version 9.2.0.8.0 - Production
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm2_ora_517066.trc

分析10046内容发现

PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41759005850609 hv=775381991 ad='21195808'
select * from gv$version
END OF STMT
PARSE #1:c=20000,e=33894,p=1,cr=59,cu=3,mis=1,r=0,dep=0,og=4,tim=41759005850607
BINDS #1:
kxfpg1srv
        could not start P006, inst 1
kxfpg1srv
        could not start local P006
EXEC #1:c=0,e=3540,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41759005854254
FETCH #1:c=0,e=28,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41759005854347
FETCH #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=41759005854678
STAT #1 id=1 cnt=5 pid=0 pos=1 obj=0 op='VIEW  '
STAT #1 id=2 cnt=5 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '

通过这里观察10046可以看到:在2节点上查询gv$instance,需要通过使用并发进程去访问1节点,但是因为1节点的parallel_max_servers为5,而当前的并发进程已经达到最大数目,从而使得想在节点1上启动并发失败,进入使得gv$视图只能查询出来本节点数据

gv$视图异常解决方法

--重启两个节点,查询正常
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         1 zwq_crm1   OPEN
         2 zwq_crm2   OPEN

针对这个问题,很可能是oracle bug(因为是9i版本,我无法深究),或者是并发进程僵死所致,当时有一个想法,kill 掉数据库并发进程,因客户不同意(采用稳妥重启方案),未能通过尝试验证我的猜想.

对gv$视图正常做10046

SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select * from gv$version;
   INST_ID BANNER
---------- ----------------------------------------------------------------
         1 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         1 PL/SQL Release 9.2.0.8.0 - Production
         1 CORE 9.2.0.8.0       Production
         1 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         1 NLSRTL Version 9.2.0.8.0 - Production
         2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         2 PL/SQL Release 9.2.0.8.0 - Production
         2 CORE 9.2.0.8.0       Production
         2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         2 NLSRTL Version 9.2.0.8.0 - Production
10 rows selected.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm1_ora_1708916.trc

分析10046内容

PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41752681925071 hv=775381991 ad='92ef64f0'
select * from gv$version
END OF STMT
PARSE #1:c=0,e=6770,p=0,cr=12,cu=3,mis=1,r=0,dep=0,og=4,tim=41752681925070
BINDS #1:
WAIT #1: nam='PX Deq: reap credit' ela= 21 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 237 p1=268500992 p2=1 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 64 p1=268500992 p2=2 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 46 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 855 p1=268566528 p2=2 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq Credit: send blkd' ela= 188 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 406 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 7442 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 1664 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
EXEC #1:c=0,e=11572,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41752681936727
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 166 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 310 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
FETCH #1:c=0,e=557,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41752681937320
WAIT #1: nam='SQL*Net message from client' ela= 223 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 18 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=5
WAIT #1: nam='DFS lock handle' ela= 254 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 73 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 363 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 24 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 3 p1=0 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 155 p1=10 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
FETCH #1:c=0,e=1735,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=4,tim=41752681939329
WAIT #1: nam='DFS lock handle' ela= 249 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 258 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 66 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 369 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 19 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='enqueue' ela= 201 p1=1347616774 p2=2 p3=0
WAIT #1: nam='enqueue' ela= 41 p1=1347616774 p2=2 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 6981860 p1=1650815232 p2=1 p3=0
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='VIEW  '
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '

通过这里可以看出,rac在正常情况下gv$视图默认的查询就是并发方式进行

因未配置Hugepage会话数添增悲剧案例

今天一朋友反馈他们的一个数据库hang住了,通过ssh也不能登录系统,他们没有办法重启系统解决问题,现在想让我帮忙找出问题原因
分析awr得出


询问朋友,他们的库一般session保持在200个左右,这次突然飙升到750以上,属于异常情况

分析监听日志

看到在截图的时间内,整体访问较频繁,某个ip访问异常频繁,通过这些信息,初步怀疑是用户的数据库内存使用完,导致系统数据库hang住.

查看系统日志

Jun 26 14:35:55 result01 kernel: [5613531.566617] Free swap  = 0kB
Jun 26 14:35:55 result01 kernel: [5613531.566618] Total swap = 2104504kB
Jun 26 14:35:55 result01 kernel: [5613531.566620] Free swap:            0kB
Jun 26 14:35:55 result01 kernel: [5613531.591073] 2359296 pages of RAM
Jun 26 14:35:55 result01 kernel: [5613531.591074] 318236 reserved pages
Jun 26 14:35:55 result01 kernel: [5613531.591075] 73353 pages shared
Jun 26 14:35:56 result01 kernel: [5613531.591076] 529 pages swap cached
Jun 26 14:35:56 result01 kernel: [5613531.591079] Out of Memory: Kill process 8904 (oracle) score 891 and children.
Jun 26 14:35:56 result01 kernel: [5613531.591201] Out of memory: Killed process 8904 (oracle).
Jun 26 14:35:56 result01 kernel: [5613531.592280] oracle invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

通过这个日志看出系统内存和交换分区都使用完,因为内存不够,系统开始kill掉部分oracle进程.通过这些确定是系统内存使用完导致hang住可以理解.

分析hang住原因
为什么session意外的从200添增到750的时候,系统内存被使用完

cat /proc/meminfo
MemTotal:      8164240 kB
SwapTotal:     2104504 kB
PageTables:      69732 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
sga_target=3674210304
pga_aggregate_target=1732247552

从这里得出几个信息:
1.数据库总内存8g,swap配置2g
2.数据库未使用Hugepage
3.数据库设置sga和pga信息

内存参数估算
数据库总计占用内存为:(3674210304+1732247552)/1024/1024=5156M(pga可能未使用完,也可能超过)
结合实际sga_target=3674210304,会话数.
保守估计下Oracle进程占用的系统内存3674210304/(4*1024)*1.5*750/1024/1024=960M
估算如果使用Hugepage Oracle进程占用系统内存为:3674210304/(2*1024*1024)*1.5*750/1024/1024=1.9M
通过这里分析Oracle总占用内存为:5156+960=6116M
通过保守计算留给系统的内存大概为:1.8G左右
因为系统的其他操作,最终导致该系统内存耗完,系统和数据库hang住

总结说明
这是一个实实在在因为linux中因为未配置Hugepage,因为用户突增,导致系统内存消耗光,从而使得系统和数据库hang住的例子.
这个库因为sga不是非常大,所以Oracle占用系统内存不是高到离谱,如果sga配置为32g,1000个session,那就会占用12g的系统内存
通过这些可以看出在linux中配置Hugepage的优点:Hugepage不光是为了减轻cpu的负担,还可以减少系统内存的消耗;在没有极端的情况下,建议linux的数据库系统配置Hugepage.

To find the TX Enqueue contention in a RAC or OPS environment

今天查找TX Enqueue看到的一篇文章,拿出来共享下

PURPOSE
-------------
To find the TX Enqueue contention in a RAC or OPS environment
What is TX Enqueue ?
In one word oracle is maintaining queue for transaction.
How Many Resources ?
1/ active transaction
How Many Locks?
1/transaction + 1/process waiting for a locked row by that
transaction.
How Many Users?
1 + 1/ process waiting for something locked by this transaction.
Who Uses?
All processes
What need to investigate?
The mode of TX (6/4), Holding/Waiting/Requesting
SCOPE & APPLICATION
=====================
This document will help to analyze the application design related to transaction bottlenecks
and database performance tuning.
Let start with an example:
===================
create table akdas (A1 number, Col1 Varchar2(10), Col2 Varchar2(10));
insert into akdas values(5,'Hello','Hi');
insert into akdas values(6,'Sudip','Datta');
insert into akdas values(7,'Preetam','Roy');
insert into akdas values(8,'Michael','Polaski');
From Node 1:
==========
update akdas set a1=11 where a1=6;
From Node 2:
==========
update akdas set a1=12 where a1=7;
update akdas set a1=11 where a1=6;  /* this will wait for Node1: to complete the transaction */
This Note Is Made To Analyzing Only the TX-Mode-6 (Exclusive).
1. Now run the following query to track down the problem: Who is waiting
===================================================================
prompt
prompt Query 1. Waiting for TX Enqueue where mode is Exclusive
prompt =====================================
prompt
set linesize 100
set pagesize 66
col c1 for a15
col c1 heading "Program Name "
select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST
from gv$lock l,gv$session s
where l.type like 'TX' and l.REQUEST =6
and l.inst_id=s.inst_id and l.sid=s.sid
order by id1
/
Output will be here
===============
   INST_ID      SID     Program Name       TY     ID1     ID2       LMODE      REQUEST
-----------  ---------- ------------------ ---   -------- --------  ---------- --------
         2           13  sqlplus@opcbsol   TX     393236  780       0          6
                         2 (TNS V1-V3)
It is clear that SID 12 of instance 2 is doing a DML and waiting on REQUEST Mode 6.
2. Let's run the next query to find who is holding
===========================================
prompt
prompt
prompt Query 2. Holding for TX Enqueue where mode greater than 6
prompt =======================================
prompt
set linesize 100
set pagesize 66
col c1 for a15
col c1 heading "Program Name "
select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST
from gv$lock l,gv$session s
where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in
(select id1,id2 from gv$lock where type like 'TX' and REQUEST =6)
and l.inst_id=s.inst_id and l.sid=s.sid
order by id1
/
Output will be here
===============
   INST_ID      SID     Program Name      TY        ID1        ID2      LMODE    REQUEST
   ----------  ---------- -------------- ---   ---------- --------   ----------- --------
         1          12    sqlplus@opcbsol TX     393236        780      6          0
                          1 (TNS V1-V3)
So holder is SID 12 on instance 1. Where LMODE = 6.
3. Let's find out the exact file#, block# and Record# where it is waiting
===============================================================
prompt
prompt
prompt Query 3. Object# ,File#, Block# and Slot# TX Enqueue in detail
prompt ========================================
prompt
set linesize 110
col c0 for 999
col c0 heading "INS"
col c1 for a15
col c1 heading "Program Name "
select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no,
ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no
from gv$session
where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')
/
Output Will be here
===============
 INS     SID    Program Name     OBJECT_NO RFILE_NO BLOCK_NO  ROW_NO
----- ---------- -------------   ---------------    --------- -------
   2         13     sqlplus@opcbsol  7261      9        12346     1
                      2 (TNS V1-V3)
From the output, it is clear that it is waiting on Relative_File# 9, Block# 12346, Row Number 1.
Here Row Number 1 means the slot number in the block 12346. This Row_No start from 0 (zero).
4. Let's Find the object details
=============================
prompt
prompt
prompt Query 4. Object Involve for TX Enqueue in detail
prompt ===============================
prompt
set linesize 100
set pagesize 100
col owner for a10
col object_name for a20
col object_type for a10
select owner,object_name,object_id,object_type
from dba_objects
where
object_id in (select ROW_WAIT_OBJ# from gv$session
where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318'))
/
Output Will be here
===============
OWNER      OBJECT_NAME  OBJECT_ID   OBJECT_TYP
---------  ------------ --------    -----------
AKDAS      AKDAS        7261        TABLE
5. Let’s find the row value details
=============================
prompt
prompt
prompt Query 5. Finding the row value
prompt ====================
prompt
select * from <Owner>.<Table Name>  where rowid like
DBMS_ROWID.ROWID_CREATE(1,&Object_No,&Rfile_No, &Block_No, &Row_Number)
/
From query 3 and 4  we will get the value for all variables.
Owner = AKDAS
Table_Name = AKDAS
Object_No = 7261
Rfile_No =  9
Block_No = 12346
Row_Number = 1
Output Will be here
===============
        A1    Col1                 Col2
  ---------- --------------- ----------
         6      Hello                Hi
So we can drag down to the row value where TX Enqueue contention exists.
6. Let’s find the user activity that is "Holder" and "Waiter"
====================================================
set linesize 120
set pagesize 66
col c0 for 999
col c0 heading "INS"
col c1 for a9
col c1 heading "OS User"
col c2 for a9
col c2 heading "Oracle User"
col c3 for a15
col c3 heading "Program Name"
col b1 for a9
col b1 heading "Unix PID"
col b2 for 9999 justify left
col b2 heading "ORA SID"
col b3 for 999999 justify left
col b3 heading "SERIAL#"
col sql_text for a45
set space 1
break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2
select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text
  from gv$sql a, gv$session b, gv$process c
 where
   a.address = b.sql_address
   and b.paddr = c.addr
   and a.hash_value = b.sql_hash_value
   and a.inst_id=b.inst_id and a.inst_id=c.inst_id
   and a.inst_id like '&inst_id' and b.sid like '&sid'
 order by c.spid,a.hash_value
/
This query asks the Instance Number and Sid number, which you can get from step 1 and 2.
But remember , you can see the waiter activity, but you may not see the holder activity.
Reason is, the holder is sitting idle after doing the DML operation. So SQL for Holder
should not be seen under gv$sql.
This all query can be run for single instance database, but all GV$ view need to replace to V$
and there is no INST_ID for V$ View, that part need to be taken care.

来自:How to Find TX Enqueue Contention in RAC or OPS [ID 179582.1]

Oracle 11g丢失access$恢复方法

最近接触到两个案例都是11g数据库因为异常关闭导致access$表丢失,使得数据库不能正常open.为什么这个表会丢失还未找到原因.我这里提供一种在upgrade模式下解决给问题方法.
数据库版本

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei" from dual;
xifenfei
--------------------------------------
2012-06-22 05:28:57

数据库启动报ORA-00704

SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 1782
Session ID: 125 Serial number: 5

找出ORA-00704报错原因

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_2010.trc
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 2010
Session ID: 125 Serial number: 5

查看trace文件发现

PARSE ERROR #3063868604:len=56 dep=1 uid=0 oct=3 lid=0 tim=1340312320595472 err=942
select order#,columns,types from access$ where d_obj#=:1
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
*** 2012-06-22 04:58:40.596
USER (ospid: 2010): terminating the instance due to error 704

启动数据库至upgrade模式

SQL> startup  upgrade
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

创建access$表和index

SQL> create table access$
  2  ( d_obj#        number not null,
  3    order#        number not null,
  4    columns       raw(126),
  5    types         number not null)
  6    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  7  /
Table created.
SQL> create index i_access1 on
  2    access$(d_obj#, order#)
  3    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  4  /
Index created.
--创建语句可以在?\RDBMS\ADMIN\dcore.bsq中找到

重启数据库

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

access$表作用(感谢vmcd同学提供)
When a database object is first referenced in a PL/SQL program, the PL/SQL engine checks the ACCESS$ table (owned by SYS) to see if the executor of the program has authority on that database object.
对于access$表丢失以前记录是否对系统产生严重影响还未知,希望知道的朋友告知下

spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]

数据库出现如下错误ORA-00600[kmgs_parameter_update_timeout_1]

Thu Jun 21 17:42:45 BEIST 2012
alter tablespace TS_TAB_WG_SYSMGR_01 add datafile '/dev/rvgoradata3_1_01'
Thu Jun 21 17:42:58 BEIST 2012
Completed: alter tablespace TS_TAB_WG_SYSMGR_01 add datafile '/dev/rvgoradata3_1_01'
Thu Jun 21 17:45:31 BEIST 2012
System State dumped to trace file /oracle/app/oracle/admin/bomc3/bdump/bomc3_mmon_19530138.trc
Thu Jun 21 17:45:42 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_mmon_19530138.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [1565], [], [], [], [], [], []
ORA-01565: error in identifying file '/dev/rvgoradata3_1_01'
ORA-27086: unable to lock file - already in use
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 8
Additional information: 18874484
Thu Jun 21 17:45:49 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:52 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:53 BEIST 2012
Errors in file /oracle/app/oracle/admin/bomc3/bdump/bomc3_dbw0_18874484.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ksprcvsp1], [0], [0], [], [], [], [], []
Thu Jun 21 17:45:53 BEIST 2012
DBW0: terminating instance due to error 471
Instance terminated by DBW0, pid = 18874484

通过这个错误可以看出大概:TS_TAB_WG_SYSMGR_01增加数据文件/dev/rvgoradata3_1_01成功后,然后mmon启动收集统计信息,读取spfile文件信息出错.最后dbw进程读取spfile文件出错,使得dbwn进程终止,从而数据库abort掉.通过这些信息,初步怀疑是增加数据文件的时候,错误的把spfile文件的裸设备作为一个新数据文件增加到数据库中,导致spfile被覆盖,从而出现mmon和dbwn访问spfile出错.

找出证据
如果spfile使用裸设备而且文件名是dev/rvgoradata3_1_01,那很可能是通过init_SID.ora中的spfile项实现,查找该文件内容果然发现

[zwq_acc1:/home/xifenfei]cat initbomc3.ora
spfile='/dev/rvgoradata3_1_01'

通过这些可以确定是用户增加数据文件时,错误的把spfile文件当中新的控制问及爱你增加到相关表空间中导致该问题.

解决办法
1.如果有备份spfile文件,使用备份spfile文件
2.如果有pfile文件,使用pfile创建spfile
3.如果上面两个都没有,那么使用alert中相关信息创建pfile文件然后创建spfile

实现trigger集中记录所有库ddl操作

今天客户说了一个我感觉有意思的需求:在一个库上的一张表记录所有库的ddl操作,实现方式:在一个库上建立表和触发器,其他库上通过dblink+同义词+触发器实现ddl操作记录到远程的表中.他当时写了一个触发器,但是有错误,想让我协助解决.在我们的一起努力下,解决了该触发器在dblink同义词的库上出错的问题.我这里测试使用的是10g的库做为存储所有库的ddl记录的库,11g库做为一个通过dblink插入ddl操作记录的库.
在10g数据库库中操作
1.创建记录ddl操作表

SQL> conn chf/xifenfei
Connected.
SQL> create table t_ddl_audit(
  2  db_name varchar2(30),
  3  login_user varchar2(30),
  4  ddl_time date,
  5  ip_address varchar2(20),
  6  audsid varchar2(20),
  7  schema_user varchar2(30),
  8  schema_object varchar2(40),
  9  login_tool varchar2(40),
 10  os_user varchar2(40),
 11  ddl_sql varchar2(4000));
Table created.

2.创建触发器

SQL> create or replace trigger tri_ddl_audit
  2    before ddl on database
  3  declare
  4    n           number;
  5    str_stmt    varchar2(4000);
  6    sql_text    ora_name_list_t;
  7    l_trace     number;
  8    v_module    varchar2(50);
  9    v_action    varchar2(50);
 10    str_session v$session%rowtype;
 11  begin
 12    n := ora_sql_txt(sql_text);
 13    for i in 1 .. n loop
 14      str_stmt := substr(str_stmt || sql_text(i), 1, 3000);
 15    end loop;
 16    dbms_application_info.READ_MODULE(v_module, v_action);
 17    INSERT INTO chf.t_ddl_audit
 18      (db_name,
 19       login_user,
 20       ddl_time,
 21       ip_address,
 22       audsid,
 23       schema_user,
 24       schema_object,
 25       login_tool,
 26       os_user,
 27       ddl_sql)
 28    VALUES
 29      (sys_context('USERENV', 'db_name'),
 30       ora_login_user,
 31       SYSDATE,
 32       sys_context('USERENV', 'IP_ADDRESS'),
 33       userenv('SESSIONID'),
 34       ora_dict_obj_owner,
 35       ora_dict_obj_name,
 36       v_module,
 37       sys_context('userenv', 'os_user'),
 38       str_stmt);
 39  exception
 40    when no_data_found then
 41      null;
 42  end;
 43  /
Trigger created.

3.测试触发器

SQL> conn chf/xifenfei
Connected.
SQL> create table t_xff as select * from dba_tables where rownum=1;
Table created.
SQL> select db_name,login_user,ddl_sql from t_ddl_audit;
DB_NAME                        LOGIN_USER
------------------------------ ------------------------------
DDL_SQL
-----------------------------------------------------------------
XFF                            CHF
create table t_xff as select * from dba_tables where rownum=1

在11g数据库中操作
1.创建dblink和同义词

SQL> create database link "ora10g_dblink"
  2   connect to chf
  3    identified by "xifenfei"
  4     using 'ora10g';
Database link created.
SQL> create  synonym t_ddl_audit for t_ddl_audit@ora10g_dblink;
Synonym created.

2.第一次创建触发器

SQL> create or replace trigger tri_ddl_audit
  2    before ddl on database
  3  declare
  4    n           number;
  5    str_stmt    varchar2(4000);
  6    sql_text    ora_name_list_t;
  7    l_trace     number;
  8    v_module    varchar2(50);
  9    v_action    varchar2(50);
 10    str_session v$session%rowtype;
 11  begin
 12    n := ora_sql_txt(sql_text);
 13    for i in 1 .. n loop
 14      str_stmt := substr(str_stmt || sql_text(i), 1, 3000);
 15    end loop;
 16    dbms_application_info.READ_MODULE(v_module, v_action);
 17    INSERT INTO t_ddl_audit
 18      (db_name,
 19       login_user,
 20       ddl_time,
 21       ip_address,
 22       audsid,
 23       schema_user,
 24       schema_object,
 25       login_tool,
 26       os_user,
 27       ddl_sql)
 28    VALUES
 29      (sys_context('USERENV', 'db_name'),
 30       ora_login_user,
 31       SYSDATE,
 32       sys_context('USERENV', 'IP_ADDRESS'),
 33       userenv('SESSIONID'),
 34       ora_dict_obj_owner,
 35       ora_dict_obj_name,
 36       v_module,
 37       sys_context('userenv', 'os_user'),
 38       str_stmt);
 39  exception
 40    when no_data_found then
 41      null;
 42  end;
 43  /
Trigger created.

3.测试触发器

SQL> create table t_xff as select * from dba_objects where rownum<10;
create table t_xff as select * from dba_objects where rownum<10
                                    *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-02070: database  does not support  in this context
ORA-06512: at line 15

出现ORA-02070错误,估计是类此sys_context(‘userenv’, ‘os_user’)导致。

4.第二次创建触发器

SQL> create or replace trigger tri_ddl_audit
  2    before ddl on database
  3  declare
  4    n           number;
  5    str_stmt    varchar2(4000);
  6    sql_text    ora_name_list_t;
  7    l_trace     number;
  8    v_module    varchar2(50);
  9    v_action    varchar2(50);
 10    v_db_name   varchar2(50);
 11    v_ip_addr   varchar2(50);
 12    v_os        varchar2(50);
 13    v_session_id varchar2(50);
 14    str_session v$session%rowtype;
 15  begin
 16    n := ora_sql_txt(sql_text);
 17    for i in 1 .. n loop
 18      str_stmt := substr(str_stmt || sql_text(i), 1, 3000);
 19    end loop;
 20    dbms_application_info.READ_MODULE(v_module, v_action);
 21    v_db_name :=sys_context('USERENV', 'db_name');
 22    v_ip_addr :=sys_context('USERENV', 'IP_ADDRESS');
 23    v_os:=sys_context('userenv', 'os_user');
 24    v_session_id:=userenv('SESSIONID');
 25    INSERT INTO t_ddl_audit
 26      (db_name,
 27       login_user,
 28       ddl_time,
 29       ip_address,
 30       audsid,
 31       schema_user,
 32       schema_object,
 33       login_tool,
 34       os_user,
 35       ddl_sql)
 36    VALUES
 37      (v_db_name,
 38       ora_login_user,
 39       SYSDATE,
 40       v_ip_addr,
 41      v_session_id,
 42       ora_dict_obj_owner,
 43       ora_dict_obj_name,
 44       v_module,
 45       v_os,
 46       str_stmt);
 47  exception
 48    when no_data_found then
 49      null;
 50  end;
 51  /
Trigger created.

5.继续测试触发器

SQL> drop table t3;
drop table t3
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-02069: global_names parameter must be set to TRUE for this operation
ORA-06512: at line 23

根据ORA-02069,查询资料发现是通过dblink插入数据使用了变量和常量的方式混合使用导致该错误,修改触发器全部使用变量方式

6.第三次创建触发器

SQL> create or replace trigger tri_ddl_audit
  2    before ddl on database
  3  declare
  4    n           number;
  5    str_stmt    varchar2(4000);
  6    sql_text    ora_name_list_t;
  7    l_trace     number;
  8    v_module    varchar2(50);
  9    v_action    varchar2(50);
 10    v_db_name   varchar2(50);
 11    v_ip_addr   varchar2(50);
 12    v_os        varchar2(50);
 13    v_session_id varchar2(50);
 14    v_loginuser    varchar2(50);
 15     v_obj_name varchar2(50);
 16    v_owner    varchar2(50);
 17    str_session v$session%rowtype;
 18  begin
 19    n := ora_sql_txt(sql_text);
 20    for i in 1 .. n loop
 21      str_stmt := substr(str_stmt || sql_text(i), 1, 3000);
 22    end loop;
 23    dbms_application_info.READ_MODULE(v_module, v_action);
 24    v_db_name :=sys_context('USERENV', 'db_name');
 25    v_ip_addr :=sys_context('USERENV', 'IP_ADDRESS');
 26    v_os:=sys_context('userenv', 'os_user');
 27    v_session_id:=userenv('SESSIONID');
 28    v_loginuser:= ora_login_user;
 29    v_owner:=ora_dict_obj_owner;
 30    v_obj_name:=ora_dict_obj_name;
 31    INSERT INTO t_ddl_audit
 32      (db_name,
 33       login_user,
 34       ddl_time,
 35       ip_address,
 36       audsid,
 37       schema_user,
 38       schema_object,
 39       login_tool,
 40       os_user,
 41       ddl_sql)
 42    VALUES
 43      (v_db_name,
 44       v_loginuser,
 45       SYSDATE,
 46       v_ip_addr,
 47      v_session_id,
 48       v_owner,
 49       v_obj_name,
 50       v_module,
 51       v_os,
 52       str_stmt);
 53  exception
 54    when no_data_found then
 55      null;
 56  end;
 57  /
Trigger created.

7.测试触发器

SQL> create table t_xff11 as select * from dba_tables where rownum<10;
Table created.
SQL> select db_name,login_user,ddl_sql from t_ddl_audit;
DB_NAME                        LOGIN_USER
------------------------------ ------------------------------
DDL_SQL
-----------------------------------------------------------------
ora11g                         CHF
create table t_xff11 as select * from dba_tables where rownum<10
XFF                            CHF
create table t_xff as select * from dba_tables where rownum=1

补充说明
这个方案个人感觉是一个实验室中的方案,在实际的生成环境中很难应用上
1.trigger记录ddl操作本身效率不高
2.如果某个库不能访问存储ddl操作的表的数据库,将导致该数据库所有ddl操作hang住,从而可能使得该数据库hang住的风险.