今天遇到诡异的事情,AIX 5.3 ORACLE 9I RAC的gv$视图只能查询到本地的记录,而不是所有节点.但是所有节点均运行正常,除gv$视图之外未发现其他异常.
异常时节点1信息
SQL> show parameter clu;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean TRUE
cluster_database_instances integer 2
cluster_interconnects string 192.168.6.24
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE 9.2.0.8.0 Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
INST_ID HOST_NAME STATUS
---------- ---------- ------------
1 zwq_crm1 OPEN
SQL> show parameter par;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
log_parallelism integer 1
parallel_adaptive_multi_user boolean FALSE
parallel_automatic_tuning boolean FALSE
parallel_execution_message_size integer 2152
parallel_instance_group string
parallel_max_servers integer 5
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean TRUE
parallel_server_instances integer 2
parallel_threads_per_cpu integer 2
partition_view_enabled boolean FALSE
recovery_parallelism integer 0
SQL> !ps -ef|grep p0
oracrm 1929258 1 0 09:49:19 - 0:01 ora_p005_crm1
oracrm 745844 1 0 Jun 26 - 0:20 ora_p004_crm1
oraeye 2421272 3948648 0 21:53:49 pts/0 0:00 grep p0
oracrm 3060406 1 0 Jun 26 - 0:20 ora_p002_crm1
oracrm 3170868 1 0 Jun 20 - 2:13 ora_p000_crm1
oracrm 787414 1 0 Jun 26 - 0:20 ora_p001_crm1
oracrm 2552690 1 0 Jun 26 - 0:20 ora_p003_crm1
1.节点最大允许5个并发进程,现在已经启动并发进程到p005(6个)
2.gv$视图只能查询一个节点信息
异常时节点2信息
问题所有情况和1节点完全相似
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
INST_ID HOST_NAME STATUS
---------- ---------- ------------
2 zwq_crm2 OPEN
SQL> show parameter par;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
log_parallelism integer 1
parallel_adaptive_multi_user boolean FALSE
parallel_automatic_tuning boolean FALSE
parallel_execution_message_size integer 2152
parallel_instance_group string
parallel_max_servers integer 5
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean TRUE
parallel_server_instances integer 2
parallel_threads_per_cpu integer 2
partition_view_enabled boolean FALSE
recovery_parallelism integer 0
SQL> !ps -ef|grep p0
oracrm 1867938 1 0 15:17:25 - 0:00 ora_p004_crm2
oracrm 2633748 1 0 09:49:19 - 0:01 ora_p005_crm2
oraeye 3059876 1007714 0 21:54:01 pts/0 0:00 grep p0
oracrm 323884 1 120 Jun 20 - 10692:47 ora_p000_crm2
oracrm 1839818 1 0 06:16:32 - 0:00 ora_p003_crm2
oracrm 459660 1 107 Jun 26 - 1857:00 ora_p001_crm2
oracrm 2351894 1 0 16:52:52 - 0:00 ora_p002_crm2
在异常2节点上做10046
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 4
Statement processed.
SQL> select * from gv$version;
INST_ID BANNER
---------- ----------------------------------------------------------------
2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
2 PL/SQL Release 9.2.0.8.0 - Production
2 CORE 9.2.0.8.0 Production
2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
2 NLSRTL Version 9.2.0.8.0 - Production
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm2_ora_517066.trc
分析10046内容发现
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41759005850609 hv=775381991 ad='21195808'
select * from gv$version
END OF STMT
PARSE #1:c=20000,e=33894,p=1,cr=59,cu=3,mis=1,r=0,dep=0,og=4,tim=41759005850607
BINDS #1:
kxfpg1srv
could not start P006, inst 1
kxfpg1srv
could not start local P006
EXEC #1:c=0,e=3540,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41759005854254
FETCH #1:c=0,e=28,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41759005854347
FETCH #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=41759005854678
STAT #1 id=1 cnt=5 pid=0 pos=1 obj=0 op='VIEW '
STAT #1 id=2 cnt=5 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里观察10046可以看到:在2节点上查询gv$instance,需要通过使用并发进程去访问1节点,但是因为1节点的parallel_max_servers为5,而当前的并发进程已经达到最大数目,从而使得想在节点1上启动并发失败,进入使得gv$视图只能查询出来本节点数据
gv$视图异常解决方法
--重启两个节点,查询正常
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;
INST_ID HOST_NAME STATUS
---------- ---------- ------------
1 zwq_crm1 OPEN
2 zwq_crm2 OPEN
针对这个问题,很可能是oracle bug(因为是9i版本,我无法深究),或者是并发进程僵死所致,当时有一个想法,kill 掉数据库并发进程,因客户不同意(采用稳妥重启方案),未能通过尝试验证我的猜想.
对gv$视图正常做10046
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select * from gv$version;
INST_ID BANNER
---------- ----------------------------------------------------------------
1 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
1 PL/SQL Release 9.2.0.8.0 - Production
1 CORE 9.2.0.8.0 Production
1 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
1 NLSRTL Version 9.2.0.8.0 - Production
2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
2 PL/SQL Release 9.2.0.8.0 - Production
2 CORE 9.2.0.8.0 Production
2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
2 NLSRTL Version 9.2.0.8.0 - Production
10 rows selected.
SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm1_ora_1708916.trc
分析10046内容
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41752681925071 hv=775381991 ad='92ef64f0'
select * from gv$version
END OF STMT
PARSE #1:c=0,e=6770,p=0,cr=12,cu=3,mis=1,r=0,dep=0,og=4,tim=41752681925070
BINDS #1:
WAIT #1: nam='PX Deq: reap credit' ela= 21 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 237 p1=268500992 p2=1 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 64 p1=268500992 p2=2 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 46 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 855 p1=268566528 p2=2 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq Credit: send blkd' ela= 188 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 406 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 7442 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 1664 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
EXEC #1:c=0,e=11572,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41752681936727
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 166 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 310 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
FETCH #1:c=0,e=557,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41752681937320
WAIT #1: nam='SQL*Net message from client' ela= 223 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 18 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=5
WAIT #1: nam='DFS lock handle' ela= 254 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 73 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 363 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 24 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 3 p1=0 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 155 p1=10 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
FETCH #1:c=0,e=1735,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=4,tim=41752681939329
WAIT #1: nam='DFS lock handle' ela= 249 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 258 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 66 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 369 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 19 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='enqueue' ela= 201 p1=1347616774 p2=2 p3=0
WAIT #1: nam='enqueue' ela= 41 p1=1347616774 p2=2 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 6981860 p1=1650815232 p2=1 p3=0
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='VIEW '
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里可以看出,rac在正常情况下gv$视图默认的查询就是并发方式进行