DirectPath Reads 说明
在oracle 11g以前的版本中,如果对大表进行全表扫描,wait event是:db file scattered read;在11g中,如果对大表进行全表扫描,wait event是:direct path read。在11g中,大表全表扫描时数据块不经过sga而直接进pga,这样会造成每次进行大表全表扫描,物理读都是很大,而在10g中,由于全表扫描的数据块在sga中已经存在,所以执行全表扫描时,它的物理读为0。但是这里主要是oracle在优化策略上的进步,即假定大表频繁全表扫描这种现象,在生产库上不会太多,通过把数据直接读入pga,进而减少了cache buffer的繁忙交换程度,提高了cache buffer的使用效率.
DirectPath Reads 优势
1. 减少了对栓的使用,避免可能的栓争用
2. 物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。
DirectPath Reads 噩梦
这一切听起来都很美好,但是在大并发的OLTP系统中,这东西简直是一个噩梦.通过一个awr来说明该问题:这个是一个系统的awr报告,朋友反馈说系统有段运行缓慢,请求帮忙找出原因
分析总体信息
系统这段时间会话临时大幅度增加(从102增加到223),系统出现异常繁忙(60.62*16=969.92<2,454.52)
分析Load Profile信息
通过这个截图发现系统的业务不是很大,但是Physical reads参数异常
1.物理读大小:25071.1*8192/1024/1024=195.86796875M/S
2.物理读将近逻辑读一半,这个在一般系统中很难得到这个比例,进一步说明物理读过高
分析Top 5信息
这里可以发现direct path read等待很多
分析Host CPU
可以发现iowait很大占40.5%,io等待异常高(195M/S能不高吗?)
补充说明:在这里我们看到的%Idle=1-%System-%User不包括%WIO
处理建议
通过上面的评估,可以确定大部分是由于 导致了数据库的物理读过高,从而使得系统反应变慢,处理方法就是关闭掉11g该新特性
alter system set event= ‘10949 trace name context forever, level 1’ scope=spfile;
重启数据库
undo segment header坏块异常恢复
alert日志报ORA-00600[4137]与ORA-00600 [4198]错误
数据库报如下错误,运行一段时间数据库自动down掉
Fri Jul 6 18:00:40 2012 SMON: ignoring slave err,downgrading to serial rollback Fri Jul 6 18:00:41 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc: ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], [] ORACLE Instance techdb (pid = 8) - Error 600 encountered while recovering transaction (3, 17). Fri Jul 6 18:00:41 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_16636.trc: ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], [] Fri Jul 6 18:05:53 2012 SMON: Restarting fast_start parallel rollback Fri Jul 6 18:05:54 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_p000_17124.trc: ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], [] ………… Wed Jul 6 18:50:38 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_4473.trc: ORA-00474: SMON process terminated with error Wed Jul 6 18:50:38 2012 PMON: terminating instance due to error 474
从三个地方得出3号回滚段异常
1.trace文件
SMON: about to recover undo segment 3 Parallel Transaction recovery caught exception 12801 Parallel Transaction recovery caught error 30317 *** 2012-07-06 17:55:19.042 SMON: Restarting fast_start parallel rollback SMON: about to recover undo segment 3 SMON: mark undo segment 3 as available SMON: about to recover undo segment 3 SMON: mark undo segment 3 as available Parallel Transaction recovery caught exception 12801 Parallel Transaction recovery caught error 607 *** 2012-07-06 17:55:19.761 SMON: ignoring slave err,downgrading to serial rollback SMON: about to recover undo segment 3 XID passed in =xid: 0x0003.011.00003c2b XID from Undo block =xid: 0x0004.020.00002b35
2.alert中提示while recovering transaction (3, 17)
3.查询dba_rollback_segs发现_SYSSMU3$是NEED RECOVERY状态
尝试删除_SYSSMU3$
使用隐含参数_offline_rollback_segments= _SYSSMU3$
Fri Jul 6 18:16:19 2012 Completed: ALTER DATABASE OPEN Fri Jul 6 18:16:56 2012 drop rollback segment "_SYSSMU3$" Fri Jul 6 18:16:57 2012 Errors in file /usr/local/oracle/admin/techdb/udump/techdb_ora_17381.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], [] Fri Jul 6 18:16:57 2012 Doing block recovery for file 2 block 41 Block recovery from logseq 209591, block 183 to scn 7788878085 Fri Jul 6 18:16:57 2012 Recovery of Online Redo Log: Thread 1 Group 1 Seq 209591 Reading mem 0 Mem# 0 errs 0: /usr/local/oracle/oradata/techdb/redo01.log Block recovery completed at rba 209591.225.16, scn 1.3493910790 ORA-607 signalled during: drop rollback segment "_SYSSMU3$"... Fri Jul 6 18:16:57 2012 Corrupt Block Found TSN = 1, TSNAME = UNDOTBS1 RFN = 2, BLK = 41, RDBA = 8388649 OBJN = 0, OBJD = -1, OBJECT = _NEXT_OBJECT, SUBOBJECT = SEGMENT OWNER = SYS, SEGMENT TYPE = Invalid Type Fri Jul 6 18:16:57 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_smon_17367.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [41], [38508], [], [], [], [] Doing block recovery for file 2 block 41 Block recovery from logseq 209591, block 183 to scn 7788878085 Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_pmon_17355.trc: ORA-00474: SMON process terminated with error Fri Jul 6 18:17:46 2012 PMON: terminating instance due to error 474 Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_dbw0_17361.trc: ORA-00474: SMON process terminated with error Fri Jul 6 18:17:46 2012 Errors in file /usr/local/oracle/admin/techdb/bdump/techdb_lgwr_17363.trc: ORA-00474: SMON process terminated with error Instance terminated by PMON, pid = 17355
这里可以看出在使用隐含参数删除异常回滚段的时候,因为该回滚段有坏块出现ORA-00600[kddummy_blkchk]使得数据库donw掉,重启过几次该库都因为这个错误直接down.
查看trace文件发现
SMON: about to recover undo segment 3 SMON: mark undo segment 3 as needs recovery *** 2012-07-06 18:16:57.734 Block Checking: DBA = 8388649, Block Type = System Managed Segment Header Block ERROR: SMU Segment Header Corrupted. Error Code = 38508 ktu4smck: starting extent(0x77) of txn slot #0x11 is invalid. valid value (0 - 0x76) TRN CTL:: seq: 0xed38 chd: 0x0020 ctl: 0x002a inc: 0x00000000 nfb: 0x0000 mgc: 0x8201 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe) uba: 0x00a6610a.ed38.1d scn: 0x0001.d030de86 Version: 0x01
因为该库是因为undo的3号回滚段的header出现坏块,即使使用了隐含参数屏蔽该回滚段恢复,smon进程依然会去读回滚段header,从而出现该错误导致直接down掉.
处理方案
1.使用隐含参数屏蔽异常回滚段_offline_rollback_segments= _SYSSMU3$
2.修改undo_tablespace=SYSTEM/undo_management=MANUAL
3.启动数据库,快速删除包含_SYSSMU3$ undo表空间
4.新建undo表空间
5.修改undo_tablespace=new_undo/undo_management=AUTO,除掉隐含参数
6.使用新参数文件重启数据库
7.建议:使用逻辑导出导入重建数据库
分析一例 TX Enqueue contention案例
应用反馈某个业务比较慢,需要紧急处理
查询等待事件
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production SQL> select A.INST_ID,count(*), event 2 from Gv$session_wait a 3 where event not in ('SQL*Net more data to client', 4 'rdbms ipc message', 5 'smon timer', 6 'pmon timer', 7 'SQL*Net message from client', 8 'lock manager wait for remote message', 9 'ges remote message', 10 'gcs remote message', 11 'gcs for action', 12 'client message', 13 'pipe get', 14 'null event', 15 'PX Idle Wait', 16 'single-task message', 17 'PX Deq: Execution Msg', 18 'KXFQ: kxfqdeq - normal deqeue', 19 'listen endpoint status', 20 'slave wait', 21 'wakeup time manager','jobq slave wait') 22 group by INST_ID,event 23 order by 1 desc,2 desc; INST_ID COUNT(*) EVENT ---------- ---------- -------------------------------------- 2 8 enqueue 2 1 async disk IO 2 1 db file sequential read 2 1 SQL*Net message to client 2 1 PX Deq: reap credit 1 2 global cache cr request 1 1 async disk IO 1 1 PX Deq: reap credit 1 1 PX Deq: Execute Reply 9 rows selected.
发现enqueue等待有些多,怀疑是TX enquenue,查询阻塞者
SQL> set linesize 100 SQL> set pagesize 66 SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 2 from gv$lock l,gv$session s 3 where l.type like 'TX' and l.REQUEST =6 4 and l.inst_id=s.inst_id and l.sid=s.sid 5 order by id1 6 / INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ---------- ---------- --------------- -- ---------- ---------- ---------- ---------- 2 295 rtStopMain@zwq_ TX 1441805 2391806 0 6 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ TX 1441805 2391806 0 6 bill_2 (TNS V1- V3) 2 1238 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1298 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1684 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ TX 6946827 2546365 0 6 bill_2 (TNS V1- V3) 2 75 rtStopMain@zwq_ TX 12451856 199146 0 6 bill_2 (TNS V1- V3) 2 1125 rtStopMain@zwq_ TX 14352404 63837 0 6 bill_2 (TNS V1- V3)
查询持有者
SQL> set linesize 100 SQL> set pagesize 66 SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 2 from gv$lock l,gv$session s 3 where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in 4 (select id1,id2 from gv$lock where type like 'TX' and REQUEST =6) 5 and l.inst_id=s.inst_id and l.sid=s.sid 6 order by id1 7 / INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ---------- ---------- --------------- -- ---------- ---------- ---------- ---------- 2 75 rtStopMain@zwq_ TX 1441805 2391806 6 0 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ TX 6946827 2546365 6 0 bill_2 (TNS V1- V3) 2 295 rtStopMain@zwq_ TX 12451856 199146 6 0 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ TX 14352404 63837 6 0 bill_2 (TNS V1- V3)
通过持有者和阻塞者可以得出:
1.持有者和阻塞者都是在2号实例上
2.持有者75阻塞了295/992的会话
3.持有者992阻塞了1238/1298/1684/1553的会话
4.持有者295阻塞了75的会话
5.持有者1553阻塞了1125的会话
6.同时分析发现,所有的持有者sid也在阻塞者中,也就是持有者阻塞了某个sid,而自身又被其他sid给阻塞,形成了多级阻塞或者环.如:75阻塞了295,而295有阻塞了75;992阻塞了1553,而1553阻塞了1125
查询阻塞和持有者对象
SQL> set linesize 110 SQL> col c0 for 999 SQL> col c0 heading "INS" SQL> col c1 for a15 SQL> col c1 heading "Program Name " SQL> select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no, 2 ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no 3 from gv$session 4 where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318') 5 / INS SID Program Name OBJECT_NO RFILE_NO BLOCK_NO ROW_NO ---- ---------- --------------- ---------- ---------- ---------- ---------- 2 75 rtStopMain@zwq_ 1323132 13 122601 111 bill_2 (TNS V1- V3) 2 295 rtStopMain@zwq_ 1323132 13 122601 100 bill_2 (TNS V1- V3) 2 992 rtStopMain@zwq_ 1323132 13 122601 101 bill_2 (TNS V1- V3) 2 1125 rtStopMain@zwq_ 1323132 84 38445 70 bill_2 (TNS V1- V3) 2 1238 rtStopMain@zwq_ 1323132 15 255066 41 bill_2 (TNS V1- V3) 2 1298 rtStopMain@zwq_ 1323132 14 118411 8 bill_2 (TNS V1- V3) 2 1553 rtStopMain@zwq_ 1323132 15 255066 19 bill_2 (TNS V1- V3) 2 1684 rtStopMain@zwq_ 1323132 14 118411 21 bill_2 (TNS V1- V3) 8 rows selected. SQL> set linesize 100 SQL> set pagesize 100 SQL> col owner for a10 SQL> col object_name for a20 SQL> col object_type for a10 SQL> select owner,object_name,object_id,object_type 2 from dba_objects 3 where 4 object_id in (select ROW_WAIT_OBJ# from gv$session 5 where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')) 6 / OWNER OBJECT_NAME OBJECT_ID OBJECT_TYP ---------- -------------------- ---------- ---------- DBACCADM DCUSTCREDITBALANCE 1323132 TABLE
通过查询的出来,所有操作的聚焦点都是在DBACCADM.DCUSTCREDITBALANCE表上面
查询相关sql语句
SQL> SQL> set linesize 120 SQL> set pagesize 66 SQL> col c0 for 999 SQL> col c0 heading "INS" SQL> col c1 for a9 SQL> col c1 heading "OS User" SQL> col c2 for a9 SQL> col c2 heading "Oracle User" SQL> col c3 for a15 SQL> col c3 heading "Program Name" SQL> col b1 for a9 SQL> col b1 heading "Unix PID" SQL> col b2 for 9999 justify left SQL> col b2 heading "ORA SID" SQL> col b3 for 999999 justify left SQL> col b3 heading "SERIAL#" SQL> col sql_text for a45 SQL> set space 1 SQL> break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2 SQL> select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text 2 from gv$sql a, gv$session b, gv$process c 3 where 4 a.address = b.sql_address 5 and b.paddr = c.addr 6 and a.hash_value = b.sql_hash_value 7 and a.inst_id=b.inst_id and a.inst_id=c.inst_id 8 and a.inst_id like '&inst_id' and b.sid like '&sid' 9 order by c.spid,a.hash_value 10 / Enter value for inst_id: 2 Enter value for sid: 75 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '75' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 75 1167392 rtStopMain@zwq_ DBCUSTOPR 42815 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4 SQL> / Enter value for inst_id: 2 Enter value for sid: 992 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '992' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 992 2760870 rtStopMain@zwq_ DBCUSTOPR 56282 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4 SQL> / Enter value for inst_id: 2 Enter value for sid: 295 old 8: and a.inst_id like '&inst_id' and b.sid like '&sid' new 8: and a.inst_id like '2' and b.sid like '295' INS ORA SID Unix PID Program Name Oracle Us SERIAL# SQL_TEXT ---- ------- --------- --------------- --------- ------- --------------------------------------------- 2 295 1639008 rtStopMain@zwq_ DBCUSTOPR 35740 update dcustcreditbalance set limit_owe=:b0, bill_2 (TNS V1- unbill_fee=:b1,prepay_fee=:b2,owe_fee=:b3,op_ V3) time=sysdate where id_no=:b4
其他阻塞者和持有者执行sql语句均和该语句相同,省略其他查询.通过这些查询可以确定是因为对dcustcreditbalance表的更新操作导致了这样的现象发生.
处理方案
1.临时处理方案:kill掉持有者
2.永久处理方案:修改这部分程序业务逻辑
关闭数据库出现ORA-00379错误
关闭数据库出现ORA-00379错误
SQL> shutdown immediate ORA-00604: 递归 SQL 级别 1 出现错误 ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区
查看内存分配
SQL> show parameter sga; NAME TYPE VALUE ------------------------------------ ----------- ----------- lock_sga boolean FALSE pre_page_sga boolean FALSE sga_max_size big integer 412M sga_target big integer 0 SQL> select * from v$sgainfo; NAME BYTES RES -------------------------------- ---------- --- Fixed SGA Size 1333676 No Redo Buffers 6078464 No Buffer Cache Size 104857600 Yes Shared Pool Size 142606336 Yes Large Pool Size 4194304 Yes Java Pool Size 12582912 Yes Streams Pool Size 0 Yes Shared IO Pool Size 0 Yes Granule Size 4194304 No Maximum SGA Size 431038464 No Startup overhead in Shared Pool 46137344 No Free SGA Memory Available 159383552 --spfile中分配情况 orcl.__db_cache_size=104857600 orcl.__java_pool_size=12582912 orcl.__large_pool_size=4194304 orcl.__pga_aggregate_target=104857600 orcl.__sga_target=281018368 orcl.__shared_io_pool_size=0 orcl.__shared_pool_size=142606336 orcl.__streams_pool_size=0 --初始化参数 *.sga_max_size=0 *.sga_target=536870912 *.memory_max_target=536870912 *.memory_target=536870912
alert日志
Mon Jul 02 11:30:19 2012 DIA0 started with pid=8, OS id=1520 Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc (incident=10883): ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng) Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc (incident=10884): ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket) ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng) ORA-4030 : opidrv aborting process DIA0 ospid (1348_1520) Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dia0_1520.trc (incident=12013): ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket) ORA-04030: out of process memory when trying to allocate 8204 bytes (diag pga,dbgtbDefaultBucket) ORA-04030: out of process memory when trying to allocate 29916 bytes (heap_ksdhngreq,msg_body:ksdhng) Process debug not enabled via parameter _debug_enable Mon Jul 02 11:33:19 2012 Trace dumping is performing id=[cdmp_20120702113319] Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc: ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K Mon Jul 02 11:33:49 2012 Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc: ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K Mon Jul 02 11:34:38 2012 Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1820.trc: ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K Mon Jul 02 11:37:05 2012 Errors in file e:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2400.trc: ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区 ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区 ORA-00379: 缓冲池 DEFAULT 中无法提供 8K 块大小的空闲缓冲区 Tue Jul 03 09:58:06 2012 WARNING: sga_target 432013312 cannot be more than memory_target (432013312) - pga_aggregate_target (104857600/0) or untunable pga 104857600, 73783296
通过这里可以看出,系统的data buffe和pga都有内存不足的报错.
解决问题
问题的原因是由于内存分配不多,导致sga组件被消耗完,现在数据库不能正常关闭,修改了相关的内存参数的配置[避免该bug采用asmm内存管理]也无法生效,现在需要做的任务是重启数据库.导致数据库不能被关闭的原因是因为data buffer中的脏数据不能写入新数据.查询MOS发现是Bug 7702085.正常关闭库解决办法手工刷sga组件,然后升级数据库到11.2.0.1 (Base Release)/11.1.0.7.3 (Patch Set Update)/11.1.0.7 Patch 25 on Windows Platforms
SQL> alter system flush BUFFER_CACHE; System altered. SQL> alter system flush SHARED_POOL; System altered. SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
使用copy实现long类型转移表空间
在一次8.1.6的数据库恢复过程中,发现一个表空间的数据文件损坏,在转移该表空间相关表时,遇到让人郁闷的long类型.不能使用ctas和move来实现转移,最后通过古老的copy来实现该项工作.
模拟LONG类型表
SQL> create table chf.t_long (id number,name long) tablespace ts_xifenfei; Table created. SQL> insert into chf.t_long select object_id,object_name from dba_objects where rownum<10; 9 rows created. SQL> commit; Commit complete. SQL> desc chf.t_long Name Null? Type ----------------------------------------- -------- ----------------- ID NUMBER NAME LONG
测试ctas和move
SQL> create table chf.t_long_bak 2 as 3 select * from chf.t_long; select * from chf.t_long * ERROR at line 3: ORA-00997: illegal use of LONG datatype SQL> alter table chf.t_long move tablespace users; alter table chf.t_long move tablespace users * ERROR at line 1: ORA-00997: illegal use of LONG datatype
使用copy实现LONG表跟换表空间
SQL> SET LONG 1000 SQL> select dbms_metadata.get_ddl('TABLE','T_LONG','CHF') from dual; DBMS_METADATA.GET_DDL('TABLE','T_LONG','CHF') -------------------------------------------------------------------------------- CREATE TABLE "CHF"."T_LONG" ( "ID" NUMBER, "NAME" LONG ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "TS_XIFENFEI" SQL> CREATE TABLE "CHF"."T_LONG_BAK" 2 ( "ID" NUMBER, 3 "NAME" LONG 4 ) SEGMENT CREATION IMMEDIATE 5 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 6 NOCOMPRESS LOGGING 7 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 8 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 9 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) 10 TABLESPACE "USERS"; Table created. SQL> copy from chf/xifenfei@ora11g_d INSERT chf.t_long_bak using select * from chf.t_long; Array fetch/bind size is 15. (arraysize is 15) Will commit when done. (copycommit is 0) Maximum long size is 80. (long is 80) 9 rows selected from chf@ora11g_d. 9 rows inserted into CHF.T_LONG_BAK. 9 rows committed into CHF.T_LONG_BAK at DEFAULT HOST connection. SQL> alter table t_long rename to t_long_old; Table altered. SQL> alter table t_long_bak rename to t_long; Table altered. SQL> select tablespace_name,table_name from dba_tables where table_name like 'T_LONG%'; TABLESPACE_NAME TABLE_NAME --------------- --------------- TS_XIFENFEI T_LONG_OLD USERS T_LONG SQL> DROP TABLE T_LONG_OLD PURGE; Table dropped.
iscsiadm主要操作命令
当前包含磁盘
[root@xifenfei ~]# fdisk -l Disk /dev/sda: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 * 1 2355 18916506 83 Linux /dev/sda2 2356 2610 2048287+ 82 Linux swap / Solaris Disk /dev/sdb: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2610 20964793+ 83 Linux Disk /dev/sdc: 2147 MB, 2147483648 bytes 255 heads, 63 sectors/track, 261 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk /dev/sdc doesn't contain a valid partition table Disk /dev/sdd: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdd1 1 100 803218+ 83 Linux /dev/sdd2 101 1000 7229250 83 Linux
查看iscsi运行情况
[root@xifenfei ~]# rpm -aq|grep iscsi iscsi-initiator-utils-6.2.0.872-10.0.1.el5 [root@xifenfei ~]# chkconfig --list |grep iscsi iscsi 0:off 1:off 2:on 3:on 4:on 5:on 6:off iscsid 0:off 1:off 2:off 3:on 4:on 5:on 6:off [root@xifenfei ~]# ps -ef|grep iscs root 2753 2 0 Jun21 ? 00:00:00 [iscsi_eh] root 15793 1 0 09:08 ? 00:00:00 brcm_iscsiuio root 15800 1 0 09:08 ? 00:00:00 iscsid root 15802 1 0 09:08 ? 00:00:00 iscsid root 19533 15269 0 10:11 pts/1 00:00:00 grep iscs
配置iscsi存储
[root@xifenfei ~]# iscsiadm -m discovery -t sendtargets -p 192.168.1.254:3260 192.168.1.254:3260,1 iqn.2006-01.com.openfiler:tsn.32b32087937b [root@xifenfei ~]# iscsiadm -m node –T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful. [root@xifenfei ~]# iscsiadm -m node –T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260 >--op update -n node.startup -v automatic
当前包含磁盘
[root@xifenfei ~]# fdisk -l Disk /dev/sda: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 * 1 2355 18916506 83 Linux /dev/sda2 2356 2610 2048287+ 82 Linux swap / Solaris Disk /dev/sdb: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2610 20964793+ 83 Linux Disk /dev/sdc: 2147 MB, 2147483648 bytes 255 heads, 63 sectors/track, 261 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk /dev/sdc doesn't contain a valid partition table Disk /dev/sdd: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdd1 1 100 803218+ 83 Linux /dev/sdd2 101 1000 7229250 83 Linux Disk /dev/sde: 1073 MB, 1073741824 bytes 34 heads, 61 sectors/track, 1011 cylinders Units = cylinders of 2074 * 512 = 1061888 bytes Disk /dev/sde doesn't contain a valid partition table Disk /dev/sdf: 1073 MB, 1073741824 bytes 34 heads, 61 sectors/track, 1011 cylinders Units = cylinders of 2074 * 512 = 1061888 bytes Disk /dev/sdf doesn't contain a valid partition table Disk /dev/sdg: 1073 MB, 1073741824 bytes 34 heads, 61 sectors/track, 1011 cylinders Units = cylinders of 2074 * 512 = 1061888 bytes Disk /dev/sdg doesn't contain a valid partition table
卸载iscsi存储
[root@xifenfei ~]# iscsiadm -m node --logoutall=all Logging out of session [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] Logout of [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful. [root@xifenfei ~]# iscsiadm -m node --op delete --targetname iqn.2006-01.com.openfiler:tsn.32b32087937b
当前包含磁盘
[root@xifenfei ~]# fdisk -l Disk /dev/sda: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 * 1 2355 18916506 83 Linux /dev/sda2 2356 2610 2048287+ 82 Linux swap / Solaris Disk /dev/sdb: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2610 20964793+ 83 Linux Disk /dev/sdc: 2147 MB, 2147483648 bytes 255 heads, 63 sectors/track, 261 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk /dev/sdc doesn't contain a valid partition table Disk /dev/sdd: 21.4 GB, 21474836480 bytes 255 heads, 63 sectors/track, 2610 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdd1 1 100 803218+ 83 Linux /dev/sdd2 101 1000 7229250 83 Linux
iscsi操作总结
增加iscsi存储 (1)发现iscsi存储:iscsiadm -m discovery -t st -p ISCSI_IP (2)查看iscsi发现记录:iscsiadm -m node (3)登录iscsi存储:iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l (4)开机自动: iscsiadm -m node –T LUN_NAME -p ISCSI_IP --op update -n node.startup -v automatic 删除iscsi存储 (1)登出iscsi存储 iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u (2)对出iscsi所有登录 iscsiadm -m node --logoutall=all (3)删除iscsi发现记录:iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP 登入需验证码的节点 (1)开启认证 iscsiadm -m node -T LUN_NAME -o update --name node.session.auth.authmethod --value=CHAP *.使用-o同--op (2)添加用户 iscsiadm -m node -T LUN_NAME --op update --name node.session.auth.username --value=[用户名] (3)添加密码 iscsiadm –m node –T LUN_NAME –op update –name node.session.auth.password –value=[密码]
gv$视图不能查询所有节点信息
今天遇到诡异的事情,AIX 5.3 ORACLE 9I RAC的gv$视图只能查询到本地的记录,而不是所有节点.但是所有节点均运行正常,除gv$视图之外未发现其他异常.
异常时节点1信息
SQL> show parameter clu; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ cluster_database boolean TRUE cluster_database_instances integer 2 cluster_interconnects string 192.168.6.24 SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 1 zwq_crm1 OPEN SQL> show parameter par; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ fast_start_parallel_rollback string LOW log_parallelism integer 1 parallel_adaptive_multi_user boolean FALSE parallel_automatic_tuning boolean FALSE parallel_execution_message_size integer 2152 parallel_instance_group string parallel_max_servers integer 5 parallel_min_percent integer 0 parallel_min_servers integer 0 parallel_server boolean TRUE parallel_server_instances integer 2 parallel_threads_per_cpu integer 2 partition_view_enabled boolean FALSE recovery_parallelism integer 0 SQL> !ps -ef|grep p0 oracrm 1929258 1 0 09:49:19 - 0:01 ora_p005_crm1 oracrm 745844 1 0 Jun 26 - 0:20 ora_p004_crm1 oraeye 2421272 3948648 0 21:53:49 pts/0 0:00 grep p0 oracrm 3060406 1 0 Jun 26 - 0:20 ora_p002_crm1 oracrm 3170868 1 0 Jun 20 - 2:13 ora_p000_crm1 oracrm 787414 1 0 Jun 26 - 0:20 ora_p001_crm1 oracrm 2552690 1 0 Jun 26 - 0:20 ora_p003_crm1
1.节点最大允许5个并发进程,现在已经启动并发进程到p005(6个)
2.gv$视图只能查询一个节点信息
异常时节点2信息
问题所有情况和1节点完全相似
SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 2 zwq_crm2 OPEN SQL> show parameter par; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ fast_start_parallel_rollback string LOW log_parallelism integer 1 parallel_adaptive_multi_user boolean FALSE parallel_automatic_tuning boolean FALSE parallel_execution_message_size integer 2152 parallel_instance_group string parallel_max_servers integer 5 parallel_min_percent integer 0 parallel_min_servers integer 0 parallel_server boolean TRUE parallel_server_instances integer 2 parallel_threads_per_cpu integer 2 partition_view_enabled boolean FALSE recovery_parallelism integer 0 SQL> !ps -ef|grep p0 oracrm 1867938 1 0 15:17:25 - 0:00 ora_p004_crm2 oracrm 2633748 1 0 09:49:19 - 0:01 ora_p005_crm2 oraeye 3059876 1007714 0 21:54:01 pts/0 0:00 grep p0 oracrm 323884 1 120 Jun 20 - 10692:47 ora_p000_crm2 oracrm 1839818 1 0 06:16:32 - 0:00 ora_p003_crm2 oracrm 459660 1 107 Jun 26 - 1857:00 ora_p001_crm2 oracrm 2351894 1 0 16:52:52 - 0:00 ora_p002_crm2
在异常2节点上做10046
SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 4 Statement processed. SQL> select * from gv$version; INST_ID BANNER ---------- ---------------------------------------------------------------- 2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 2 PL/SQL Release 9.2.0.8.0 - Production 2 CORE 9.2.0.8.0 Production 2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 2 NLSRTL Version 9.2.0.8.0 - Production SQL> oradebug EVENT 10046 trace name context off Statement processed. SQL> oradebug TRACEFILE_NAME /oracle9/app/admin/crm/udump/crm2_ora_517066.trc
分析10046内容发现
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41759005850609 hv=775381991 ad='21195808' select * from gv$version END OF STMT PARSE #1:c=20000,e=33894,p=1,cr=59,cu=3,mis=1,r=0,dep=0,og=4,tim=41759005850607 BINDS #1: kxfpg1srv could not start P006, inst 1 kxfpg1srv could not start local P006 EXEC #1:c=0,e=3540,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41759005854254 FETCH #1:c=0,e=28,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41759005854347 FETCH #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=41759005854678 STAT #1 id=1 cnt=5 pid=0 pos=1 obj=0 op='VIEW ' STAT #1 id=2 cnt=5 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里观察10046可以看到:在2节点上查询gv$instance,需要通过使用并发进程去访问1节点,但是因为1节点的parallel_max_servers为5,而当前的并发进程已经达到最大数目,从而使得想在节点1上启动并发失败,进入使得gv$视图只能查询出来本节点数据
gv$视图异常解决方法
--重启两个节点,查询正常 SQL> col host_name for a10 SQL> select inst_id,HOST_NAME,STATUS from gv$instance; INST_ID HOST_NAME STATUS ---------- ---------- ------------ 1 zwq_crm1 OPEN 2 zwq_crm2 OPEN
针对这个问题,很可能是oracle bug(因为是9i版本,我无法深究),或者是并发进程僵死所致,当时有一个想法,kill 掉数据库并发进程,因客户不同意(采用稳妥重启方案),未能通过尝试验证我的猜想.
对gv$视图正常做10046
SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 Statement processed. SQL> select * from gv$version; INST_ID BANNER ---------- ---------------------------------------------------------------- 1 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 1 PL/SQL Release 9.2.0.8.0 - Production 1 CORE 9.2.0.8.0 Production 1 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 1 NLSRTL Version 9.2.0.8.0 - Production 2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production 2 PL/SQL Release 9.2.0.8.0 - Production 2 CORE 9.2.0.8.0 Production 2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production 2 NLSRTL Version 9.2.0.8.0 - Production 10 rows selected. SQL> oradebug EVENT 10046 trace name context off Statement processed. SQL> oradebug TRACEFILE_NAME /oracle9/app/admin/crm/udump/crm1_ora_1708916.trc
分析10046内容
PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41752681925071 hv=775381991 ad='92ef64f0' select * from gv$version END OF STMT PARSE #1:c=0,e=6770,p=0,cr=12,cu=3,mis=1,r=0,dep=0,og=4,tim=41752681925070 BINDS #1: WAIT #1: nam='PX Deq: reap credit' ela= 21 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 237 p1=268500992 p2=1 p3=504403208016510312 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 64 p1=268500992 p2=2 p3=504403208016510312 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 46 p1=268566528 p2=1 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Join ACK' ela= 855 p1=268566528 p2=2 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq Credit: send blkd' ela= 188 p1=268566528 p2=1 p3=504403208016502096 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 406 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 7442 p1=200 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Parse Reply' ela= 1664 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 EXEC #1:c=0,e=11572,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41752681936727 WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Execute Reply' ela= 166 p1=200 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Execute Reply' ela= 310 p1=200 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 FETCH #1:c=0,e=557,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41752681937320 WAIT #1: nam='SQL*Net message from client' ela= 223 p1=1650815232 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 18 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=5 WAIT #1: nam='DFS lock handle' ela= 254 p1=1128857605 p2=9 p3=1 WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=3 WAIT #1: nam='DFS lock handle' ela= 73 p1=1128857605 p2=9 p3=2 WAIT #1: nam='DFS lock handle' ela= 363 p1=1128857605 p2=9 p3=2 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 24 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Signal ACK' ela= 3 p1=0 p2=1 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: Signal ACK' ela= 155 p1=10 p2=2 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 FETCH #1:c=0,e=1735,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=4,tim=41752681939329 WAIT #1: nam='DFS lock handle' ela= 249 p1=1128857605 p2=9 p3=1 WAIT #1: nam='DFS lock handle' ela= 258 p1=1128857605 p2=9 p3=3 WAIT #1: nam='DFS lock handle' ela= 66 p1=1128857605 p2=9 p3=2 WAIT #1: nam='DFS lock handle' ela= 369 p1=1128857605 p2=9 p3=2 WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 19 p1=0 p2=0 p3=0 WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0 WAIT #1: nam='enqueue' ela= 201 p1=1347616774 p2=2 p3=0 WAIT #1: nam='enqueue' ela= 41 p1=1347616774 p2=2 p3=0 WAIT #1: nam='SQL*Net message from client' ela= 6981860 p1=1650815232 p2=1 p3=0 STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='VIEW ' STAT #1 id=2 cnt=0 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '
通过这里可以看出,rac在正常情况下gv$视图默认的查询就是并发方式进行
因未配置Hugepage会话数添增悲剧案例
今天一朋友反馈他们的一个数据库hang住了,通过ssh也不能登录系统,他们没有办法重启系统解决问题,现在想让我帮忙找出问题原因
分析awr得出
询问朋友,他们的库一般session保持在200个左右,这次突然飙升到750以上,属于异常情况
分析监听日志
看到在截图的时间内,整体访问较频繁,某个ip访问异常频繁,通过这些信息,初步怀疑是用户的数据库内存使用完,导致系统数据库hang住.
查看系统日志
Jun 26 14:35:55 result01 kernel: [5613531.566617] Free swap = 0kB Jun 26 14:35:55 result01 kernel: [5613531.566618] Total swap = 2104504kB Jun 26 14:35:55 result01 kernel: [5613531.566620] Free swap: 0kB Jun 26 14:35:55 result01 kernel: [5613531.591073] 2359296 pages of RAM Jun 26 14:35:55 result01 kernel: [5613531.591074] 318236 reserved pages Jun 26 14:35:55 result01 kernel: [5613531.591075] 73353 pages shared Jun 26 14:35:56 result01 kernel: [5613531.591076] 529 pages swap cached Jun 26 14:35:56 result01 kernel: [5613531.591079] Out of Memory: Kill process 8904 (oracle) score 891 and children. Jun 26 14:35:56 result01 kernel: [5613531.591201] Out of memory: Killed process 8904 (oracle). Jun 26 14:35:56 result01 kernel: [5613531.592280] oracle invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
通过这个日志看出系统内存和交换分区都使用完,因为内存不够,系统开始kill掉部分oracle进程.通过这些确定是系统内存使用完导致hang住可以理解.
分析hang住原因
为什么session意外的从200添增到750的时候,系统内存被使用完
cat /proc/meminfo MemTotal: 8164240 kB SwapTotal: 2104504 kB PageTables: 69732 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB sga_target=3674210304 pga_aggregate_target=1732247552
从这里得出几个信息:
1.数据库总内存8g,swap配置2g
2.数据库未使用Hugepage
3.数据库设置sga和pga信息
内存参数估算
数据库总计占用内存为:(3674210304+1732247552)/1024/1024=5156M(pga可能未使用完,也可能超过)
结合实际sga_target=3674210304,会话数.
保守估计下Oracle进程占用的系统内存3674210304/(4*1024)*1.5*750/1024/1024=960M
估算如果使用Hugepage Oracle进程占用系统内存为:3674210304/(2*1024*1024)*1.5*750/1024/1024=1.9M
通过这里分析Oracle总占用内存为:5156+960=6116M
通过保守计算留给系统的内存大概为:1.8G左右
因为系统的其他操作,最终导致该系统内存耗完,系统和数据库hang住
总结说明
这是一个实实在在因为linux中因为未配置Hugepage,因为用户突增,导致系统内存消耗光,从而使得系统和数据库hang住的例子.
这个库因为sga不是非常大,所以Oracle占用系统内存不是高到离谱,如果sga配置为32g,1000个session,那就会占用12g的系统内存
通过这些可以看出在linux中配置Hugepage的优点:Hugepage不光是为了减轻cpu的负担,还可以减少系统内存的消耗;在没有极端的情况下,建议linux的数据库系统配置Hugepage.
To find the TX Enqueue contention in a RAC or OPS environment
今天查找TX Enqueue看到的一篇文章,拿出来共享下
PURPOSE ------------- To find the TX Enqueue contention in a RAC or OPS environment What is TX Enqueue ? In one word oracle is maintaining queue for transaction. How Many Resources ? 1/ active transaction How Many Locks? 1/transaction + 1/process waiting for a locked row by that transaction. How Many Users? 1 + 1/ process waiting for something locked by this transaction. Who Uses? All processes What need to investigate? The mode of TX (6/4), Holding/Waiting/Requesting SCOPE & APPLICATION ===================== This document will help to analyze the application design related to transaction bottlenecks and database performance tuning. Let start with an example: =================== create table akdas (A1 number, Col1 Varchar2(10), Col2 Varchar2(10)); insert into akdas values(5,'Hello','Hi'); insert into akdas values(6,'Sudip','Datta'); insert into akdas values(7,'Preetam','Roy'); insert into akdas values(8,'Michael','Polaski'); From Node 1: ========== update akdas set a1=11 where a1=6; From Node 2: ========== update akdas set a1=12 where a1=7; update akdas set a1=11 where a1=6; /* this will wait for Node1: to complete the transaction */ This Note Is Made To Analyzing Only the TX-Mode-6 (Exclusive). 1. Now run the following query to track down the problem: Who is waiting =================================================================== prompt prompt Query 1. Waiting for TX Enqueue where mode is Exclusive prompt ===================================== prompt set linesize 100 set pagesize 66 col c1 for a15 col c1 heading "Program Name " select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST from gv$lock l,gv$session s where l.type like 'TX' and l.REQUEST =6 and l.inst_id=s.inst_id and l.sid=s.sid order by id1 / Output will be here =============== INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ----------- ---------- ------------------ --- -------- -------- ---------- -------- 2 13 sqlplus@opcbsol TX 393236 780 0 6 2 (TNS V1-V3) It is clear that SID 12 of instance 2 is doing a DML and waiting on REQUEST Mode 6. 2. Let's run the next query to find who is holding =========================================== prompt prompt prompt Query 2. Holding for TX Enqueue where mode greater than 6 prompt ======================================= prompt set linesize 100 set pagesize 66 col c1 for a15 col c1 heading "Program Name " select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST from gv$lock l,gv$session s where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in (select id1,id2 from gv$lock where type like 'TX' and REQUEST =6) and l.inst_id=s.inst_id and l.sid=s.sid order by id1 / Output will be here =============== INST_ID SID Program Name TY ID1 ID2 LMODE REQUEST ---------- ---------- -------------- --- ---------- -------- ----------- -------- 1 12 sqlplus@opcbsol TX 393236 780 6 0 1 (TNS V1-V3) So holder is SID 12 on instance 1. Where LMODE = 6. 3. Let's find out the exact file#, block# and Record# where it is waiting =============================================================== prompt prompt prompt Query 3. Object# ,File#, Block# and Slot# TX Enqueue in detail prompt ======================================== prompt set linesize 110 col c0 for 999 col c0 heading "INS" col c1 for a15 col c1 heading "Program Name " select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no, ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no from gv$session where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318') / Output Will be here =============== INS SID Program Name OBJECT_NO RFILE_NO BLOCK_NO ROW_NO ----- ---------- ------------- --------------- --------- ------- 2 13 sqlplus@opcbsol 7261 9 12346 1 2 (TNS V1-V3) From the output, it is clear that it is waiting on Relative_File# 9, Block# 12346, Row Number 1. Here Row Number 1 means the slot number in the block 12346. This Row_No start from 0 (zero). 4. Let's Find the object details ============================= prompt prompt prompt Query 4. Object Involve for TX Enqueue in detail prompt =============================== prompt set linesize 100 set pagesize 100 col owner for a10 col object_name for a20 col object_type for a10 select owner,object_name,object_id,object_type from dba_objects where object_id in (select ROW_WAIT_OBJ# from gv$session where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')) / Output Will be here =============== OWNER OBJECT_NAME OBJECT_ID OBJECT_TYP --------- ------------ -------- ----------- AKDAS AKDAS 7261 TABLE 5. Let’s find the row value details ============================= prompt prompt prompt Query 5. Finding the row value prompt ==================== prompt select * from <Owner>.<Table Name> where rowid like DBMS_ROWID.ROWID_CREATE(1,&Object_No,&Rfile_No, &Block_No, &Row_Number) / From query 3 and 4 we will get the value for all variables. Owner = AKDAS Table_Name = AKDAS Object_No = 7261 Rfile_No = 9 Block_No = 12346 Row_Number = 1 Output Will be here =============== A1 Col1 Col2 ---------- --------------- ---------- 6 Hello Hi So we can drag down to the row value where TX Enqueue contention exists. 6. Let’s find the user activity that is "Holder" and "Waiter" ==================================================== set linesize 120 set pagesize 66 col c0 for 999 col c0 heading "INS" col c1 for a9 col c1 heading "OS User" col c2 for a9 col c2 heading "Oracle User" col c3 for a15 col c3 heading "Program Name" col b1 for a9 col b1 heading "Unix PID" col b2 for 9999 justify left col b2 heading "ORA SID" col b3 for 999999 justify left col b3 heading "SERIAL#" col sql_text for a45 set space 1 break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2 select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text from gv$sql a, gv$session b, gv$process c where a.address = b.sql_address and b.paddr = c.addr and a.hash_value = b.sql_hash_value and a.inst_id=b.inst_id and a.inst_id=c.inst_id and a.inst_id like '&inst_id' and b.sid like '&sid' order by c.spid,a.hash_value / This query asks the Instance Number and Sid number, which you can get from step 1 and 2. But remember , you can see the waiter activity, but you may not see the holder activity. Reason is, the holder is sitting idle after doing the DML operation. So SQL for Holder should not be seen under gv$sql. This all query can be run for single instance database, but all GV$ view need to replace to V$ and there is no INST_ID for V$ View, that part need to be taken care.
来自:How to Find TX Enqueue Contention in RAC or OPS [ID 179582.1]
Oracle 11g丢失access$恢复方法
最近接触到两个案例都是11g数据库因为异常关闭导致access$表丢失,使得数据库不能正常open.为什么这个表会丢失还未找到原因.我这里提供一种在upgrade模式下解决给问题方法.
数据库版本
SQL> select * from v$version; BANNER -------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei" from dual; xifenfei -------------------------------------- 2012-06-22 05:28:57
数据库启动报ORA-00704
SQL> startup ORACLE instance started. Total System Global Area 523108352 bytes Fixed Size 1346052 bytes Variable Size 448792060 bytes Database Buffers 67108864 bytes Redo Buffers 5861376 bytes Database mounted. ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-00942: table or view does not exist Process ID: 1782 Session ID: 125 Serial number: 5
找出ORA-00704报错原因
SQL> conn / as sysdba Connected to an idle instance. SQL> startup mount; ORACLE instance started. Total System Global Area 523108352 bytes Fixed Size 1346052 bytes Variable Size 448792060 bytes Database Buffers 67108864 bytes Redo Buffers 5861376 bytes Database mounted. SQL> oradebug setmypid Statement processed. SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 Statement processed. SQL> oradebug TRACEFILE_NAME /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_2010.trc SQL> alter database open; alter database open * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-00942: table or view does not exist Process ID: 2010 Session ID: 125 Serial number: 5
查看trace文件发现
PARSE ERROR #3063868604:len=56 dep=1 uid=0 oct=3 lid=0 tim=1340312320595472 err=942 select order#,columns,types from access$ where d_obj#=:1 ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-00942: table or view does not exist ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-00942: table or view does not exist *** 2012-06-22 04:58:40.596 USER (ospid: 2010): terminating the instance due to error 704
启动数据库至upgrade模式
SQL> startup upgrade ORACLE instance started. Total System Global Area 523108352 bytes Fixed Size 1346052 bytes Variable Size 448792060 bytes Database Buffers 67108864 bytes Redo Buffers 5861376 bytes Database mounted. Database opened.
创建access$表和index
SQL> create table access$ 2 ( d_obj# number not null, 3 order# number not null, 4 columns raw(126), 5 types number not null) 6 storage (initial 10k next 100k maxextents unlimited pctincrease 0) 7 / Table created. SQL> create index i_access1 on 2 access$(d_obj#, order#) 3 storage (initial 10k next 100k maxextents unlimited pctincrease 0) 4 / Index created. --创建语句可以在?\RDBMS\ADMIN\dcore.bsq中找到
重启数据库
SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup ORACLE instance started. Total System Global Area 523108352 bytes Fixed Size 1346052 bytes Variable Size 448792060 bytes Database Buffers 67108864 bytes Redo Buffers 5861376 bytes Database mounted. Database opened.
access$表作用(感谢vmcd同学提供)
When a database object is first referenced in a PL/SQL program, the PL/SQL engine checks the ACCESS$ table (owned by SYS) to see if the executor of the program has authority on that database object.
对于access$表丢失以前记录是否对系统产生严重影响还未知,希望知道的朋友告知下