11.2.0.4 打patch遭遇extjobO: Operation not permitted处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:11.2.0.4 打patch遭遇extjobO: Operation not permitted处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一段时间没有给11.2.0.4打psu补丁了,今天晚上在给一个客户打的过程中发现类似错误

OPatch found the word "error" in the stderr of the make command.
Please look at this stderr. You can re-run this make command.
Stderr output:
chmod: changing permissions of `/u01/app/oracle/product/11.2.0/db_1/bin/extjobO': Operation not permitted
make: [iextjob] Error 1 (ignored)


Composite patch 31537677 successfully applied.
OPatch Session completed with warnings.
Log file location: /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2021-04-16_23-11-48PM_1.log

OPatch completed with warnings.

对比extjob权限信息

[oracle@localhost trace]$ ls -l /u01/app/oracle/product/11.2.0/db_1/bin/extjob*
-rwxr-xr-x 1 oracle oinstall 1254496 Apr 16 23:15 /u01/app/oracle/product/11.2.0/db_1/bin/extjob
-rwx------ 1 oracle oinstall 1254496 Apr 16 23:15 /u01/app/oracle/product/11.2.0/db_1/bin/extjobo
-rwsr-x--- 1 root   oinstall 1254244 Feb 13  2019 /u01/app/oracle/product/11.2.0/db_1/bin/extjobO
-rw------- 1 oracle oinstall 1254244 Feb 13  2019 /u01/app/oracle/product/11.2.0/db_1/bin/extjoboO

证明新的extjob程序确实权限不对,不过根据官方的建议可以忽略这个,参考link:

https://updates.oracle.com/Orion/Services/download?type=readme&aru=20775452

Applying Proactive Bundle / PSU Patch fails with Error: “chmod: changing permissions of `$ORACLE_HOME/bin/extjobO’: Operation not permitted” (Doc ID 2265726.1)
给出来的理由是执行root.sh之后权限会变成正常的,另外这个external jobs基本上很少人会使用到,尝试执行root.sh修复

[root@localhost home]# /u01/app/oracle/product/11.2.0/db_1/root.sh
Performing root user operation for Oracle 11g 

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/11.2.0/db_1

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Finished product-specific root actions.
[root@localhost home]# ls -l  /u01/app/oracle/product/11.2.0/db_1/bin/extjob*
-rwsr-x--- 1 root   oinstall 1254496 Apr 16 23:15 /u01/app/oracle/product/11.2.0/db_1/bin/extjob
-rwx------ 1 oracle oinstall 1254496 Apr 16 23:15 /u01/app/oracle/product/11.2.0/db_1/bin/extjobo
-rwsr-x--- 1 root   oinstall 1254244 Feb 13  2019 /u01/app/oracle/product/11.2.0/db_1/bin/extjobO
-rw------- 1 oracle oinstall 1254244 Feb 13  2019 /u01/app/oracle/product/11.2.0/db_1/bin/extjoboO

确实通过执行root.sh修复该问题.

多cpu环境中运行root.sh失败,asm报ORA-04031

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:多cpu环境中运行root.sh失败,asm报ORA-04031

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友和我反馈,说他们在装linux 6.5上面装11.2.0.3的rac出现异常,root.sh在第一个节点执行就失败了,请求帮助
root.sh-asm-fail


根据上面记录,查看asmca日志

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738]  Reinitializing SQLEngine...
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889]  OracleHome.getVersion called.  Current Version: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957]  Current Version From Inventory: 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222]  m_home 11.2.0.3.0
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223]  version > 112 true
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325]  Execing SQLPLUS/SVRMGR process...
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362]  m_bReaderStarted: false
[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366]  Starting Reader Thread...
[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415]  Waiting for m_bReaderStarted to be true
[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189]  Done called
[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174]  ORA-01012: not logged on
[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174]  ORA-03113: end-of-file on communication channel
[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

这里可以看出来,asm实例无法登陆(ORA-01012和ORA-03113),根据这样的错误,分析asm日志

Reconfiguration complete
Fri Jul 24 12:49:29 2015
LCK0 started with pid=22, OS id=46913
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc  (incident=81):
ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc  (incident=177):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc  (incident=73):
ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")
System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
LCK0 (ospid: 46913): terminating the instance due to error 4031
Fri Jul 24 12:49:35 2015
ORA-1092 : opitsk aborting process
Instance terminated by LCK0, pid = 46913

进一步分析asm日志,发现是大家熟悉的asm的ORA-4031问题,那就是说明数据库在执行root.sh的时候使用默认参数文件启动asm的时候shared pool不够大(根据ORACLE最佳实践,建议memory_target=1536M及其以上值),从而出现该问题。类似Bug 14292825 ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low,根据官方描述该问题在11.2.0.4中修复
BUG-14292825


通过asm日志发现相关默认值配置

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0/grid
System name:	Linux
Node name:	RAC01
Release:	2.6.32-358.el6.x86_64
Version:	#1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:	x86_64
Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01
System parameters with non-default values:
  large_pool_size          = 16M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.10.10.31
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jul 24 12:49:27 2015

通过查询/proc/cpuinfo,检查cpu数量

processor	: 191
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz
stepping	: 7
cpu MHz		: 1200.000
cache size	: 24576 KB
physical id	: 7
siblings	: 24
core id		: 13
cpu cores	: 12
apicid		: 251
initial apicid	: 251
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes

而根据How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1)中描述
最多7个subpool(这里一共有192个cpu,因此subpool数量为7)
1


每个suppool最少512m内存,因此shared pool最小需要3.5G(而默认值几百M,远远不够)
2


由于cpu多,导致shared pool的Subpools 更加多,使得shared pool的需求量更加大。至此本次故障原因可以总结:
由于cpu较多,需要更多的shared pool,而11.2.0.3中由于asm默认内存分配较少,导致在asm启动之时出现shared pool不足(本身默认值小,而且shared pool需求大,从而出现了ORA-04031就不奇怪了),因为运行root.sh过程中asm无法正常启动,从而使得root.sh运行失败。
处理办法:临时disable部分cpu,然后重新执行root.sh,修改asm内存分配,再enable cpu.
特别说明:此故障acs的兄弟遇到过,所以这次我能够快速反应,感谢acs兄弟们的帮忙,另外有权限的朋友可以看看:3-10479952701和3-7976215751等sr描述