opatchauto fails to apply a patch on a 19c RAC running on Kylin V10 SP3

Published: 2023-10-16 14:52:01  Author: 石云华

1. On a newly installed 19c RAC on Kylin V10 SP3, opatchauto failed while applying a patch. The output was as follows.

[root@db01 soft]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/

 

OPatchauto session is initiated at Tue Oct 10 11:04:45 2023

 

System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 11-04-45AM.log

 

OPATCHAUTO-72050: System instance creation failed.

OPATCHAUTO-72050: Failed while retrieving system information.

OPATCHAUTO-72050: Please check log file for more details.

OPatchauto session completed at Tue Oct 10 11:05:20 2023

Time taken to complete the session 0 minute, 35 seconds

Topology creation failed.

 

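2. According to the command-line error, opatchauto failed while retrieving system information, so the corresponding log file has to be checked. The error entries extracted from the log generated by opatchauto are shown below.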

2023-10-10 11:04:56,298 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Not able to retrieve system instance details :: Unable to determine if "/u01/app/19.3.0/grid" is a shared oracle home.

Failed:

Verification of shared storage accessibility was unsuccessful on all the specified nodes.

NODE_STATUS::db02:EFAIL

The result of cluvfy command contain EFAIL NODE_STATUS::db02:EFAIL

……

2023-10-10 11:04:56,298 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Failure reason::java.lang.Exception: The result of cluvfy command contain EFAIL NODE_STATUS::db02:EFAIL

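The log shows why opatchauto failed. opatchauto automatically runs cluvfy to check the state of the whole cluster, and cluvfy's shared storage accessibility check failed. As a result, it could not be determined whether GRID_HOME is a shared home, which in turn caused opatchauto to fail.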
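When the log is long, it can help to pull out just the SEVERE entries and the nodes that cluvfy flagged. The following is a minimal sketch, assuming the log format shown above; the function names `severe_lines` and `failed_nodes` are hypothetical helpers, not part of opatchauto.

```shell
# Print only the SEVERE entries of an opatchauto log, with line numbers.
severe_lines() {
    grep -n ' SEVERE ' "$1"
}

# Print the unique node names reported as NODE_STATUS::<node>:EFAIL.
failed_nodes() {
    # "NODE_STATUS::db02:EFAIL" split on ':' gives the node name in field 3.
    grep -o 'NODE_STATUS::[^:]*:EFAIL' "$1" | cut -d: -f3 | sort -u
}

# Usage with the log file named in step 1:
#   log='/u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 11-04-45AM.log'
#   severe_lines "$log"
#   failed_nodes "$log"
```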

 

3. Running cluvfy manually to check the cluster's shared storage gave the following result.

[grid@db01 ~]$ cluvfy comp ssa -n all -verbose

 

Verification of shared storage accessibility was unsuccessful on all the specified nodes.

 

CVU operation performed:       shared storage accessibility

Date:                          Oct 10, 2023 11:43:32 AM

CVU home :                     /u01/app/19.3.0/grid/

User:                          grid

[grid@db01 ~]$

 

4. At this point the only option is to enable DEBUG mode (tracing) for cluvfy to get more detailed log output. For this 19c RAC, the method is as follows.

[grid@db01 ~]$ rm -rf /tmp/cvutrace

[grid@db01 ~]$ mkdir /tmp/cvutrace

[grid@db01 ~]$ export CV_TRACELOC=/tmp/cvutrace

[grid@db01 ~]$ export SRVM_TRACE=true

[grid@db01 ~]$ export SRVM_TRACE_LEVEL=1

[grid@db01 ~]$ cluvfy comp ssa -n all -verbose

 

5. Inspect the generated cvutrace.log.0 file and search for the keyword "failed". It contains many entries for failed scp remote copies, as shown below.

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:599]  isCmdScv: cmd=[/usr/bin/scp -p /tmp/CVU_19.0.0.0.0_grid/check_vip_restart_attempt.sh db02:'/tmp/CVU_19.0.0.0.0_grid//check_vip_restart_attempt.sh']

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:649]  isCmdScv: /usr/bin/scp is present.

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:651]  isCmdScv: /usr/bin/scp is a file.

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:668]  isCmdScv: returned true

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.rununixcmd:1345]  NativeSystem.rununixcmd: RetString 1|Authorized users only. All activities may be monitored and reported. :successful

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [CopyCommand.execute:171]  CopyCommand.execute: native copyFile returns `1|Authorized users only. All activities may be monitored and reported. :successful'

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:93]  NativeResult: The String obtained is1|Authorized users only. All activities may be monitored and reported. :successful

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:101]  The status string is: 1

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:114]  The result string is: Authorized users only. All activities may be monitored and reported. :successful 1

[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [CopyCommand.execute:179]  The copy command failed. Details:

Authorized users only. All activities may be monitored and reported. :successful

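The log shows that after the scp command ran, the returned status code was 1 and the returned result string was "Authorized users only. All activities may be monitored and reported.".

Running scp manually also prints this message, even though the remote copy succeeds. Could this extra line be what makes the check of the command result fail?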
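One way to test this suspicion is to check whether a non-interactive remote command prints anything at all. The sketch below is only an illustration: `is_quiet` is a hypothetical helper, and it assumes passwordless SSH to db02 is already configured for the grid user.

```shell
# Return 0 if the given command prints nothing on stdout or stderr.
# Otherwise show the stray output on stderr and return 1.
# The exit status of the command itself is deliberately ignored.
is_quiet() {
    local out
    out=$("$@" 2>&1)
    if [ -n "$out" ]; then
        printf 'unexpected output: %s\n' "$out" >&2
        return 1
    fi
    return 0
}

# Usage on the cluster (db02 is the remote node from the logs above):
#   is_quiet ssh -o BatchMode=yes db02 true
```

If a login banner is in effect, the call is expected to report the banner text as stray output.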

 

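6. The message "Authorized users only. All activities may be monitored and reported." that is returned after the command completes actually comes from the security hardening applied to the Kylin operating system. However, after that hardening policy was removed, running cluvfy manually to check the cluster's storage still reported an error.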

 

7. Enable cluvfy DEBUG mode again. The newly generated log file contains the following errors.

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  no package provides oraclelinux-release

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  no package provides enterprise-release

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  no package provides redhat-release

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  package SLES-for-VMware-release is not installed

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  package sles-release is not installed

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:548]  /bin/sh

[VerificationLogData.traceLogData:259]  ERROR: [sVerificationUtil.getUniqueDistributionID:552]  package asianux-release is not installed

[VerificationLogData.traceLogData:259]  ERROR: [Result.addErrorDescription:760]  PRVG-0282 : failed to retrieve the operating system distribution ID

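The errors show that cluvfy could not retrieve the operating system distribution ID. Because this is the Kylin operating system, the only option is to set the CV_ASSUME_DISTID environment variable to assign a distribution ID to the current system.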

export CV_ASSUME_DISTID=OL7
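Before overriding the distribution ID, it can be useful to confirm what the operating system reports about itself. The following is a small sketch, assuming the system ships a standard os-release file; `os_release_id` is a hypothetical helper and is not something cluvfy itself calls.

```shell
# Print the ID field of an os-release style file (default: /etc/os-release).
os_release_id() {
    local file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$file" && printf '%s\n' "$ID" )
}

# Usage:
#   os_release_id
#   export CV_ASSUME_DISTID=OL7   # value used in this article
```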

 

8. Running cluvfy manually again to check the cluster's storage finally succeeded.

 

9. Re-running opatchauto to apply the patch produced a new error. The command-line output was as follows.

[root@db01 soft]# export CV_ASSUME_DISTID=OL7

[root@db01 soft]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/

 

OPatchauto session is initiated at Tue Oct 10 16:36:20 2023

 

System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 04-36-25PM.log

 

OPATCHAUTO-72035: Failed to create System Instance XML file.

OPATCHAUTO-72035: File creation failed due to permission.

OPATCHAUTO-72035: check user has permission to create the file.

OPatchauto session completed at Tue Oct 10 16:37:47 2023

Time taken to complete the session 1 minute, 22 seconds

Topology creation failed.

[root@db01 soft]#

This time opatchauto reports that the file could not be created because of insufficient permissions.

 

10. The systemconfig2023-10-10 04-36-25PM.log file gives no explicit cause. There is only the following error at the end of the log.

2023-10-10 12:44:54,498 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Not able to write system instance details

 

11. At this point the only option is to trace the opatchauto run with strace.

# strace -f -T -tt -o /tmp/opatchauto.out /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/

In the end, the cause turned out to be wrong permissions on the /tmp directory. After changing them to 777, opatchauto ran successfully.
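The search through the strace output and the permission check could look roughly like this. It is a sketch only: `denied_calls` and `dir_mode` are hypothetical helpers, `stat -c` assumes GNU coreutils, and the article does not say which system call actually failed.

```shell
# List traced system calls that failed with a permission error.
denied_calls() {
    grep -E 'EACCES|EPERM' "$1"
}

# Print the octal permission bits of a directory (GNU stat syntax).
dir_mode() {
    stat -c '%a' "$1"
}

# Usage:
#   denied_calls /tmp/opatchauto.out
#   dir_mode /tmp
```

On a stock Linux system /tmp normally has mode 1777, that is, world-writable with the sticky bit set, so `chmod 1777 /tmp` would restore the usual default rather than plain 777.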