Exadata storage cell image upgrade fails during the patch_check_prereq phase

Published: 2023-06-09 21:53:59  Author: 石云华

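1. A customer has an Exadata X4-2 currently running image version 11.2.3.3.1 and plans to upgrade it to 18.1.34.0.0. The pre-upgrade prerequisite check against the storage cells fails, as shown below: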

[root@dm01dbadm01 patch_18.1.34.0.0.210717]# ./patchmgr -cells cell_group -patch_check_prereq -rolling

 

2023-05-13 11:36:42 +0800        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...

2023-05-13 11:36:43 +0800        :SUCCESS: DONE: Check cells have ssh equivalence for root user.

2023-05-13 11:36:46 +0800        :Working: DO: Initialize files. Up to 1 minute ...

2023-05-13 11:36:47 +0800        :Working: DO: Setup work directory

2023-05-13 11:36:48 +0800        :SUCCESS: DONE: Setup work directory

2023-05-13 11:36:50 +0800        :SUCCESS: DONE: Initialize files.

2023-05-13 11:36:50 +0800        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...

2023-05-13 11:37:04 +0800        :INFO   : Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.

2023-05-13 11:37:05 +0800        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.

2023-05-13 11:37:05 +0800        :Working: DO: Check space and state of cell services. Up to 20 minutes ...

FAILED for following cells

dm01celadm01:  dm01celadm01 192.168.1.164 2023-05-13 12:01:04 +0800:

2023-05-13 11:37:25 +0800        :FAILED : For details, check the following files in the /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717:

2023-05-13 11:37:25 +0800        :FAILED :  - <cell_name>.log

2023-05-13 11:37:25 +0800        :FAILED :  - patchmgr.stdout

2023-05-13 11:37:25 +0800        :FAILED :  - patchmgr.stderr

2023-05-13 11:37:25 +0800        :FAILED :  - patchmgr.log

2023-05-13 11:37:25 +0800        :FAILED : DONE: Check space and state of cell services.

2023-05-13 11:39:06 +0800        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...

2023-05-13 11:39:15 +0800        :SUCCESS: DONE: Check prerequisites on all cells.

2023-05-13 11:39:15 +0800        :Working: DO: Execute plugin check for Patch Check Prereq ...

2023-05-13 11:39:15 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22909764 v1.0.

2023-05-13 11:39:15 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:15 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.3.

2023-05-13 11:39:15 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:16 +0800        :WARNING: ACTION REQUIRED: Cells to be upgraded pass version check, however other cells not being upgraded may be at version 11.2.3.1.x or 11.2.3.2.x, exposing the system to bug 17854520.  Manually check other cells for version 11.2.3.1.x or 11.2.3.2.x.

2023-05-13 11:39:16 +0800        :INFO   : Checking database homes for remote db nodes with oracle-user ssh equivalence to the local system.

2023-05-13 11:39:16 +0800        :INFO   : Database homes that exist only on remote nodes must be checked manually.

2023-05-13 11:39:19 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed - no exposure to bug 17854520

2023-05-13 11:39:19 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22468216 v1.0.

2023-05-13 11:39:19 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:19 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22468216

2023-05-13 11:39:19 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 24625612 v1.0.

2023-05-13 11:39:19 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:19 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 24625612

2023-05-13 11:39:19 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22896791 v1.0.

2023-05-13 11:39:19 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:20 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22896791

2023-05-13 11:39:20 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22651315 v1.0.

2023-05-13 11:39:20 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 11:39:21 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22651315

2023-05-13 11:39:22 +0800        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.

2023-05-13 11:39:22 +0800        :Working: DO: Check ASM deactivation outcome. Up to 1 minute ...

2023-05-13 11:39:33 +0800        :SUCCESS: DONE: Check ASM deactivation outcome.

2023-05-13 11:39:33 +0800        :FAILED : Prerequisite checks failed.

2023-05-13 11:39:33 +0800        :ERROR  : Patch prerequisite checks failed. Please run cleanup before retrying.

 

[root@dm01dbadm01 patch_18.1.34.0.0.210717]#

The prerequisite check reports a failure on storage cell dm01celadm01. To find the specific cause, check <cell_name>.log, patchmgr.log and the other log files under /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717.

2. Review the relevant log files to find the cause of the error.

[root@dm01dbadm01 patch_18.1.34.0.0.210717]# view dm01celadm01.log

dm01celadm01:

dm01celadm01: 2023-05-13 10:20:59 +0800 [INFO]: patchmgr launch attempt from dm01dbadm01.nmgmzdb.com_192.168.1.162_u01_soft_18.1.34_cell_patch_18.1.34.0.0.210717.

dm01celadm01: 2023-05-13 10:20:59 +0800 [INFO]: dostep called: 4p patch_prereq is_rolling 900 3600 900 36000 default

dm01celadm01: 2023-05-13 10:20:59 +0800 [INFO]: patchmgr launched from dm01dbadm01.nmgmzdb.com_192.168.1.162_u01_soft_18.1.34_cell_patch_18.1.34.0.0.210717

dm01celadm01: _EXIT_PASS_Cell dm01celadm01 192.168.1.164 2023-05-13 10:21:00 +0800:

dm01celadm01:

dm01celadm01: 2023-05-13 10:21:11 +0800 [INFO]: patchmgr launch attempt from dm01dbadm01.nmgmzdb.com_192.168.1.162_u01_soft_18.1.34_cell_patch_18.1.34.0.0.210717.

dm01celadm01: 2023-05-13 10:21:11 +0800 [INFO]: dostep called: 4p_md11_fix_pw patch_prereq is_rolling 900 3600 900 36000 default

dm01celadm01: _EXIT_PASS_Cell dm01celadm01 192.168.1.164 2023-05-13 10:21:11 +0800:

dm01celadm01: Cell dm01celadm01 192.168.1.164

dm01celadm01: _EXIT_PASS_Cell dm01celadm01 192.168.1.164 2023-05-13 10:21:11 +0800:

dm01celadm01:

dm01celadm01: 2023-05-13 10:21:12 +0800 [INFO]: patchmgr launch attempt from dm01dbadm01.nmgmzdb.com_192.168.1.162_u01_soft_18.1.34_cell_patch_18.1.34.0.0.210717.

dm01celadm01: 2023-05-13 10:21:12 +0800 [INFO]: dostep called: prechk:no:1710 patch_prereq is_rolling 900 3600 900 36000 default noforce

dm01celadm01: _EXIT_ERROR_Cell dm01celadm01 192.168.1.164 2023-05-13 11:02:51 +0800:

dm01celadm01:

dm01celadm01: [INFO] Free space in /boot (/dev/md4) before clean up is 58MB

dm01celadm01: [INFO] Kernel version running: 2.6.39-400.128.17.el5uek

dm01celadm01: [INFO] Full kernel version : 2.6.39-400.128.17.el5uek

dm01celadm01: [INFO] Kernel version in /opt/oracle.cellos/image.id: 2.6.39-400.128.17.el5uek

dm01celadm01: [INFO] Full kernel version : 2.6.39-400.128.17.el5uek

dm01celadm01: [INFO] Total space in /boot which will be freed up after removal of

dm01celadm01: initrd-2.6.39-400.128.17.el5uekkdump.img is 10MB

dm01celadm01: [INFO] Free space in /boot (/dev/md4) after clean up is 69MB

dm01celadm01: [INFO] Size for all files for the kernel 2.6.39-400.128.17.el5uek in /boot is 14MB

dm01celadm01: [INFO] Required free space on /boot is 28MB

dm01celadm01: [INFO][check_allowed_version] Target: 18.1.34.0.0.210717 Current: 11.2.3.3.1.140708 Patch or rollback: 1710 Rolling: is_rolling

dm01celadm01: [ERROR] Can not continue. Runtime configuration is not consistent with values configured in /opt/oracle.cellos/cell.conf.

dm01celadm01: [ERROR] Run ipconf to correct the inconsistencies. Failed check: /root/_cellupd_dpullec_/_p_/ipconf -check-consistency -at-runtime -semantic -verbose

dm01celadm01: [ERROR] Details:

dm01celadm01: Checking DNS server on 192.168.1.116                                                              : FAILED

dm01celadm01: Checking DNS server on 192.168.2.202                                                               : FAILED

dm01celadm01: Check that server on 192.168.1.116 responds to the NTP requests                                   : FAILED

dm01celadm01: Check that server on 192.168.2.202 responds to the NTP requests                                    : FAILED

dm01celadm01: [Info]: Consistency check FAILED

dm01celadm01: Cell dm01celadm01 192.168.1.164

dm01celadm01: _EXIT_ERROR_Cell dm01celadm01 192.168.1.164 2023-05-13 11:02:51 +0800:

dm01celadm01:  dm01celadm01 192.168.1.164 2023-05-13 11:02:51 +0800:

dm01celadm01: 2023-05-13 12:00:46 +0800 [INFO]: dostep called: prechk:no:1710 patch_prereq is_rolling 900 3600 900 36000 default noforce

dm01celadm01: _EXIT_ERROR_Cell dm01celadm01 192.168.1.164 2023-05-13 12:01:04 +0800:

dm01celadm01:

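The log shows that the prerequisite check verifies whether the configured DNS and NTP servers respond, and every one of those checks currently fails. According to the customer, DNS and NTP were configured when the system was first deployed, but the DNS and NTP servers later broke down, so this Exadata environment no longer uses either service.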
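The per-cell log is long, and the lines that matter are the [ERROR] entries and the individual checks ending in ": FAILED". A minimal sketch for pulling only those lines out of the log is shown here. The function name summarize_cell_log is hypothetical, and the pattern is based solely on the log format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the [ERROR] lines and the individual
# checks that end in ": FAILED" from a patchmgr per-cell log.
# The pattern is an assumption derived from the dm01celadm01.log excerpt above.
summarize_cell_log() {
    local logfile="$1"
    grep -E '\[ERROR\]|: FAILED[[:space:]]*$' "$logfile"
}
```

For example, `summarize_cell_log dm01celadm01.log` run from the patch directory would, for the log above, list the three [ERROR] lines followed by the four failed DNS and NTP checks.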

3. With the cause identified, stop the cell services and run the ipconf tool on that storage cell to clear its DNS and NTP settings.

[root@dm01celadm01 etc]# cellcli -e alter cell shutdown services all

 

Stopping the RS, CELLSRV, and MS services...

The SHUTDOWN of services was successful.

[root@dm01celadm01 etc]#

[root@dm01celadm01 etc]# ipconf

Logging started to /var/log/cellos/ipconf.log

Interface ib0 is Linked.  hca: mlx4_0

Interface ib1 is Linked.  hca: mlx4_0

Interface eth0 is Linked.  driver/mac: ixgbe/00:10:e0:56:70:e2

Interface eth1 is ... Unlinked.  driver/mac: ixgbe/00:10:e0:56:70:e3

Interface eth2 is ... Unlinked.  driver/mac: ixgbe/00:10:e0:56:70:e4

Interface eth3 is ... Unlinked.  driver/mac: ixgbe/00:10:e0:56:70:e5

 

Network interfaces

Name     State    

ib0      Linked   

ib1      Linked   

eth0     Linked   

eth1     Unlinked 

eth2     Unlinked 

eth3     Unlinked 

Warning. Some network interface(s) are disconnected. Check cables and swicthes and retry

Do you want to retry (y/n) [y]: n

 

The current nameserver(s): 192.168.1.116 192.168.2.202

Do you want to change it (y/n) [n]: y

Nameserver:

Add more nameservers (y/n) [n]: n

The current timezone: Asia/Shanghai

Do you want to change it (y/n) [n]:

The current NTP server(s): 192.168.1.116 192.168.2.202

Do you want to change it (y/n) [n]: y

Fully qualified hostname or ip address for NTP server. Press enter if none:

Continue adding more ntp servers (y/n) [n]: n

 

Network interfaces

 
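Before going back to patchmgr, it can save time to repeat on the cell the same consistency check that the log reported as failed. The sketch below wraps that call. The flags are copied from the "Failed check" line in dm01celadm01.log. The default path /opt/oracle.cellos/ipconf, the IPCONF variable and the function name are assumptions, as is treating any "FAILED" in the output as a failure.

```shell
#!/usr/bin/env bash
# Assumed location of ipconf on a storage cell; override IPCONF if it differs.
IPCONF="${IPCONF:-/opt/oracle.cellos/ipconf}"

# Hypothetical wrapper: re-run the consistency check quoted in the cell log
# and return non-zero if any line of its output contains FAILED.
check_cell_consistency() {
    local out
    out=$("$IPCONF" -check-consistency -at-runtime -semantic -verbose 2>&1)
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q 'FAILED'; then
        return 1
    fi
    return 0
}
```

Running `check_cell_consistency && echo "consistent"` as root on the cell would print the check output and confirm only when no check is reported as FAILED.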

4. Re-run the pre-upgrade prerequisite check.

[root@dm01dbadm01 patch_18.1.34.0.0.210717]# ./patchmgr -cells cell_group -patch_check_prereq -rolling

 

2023-05-13 12:09:13 +0800        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...

2023-05-13 12:09:14 +0800        :SUCCESS: DONE: Check cells have ssh equivalence for root user.

2023-05-13 12:09:17 +0800        :Working: DO: Initialize files. Up to 1 minute ...

2023-05-13 12:09:17 +0800        :Working: DO: Setup work directory

2023-05-13 12:09:19 +0800        :SUCCESS: DONE: Setup work directory

2023-05-13 12:09:21 +0800        :SUCCESS: DONE: Initialize files.

2023-05-13 12:09:21 +0800        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...

2023-05-13 12:09:35 +0800        :INFO   : Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.

2023-05-13 12:09:36 +0800        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.

2023-05-13 12:09:36 +0800        :Working: DO: Check space and state of cell services. Up to 20 minutes ...

2023-05-13 12:10:10 +0800        :SUCCESS: DONE: Check space and state of cell services.

2023-05-13 12:10:10 +0800        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...

2023-05-13 12:10:20 +0800        :SUCCESS: DONE: Check prerequisites on all cells.

2023-05-13 12:10:20 +0800        :Working: DO: Execute plugin check for Patch Check Prereq ...

2023-05-13 12:10:20 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22909764 v1.0.

2023-05-13 12:10:20 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:20 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.3.

2023-05-13 12:10:20 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:21 +0800        :WARNING: ACTION REQUIRED: Cells to be upgraded pass version check, however other cells not being upgraded may be at version 11.2.3.1.x or 11.2.3.2.x, exposing the system to bug 17854520.  Manually check other cells for version 11.2.3.1.x or 11.2.3.2.x.

2023-05-13 12:10:21 +0800        :INFO   : Checking database homes for remote db nodes with oracle-user ssh equivalence to the local system.

2023-05-13 12:10:21 +0800        :INFO   : Database homes that exist only on remote nodes must be checked manually.

2023-05-13 12:10:23 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed - no exposure to bug 17854520

2023-05-13 12:10:23 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22468216 v1.0.

2023-05-13 12:10:23 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:24 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22468216

2023-05-13 12:10:24 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 24625612 v1.0.

2023-05-13 12:10:24 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:24 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 24625612

2023-05-13 12:10:24 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22896791 v1.0.

2023-05-13 12:10:24 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:25 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22896791

2023-05-13 12:10:25 +0800        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 22651315 v1.0.

2023-05-13 12:10:25 +0800        :INFO   : Details in logfile /u01/soft/18.1.34/cell/patch_18.1.34.0.0.210717/patchmgr.stdout.

2023-05-13 12:10:26 +0800        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22651315

2023-05-13 12:10:27 +0800        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.

2023-05-13 12:10:27 +0800        :Working: DO: Check ASM deactivation outcome. Up to 1 minute ...

2023-05-13 12:10:37 +0800        :SUCCESS: DONE: Check ASM deactivation outcome.

 

[root@dm01dbadm01 patch_18.1.34.0.0.210717]#

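As shown, once the DNS and NTP settings on this storage cell have been reset, the pre-upgrade prerequisite check passes, and the upgrade itself can then proceed.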
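The failed run in step 1 ended with "Please run cleanup before retrying". A minimal sketch of a retry that runs the cleanup first is given here, assuming it is executed from the patch directory on the database node. The PATCHMGR and CELL_GROUP variables and the function name rerun_prereq are illustrative and not part of patchmgr.

```shell
#!/usr/bin/env bash
# Illustrative defaults: the patchmgr script in the current patch directory
# and the cell list file used throughout this article.
PATCHMGR="${PATCHMGR:-./patchmgr}"
CELL_GROUP="${CELL_GROUP:-cell_group}"

# Hypothetical helper: clean up the previous failed attempt, then repeat
# the rolling prerequisite check. Stops if the cleanup itself fails.
rerun_prereq() {
    "$PATCHMGR" -cells "$CELL_GROUP" -cleanup || return 1
    "$PATCHMGR" -cells "$CELL_GROUP" -patch_check_prereq -rolling
}
```

Calling `rerun_prereq` is then equivalent to typing the cleanup command followed by the prerequisite check shown in step 4.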