Red Hat OpenShift Container Platform 4.6.25 update to 4.7.7 is failing when running on vSphere
Environment
- Red Hat OpenShift Container Platform (OCP) 4.6 and 4.7
- Red Hat Enterprise Linux 8.4
Issue
- According to the solution "openshift-apiserver cluster-operator goes unavailable during cluster installation or upgrade from OCP 4.6 to 4.7", the issue with updating from OCP 4.6 to 4.7 should be solved, but the same issue still occurs when updating to OCP 4.7.
- Update to OCP 4.7.7 is stuck with the following error:
Unable to apply 4.7.7: the control plane is reporting an internal error
Resolution
OpenShift Container Platform
- This issue has been resolved in Red Hat OpenShift Container Platform 4.7.11 via RHBA-2021:1550 to address Red Hat OpenShift Container Platform installation done on vSphere. To raise questions or obtain further information, contact Red Hat Technical Support.
- In case updating to Red Hat OpenShift Container Platform 4.7.11 or later is not possible, the below workarounds can be applied.
Red Hat Enterprise Linux 8
- The issue has been resolved with kernel-4.18.0-348.el8 in Red Hat Enterprise Linux 8.5 GA via RHSA-2021:4356.
- The issue was tracked at private bug 1941714.
Red Hat Enterprise Linux 8.4.z
- The issue has been resolved with kernel-4.18.0-305.7.1.el8_4 in Red Hat Enterprise Linux 8.4.z via RHSA-2021:2570.
- The issue was tracked at private bug 1960702.
Workaround 1
- When stuck in the update from Red Hat OpenShift Container Platform 4.6.25 to 4.7.7, applying the following workaround on all Red Hat OpenShift Container Platform - Node(s) should resolve the problem.
ethtool -K <primary-interface> tx-udp_tnl-segmentation off
ethtool -K <primary-interface> tx-udp_tnl-csum-segmentation off
- Important: This workaround is not persistent and is lost at the next reboot of the Red Hat OpenShift Container Platform - Node(s). It is therefore only recommended for unblocking an upgrade that is already stuck. For planned upgrades, workaround 2 is recommended.
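To confirm that Workaround 1 took effect, the state of the two offload features can be read back with ethtool -k (lower-case k). The following is a minimal sketch, not part of the original solution: the function name udp_tnl_offload_state is illustrative, and it only filters text that ethtool -k prints in its usual "feature: state" form.

```shell
#!/bin/bash
# Sketch: read `ethtool -k <interface>` output on standard input and print
# the state of the two UDP tunnel offload features named in Workaround 1.
# The function name is hypothetical and chosen for this example only.
udp_tnl_offload_state() {
  awk -F': *' '
    $1 ~ /^[ \t]*tx-udp_tnl-(csum-)?segmentation$/ {
      gsub(/^[ \t]+/, "", $1)    # strip indentation of nested features
      split($2, state, " ")      # drop trailing markers such as "[fixed]"
      print $1 "=" state[1]
    }'
}

# Example on a node (the interface name is a placeholder):
#   ethtool -k <primary-interface> | udp_tnl_offload_state
```

After the workaround has been applied, both features are expected to be reported as off.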
Workaround 2
- Before attempting to update from Red Hat OpenShift Container Platform 4.6.25 to 4.7.7, the below MachineConfig can be applied to all Red Hat OpenShift Container Platform - Node(s) to run the ethtool command during Red Hat OpenShift Container Platform - Node(s) start-up.
- The below script needs to be put in place in /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl on each Red Hat OpenShift Container Platform - Node (worker and Control-Plane) using the MachineConfig as shown in the next step.
#!/bin/bash
# Workaround:
# https://bugzilla.redhat.com/show_bug.cgi?id=1941714
# https://bugzilla.redhat.com/show_bug.cgi?id=1935539
# https://access.redhat.com/solutions/5997331

driver=$(nmcli -t -m tabular -f general.driver dev show "${DEVICE_IFACE}")

if [[ "$2" == "up" && "${driver}" == "vmxnet3" && -f /usr/sbin/ethtool ]]; then
  logger -s "99-vsphere-disable-tx-udp-tnl triggered by ${2} on device ${DEVICE_IFACE}."
  ethtool -K ${DEVICE_IFACE} tx-udp_tnl-segmentation off
  ethtool -K ${DEVICE_IFACE} tx-udp_tnl-csum-segmentation off
fi
- The MachineConfig to be created should look as follows for the Red Hat OpenShift Container Platform - Control-Plane Node(s).
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-vsphere-networking-fix-master
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.1.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIyBXb3JrYXJvdW5kOgojIGh0dHBzOi8vYnVnemlsbGEucmVkaGF0LmNvbS9zaG93X2J1Zy5jZ2k/aWQ9MTk0MTcxNAojIGh0dHBzOi8vYnVnemlsbGEucmVkaGF0LmNvbS9zaG93X2J1Zy5jZ2k/aWQ9MTkzNTUzOQojIGh0dHBzOi8vYWNjZXNzLnJlZGhhdC5jb20vc29sdXRpb25zLzU5OTczMzEKCmRyaXZlcj0kKG5tY2xpIC10IC1tIHRhYnVsYXIgLWYgZ2VuZXJhbC5kcml2ZXIgZGV2IHNob3cgIiR7REVWSUNFX0lGQUNFfSIpCgppZiBbWyAiJDIiID09ICJ1cCIgJiYgIiR7ZHJpdmVyfSIgPT0gInZteG5ldDMiICYmIC1mIC91c3Ivc2Jpbi9ldGh0b29sIF1dOyB0aGVuCiAgbG9nZ2VyIC1zICI5OS12c3BoZXJlLWRpc2FibGUtdHgtdWRwLXRubCB0cmlnZ2VyZWQgYnkgJHsyfSBvbiBkZXZpY2UgJHtERVZJQ0VfSUZBQ0V9LiIKICBldGh0b29sIC1LICR7REVWSUNFX0lGQUNFfSB0eC11ZHBfdG5sLXNlZ21lbnRhdGlvbiBvZmYKICBldGh0b29sIC1LICR7REVWSUNFX0lGQUNFfSB0eC11ZHBfdG5sLWNzdW0tc2VnbWVudGF0aW9uIG9mZgpmaQo=
        mode: 484
        overwrite: true
        path: /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl
  osImageURL: ""
- More details about MachineConfig objects and how to apply them to all Red Hat OpenShift Container Platform - Node(s) can be found in Using MachineConfig objects to configure nodes.
- Workaround 2 needs to remain in place even after a successful upgrade to Red Hat OpenShift Container Platform 4.7 and can only be removed once RHBZ #1952358 is resolved or Red Hat Technical Support advises accordingly.
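The source value in the MachineConfig is the dispatcher script encoded as a base64 data URL, and mode 484 is the decimal form of octal 0744. The following is a minimal sketch of how such values could be derived from a local copy of the script. The function names to_data_url and mode_to_octal are illustrative and not part of the original solution.

```shell
#!/bin/bash
# Sketch: build the data URL used for storage.files[].contents.source
# from a local script file. Function names are hypothetical.
to_data_url() {
  # tr removes the line wrapping that some base64 implementations add.
  printf 'data:text/plain;charset=utf-8;base64,%s\n' \
    "$(base64 < "$1" | tr -d '\n')"
}

# Sketch: show a decimal file mode, as written in the MachineConfig,
# in the more familiar octal notation.
mode_to_octal() {
  printf '0%o\n' "$1"
}

# Example (the file name is a placeholder for a local copy of the script):
#   to_data_url 99-vsphere-disable-tx-udp-tnl
#   mode_to_octal 484
```

Decoding the base64 part of an existing manifest with base64 -d is a quick way to review what a MachineConfig will write to the node before applying it.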
Installation of Red Hat OpenShift Container Platform 4.7
It is not recommended to install an older 4.7.z version. If possible, please install the latest errata. However, if it is required to install a 4.7.z version lower than 4.7.11 with platform set to none or with the bare metal installation method on vSphere hardware version greater than 13, it is necessary to apply the below MachineConfig manifest during the installation, following the procedure documented in Customizing nodes on day 1.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-vsphere-networking-fix-master
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.1.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIyBXb3JrYXJvdW5kOgojIGh0dHBzOi8vYnVnemlsbGEucmVkaGF0LmNvbS9zaG93X2J1Zy5jZ2k/aWQ9MTk0MTcxNAojIGh0dHBzOi8vYnVnemlsbGEucmVkaGF0LmNvbS9zaG93X2J1Zy5jZ2k/aWQ9MTkzNTUzOQojIGh0dHBzOi8vYWNjZXNzLnJlZGhhdC5jb20vc29sdXRpb25zLzU5OTczMzEKCmRyaXZlcj0kKG5tY2xpIC10IC1tIHRhYnVsYXIgLWYgZ2VuZXJhbC5kcml2ZXIgZGV2IHNob3cgIiR7REVWSUNFX0lGQUNFfSIpCgppZiBbWyAiJDIiID09ICJ1cCIgJiYgIiR7ZHJpdmVyfSIgPT0gInZteG5ldDMiICYmIC1mIC91c3Ivc2Jpbi9ldGh0b29sIF1dOyB0aGVuCiAgbG9nZ2VyIC1zICI5OS12c3BoZXJlLWRpc2FibGUtdHgtdWRwLXRubCB0cmlnZ2VyZWQgYnkgJHsyfSBvbiBkZXZpY2UgJHtERVZJQ0VfSUZBQ0V9LiIKICBldGh0b29sIC1LICR7REVWSUNFX0lGQUNFfSB0eC11ZHBfdG5sLXNlZ21lbnRhdGlvbiBvZmYKICBldGh0b29sIC1LICR7REVWSUNFX0lGQUNFfSB0eC11ZHBfdG5sLWNzdW0tc2VnbWVudGF0aW9uIG9mZgpmaQo=
        mode: 484
        overwrite: true
        path: /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl
  osImageURL: ""
Important: To apply the change to all OpenShift Container Platform - Node(s) (worker and Control-Plane), two files need to be created to cover the predefined roles (master and worker) created by the Red Hat OpenShift Container Platform 4 - Installer.
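Since the two files differ only in the role label and the object name, the worker manifest can be derived from the master manifest. The following is a minimal sketch under stated assumptions: the function name master_to_worker, the file names in the example, and the object name 99-vsphere-networking-fix-worker are chosen for illustration and do not appear in the original solution.

```shell
#!/bin/bash
# Sketch: read the master MachineConfig on standard input and write a
# worker variant to standard output. Only the role label and the object
# name are changed; the resulting name "...-worker" is an assumed
# convention, not one taken from the original solution.
master_to_worker() {
  sed -e 's|\(machineconfiguration.openshift.io/role:\) master|\1 worker|' \
      -e 's|\(name: 99-vsphere-networking-fix\)-master|\1-worker|'
}

# Example (file names are placeholders):
#   master_to_worker < 99-vsphere-networking-fix-master.yaml \
#     > 99-vsphere-networking-fix-worker.yaml
```

Both files are then placed alongside the other installation manifests as described in Customizing nodes on day 1.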
Root Cause
A change to the vmxnet3 driver in Red Hat Enterprise Linux 8.3 causes VXLAN packets, which are required for the Software Defined Network (SDN), to be dropped.
Diagnostic Steps
- The Red Hat OpenShift Container Platform 4.6.25 to 4.7.7 update is stuck on vSphere, the hardware version of the virtual machines is higher than 13, and oc get clusterversion reports the following error:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.25    True        True          34h     Unable to apply 4.7.7: the control plane is reporting an internal error
- Various Red Hat OpenShift Container Platform - Cluster Operators are unavailable or stuck in degraded state, as shown below:
$ oc get clusteroperator
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.7     False       True          True       33h
baremetal                                  4.7.7     True        False         False      33h
cloud-credential                           4.7.7     True        False         False      88d
cluster-autoscaler                         4.7.7     True        False         False      88d
config-operator                            4.7.7     True        False         False      88d
console                                    4.7.7     False       False         True       32h
csi-snapshot-controller                    4.7.7     True        False         False      125m
dns                                        4.7.7     True        False         False      88d
etcd                                       4.7.7     True        False         False      88d
image-registry                             4.7.7     True        False         True       33h
ingress                                    4.7.7     True        False         False      28h
insights                                   4.7.7     True        False         False      88d
kube-apiserver                             4.7.7     True        False         False      88d
kube-controller-manager                    4.7.7     True        False         False      88d
kube-scheduler                             4.7.7     True        False         False      88d
kube-storage-version-migrator              4.7.7     True        False         False      75m
machine-api                                4.7.7     True        False         False      88d
machine-approver                           4.7.7     True        False         False      88d
machine-config                             4.7.7     False       False         True       32h
marketplace                                4.7.7     True        False         False      91m
monitoring                                 4.7.7     False       False         True       33h
network                                    4.7.7     True        False         False      88d
node-tuning                                4.7.7     True        False         False      33h
openshift-apiserver                        4.7.7     False       False         False      75m
openshift-controller-manager               4.7.7     True        False         False      33h
openshift-samples                          4.7.7     True        False         False      33h
operator-lifecycle-manager                 4.7.7     True        False         False      88d
operator-lifecycle-manager-catalog         4.7.7     True        False         False      88d
operator-lifecycle-manager-packageserver   4.7.7     True        False         False      6m11s
service-ca                                 4.7.7     True        False         False      88d
storage                                    4.7.7     True        False         False      88d
- The file /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl is not present on any Red Hat OpenShift Container Platform - Node.
$ oc debug node/worker-0
sh-4.4# chroot /host
sh-4.4# ls -l /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl
ls: cannot access '/etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl': No such file or directory
- Check Hardware Version of Virtual Machines in vCenter:
Using VMware PowerCLI for Powershell:
$ Get-Folder "<OCP_Folder_Path>" | get-VM | Select-Object Name, HardwareVersion
Name                    HardwareVersion
----                    ---------------
openshift-x5mg6-worker2 vmx-15
openshift-x5mg6-worker3 vmx-15
openshift-x5mg6-worker1 vmx-15
openshift-x5mg6-master3 vmx-15
openshift-x5mg6-master2 vmx-15
openshift-x5mg6-master1 vmx-15
Using VMware govc:
$ for i in $(govc find -type m -name 'openshift-x5mg6*'); do govc vm.info -json $i | jq -r '[.VirtualMachines[].Name, .VirtualMachines[].Config.Version] | join(" ")'; done
openshift-x5mg6-worker2 vmx-15
openshift-x5mg6-worker3 vmx-15
openshift-x5mg6-worker1 vmx-15
openshift-x5mg6-master3 vmx-15
openshift-x5mg6-master2 vmx-15
openshift-x5mg6-master1 vmx-15
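The diagnostic checks above can be partly scripted. The following is a minimal sketch, not part of the original solution: the function names unhealthy_operators and hw_version_affected are illustrative. The first assumes the column layout that oc get clusteroperator prints in the example above, and the second assumes hardware versions are reported in the form vmx-<number>.

```shell
#!/bin/bash
# Sketch: read `oc get clusteroperator` output on standard input and print
# the names of operators that are not Available or that are Degraded.
# Columns assumed: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Sketch: succeed when a hardware version such as "vmx-15" is greater
# than 13, which is the range this solution describes as affected.
hw_version_affected() {
  local number="${1#vmx-}"   # strip the "vmx-" prefix
  [ "$number" -gt 13 ]
}

# Example usage:
#   oc get clusteroperator | unhealthy_operators
#   hw_version_affected vmx-15 && echo "workaround or fixed version needed"
```

Operators listed by the first helper together with a hardware version above 13 match the symptoms described in this solution, but they are not proof of this specific root cause on their own.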
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.