ARO - SystemDLooping Update Risk
Environment
- Clusters created at ARO (Azure Red Hat OpenShift) version:
  - less than 4.12.z
  - 4.11.z
  - 4.12.z
- Update from (current ARO version):
  - 4.12.z
- Update to (desired ARO version):
  - 4.13.46
- Fixed in ARO versions:
  - 4.13.48
  - 4.14.35
  - 4.15.28
  - 4.16.8
Issue
Red Hat has identified an update risk for clusters updating to OpenShift 4.13.46 on ARO. Nodes affected by this issue are unable to complete their reboot during the update, which stalls the entire update process for ARO.
Red Hat recommends not updating to OpenShift 4.13.46 on ARO.
Resolution
For Azure Red Hat OpenShift (ARO), create a ticket with Red Hat Support or Microsoft Customer Support for further guidance on unblocking any upgrade issues. Do not attempt manual remediation.
NOTE: For self-managed OpenShift Container Platform on Microsoft Azure, refer here.
Root Cause
A systemd dependency loop was detected in OpenShift 4.13.46. The loop causes systemd to delete some dependencies, and as a result kubelet and crio never start on the affected nodes. For more information, see OCPBUGS-33694.
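When systemd breaks a dependency loop, it normally records this in the journal with messages containing the phrase "ordering cycle". The sketch below is a hypothetical helper (the name ordering_cycle_lines is illustrative and not part of any Red Hat tooling) that filters journal text read from standard input for those messages. It assumes journal output from an affected node is already available, and that the loop is logged with the usual wording; it only reads text and changes nothing on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print journal lines in which systemd reports an
# ordering (dependency) cycle. Reads journal text on stdin.
# Assumption: the loop is logged with systemd's usual "ordering cycle" wording.
ordering_cycle_lines() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -i 'ordering cycle' || true
}

# Possible usage on a host where the journal is readable:
#   journalctl -b --no-pager | ordering_cycle_lines
```

An empty result does not rule out the issue; it only means no such message was present in the text supplied.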
Diagnostic Steps
For customers looking to start an update:
- Run oc describe clusterversion. The output lists every version the cluster has run; the version at the bottom of that list is the version the cluster was created at. If your cluster was created at version 4.12.z or older (a lower version number), do not update to OpenShift 4.13.46 on ARO.
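The version comparison in this check can be sketched as a small shell helper. The function name created_in_affected_range is hypothetical, and the sketch assumes GNU sort for its -V version ordering. The commented oc command, which reads the last entry of the ClusterVersion history, is also an assumption about how the creation version might be obtained non-interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the given creation version is
# 4.12.z or older, the range this article lists as affected.
# Assumes GNU sort, whose -V option orders version strings numerically.
created_in_affected_range() {
  local created="$1"
  local minor
  # Keep only major.minor, e.g. 4.11.44 -> 4.11
  minor="$(printf '%s\n' "$created" | cut -d. -f1,2)"
  # If 4.12 sorts last (or ties), the creation minor is 4.12 or lower.
  [ "$(printf '%s\n' "$minor" "4.12" | sort -V | tail -n 1)" = "4.12" ]
}

# Possible usage against a live cluster (assumption: the last entry of
# .status.history is the version the cluster was created at):
#   created="$(oc get clusterversion version -o jsonpath='{.status.history[-1:].version}')"
#   if created_in_affected_range "$created"; then
#     echo "Created at $created: do not update to 4.13.46 on ARO"
#   fi
```

Only the major.minor part is compared, because the article defines the affected range by minor release (4.12.z or older).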
For customers experiencing stalled update:
The following commands can be used to diagnose the issue.
- Verify that the cluster version shows a failed state for upgrade to 4.13.46.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.25 True True 3h8m Unable to apply 4.13.46: wait has exceeded 40 minutes for these operators: etcd, kube-apiserver
- Run oc get mcp. The status of the MachineConfigPools should show that the pools are updating but not progressing past the first node.
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-96b90a0c2d5dea265139c1ce8bd7106e False True False 3 0 0 0 4h33m
worker rendered-worker-349d41ff99e322a88d498b0ccc628eff False True False
- Run oc get nodes. The output should show one master node and one worker node in the NotReady state.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
aro-8zkjg-master-0 NotReady,SchedulingDisabled control-plane,master 4h41m v1.25.11+1485cc9
aro-8zkjg-master-1 Ready control-plane,master 4h41m v1.25.11+1485cc9
aro-8zkjg-master-2 Ready control-plane,master 4h42m v1.25.11+1485cc9
aro-8zkjg-worker-eastus1-44flv NotReady,SchedulingDisabled worker 4h26m v1.25.11+1485cc9
aro-8zkjg-worker-eastus2-kcklc Ready worker 4h26m v1.25.11+1485cc9
aro-8zkjg-worker-eastus3-7kwlk Ready worker 4h26m v1.25.11+1485cc9
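The MachineConfigPool and node checks above can be approximated by filtering the command output. This is a minimal sketch: the function names are hypothetical, and it assumes the default column layout shown in this article (STATUS as the second column of oc get nodes; UPDATED and UPDATING as the third and fourth columns of oc get mcp).

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `oc get ...` table output on stdin.
# Assumption: default column order, as in the sample output in this article.

# Print the names of nodes whose STATUS column contains NotReady.
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /NotReady/ { print $1 }'
}

# Print the names of MachineConfigPools that are still updating
# (UPDATED is False and UPDATING is True).
updating_pools() {
  awk 'NR > 1 && $3 == "False" && $4 == "True" { print $1 }'
}

# Possible usage:
#   oc get nodes | not_ready_nodes
#   oc get mcp   | updating_pools
```

For the sample output in this article, the first filter would list aro-8zkjg-master-0 and aro-8zkjg-worker-eastus1-44flv, which matches the expected pattern of one master and one worker stuck in NotReady.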