OpenShift 4 cluster upgrade pre-checks requirements
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Service on AWS (ROSA) 4
- Red Hat OpenShift Dedicated (OSD) 4
- Azure Red Hat OpenShift (ARO) 4
Issue
- What are the initial requirements before upgrading an OpenShift Cluster?
- How to check the health of the cluster objects?
- How to check the resource allocation on the cluster?
- How to check the status and running condition of the pods?
- Other Pre-checks.
Resolution
Before upgrading the cluster, consider the checks mentioned below to ensure that the cluster is healthy and safe to upgrade.
Creating a proactive case
Standard guidelines for upgrade proactive cases
- Date/Time (including timezone) for the Scheduled Maintenance Window.
- Full version number for the versions being upgraded from and to, in 4.y.z format.
- Proper contact information.
- Standard must-gather.
- Special operators:
- If Red Hat OpenShift Data Foundation is installed on the cluster, refer also to implications to consider when upgrading OpenShift Data Foundation and open a separate support case for OpenShift Data Foundation.
- If Red Hat OpenShift Virtualization or MTV is installed on the cluster, refer also to how to open a Proactive case for OpenShift Virtualization/MTV and open a separate support case for OpenShift Virtualization.
Specific data
- For self-managed OpenShift, refer also to how to open a PROACTIVE case for patching or upgrading Red Hat OpenShift Container Platform.
- For OSD/ROSA Classic and ROSA HCP, refer also to how to open a PROACTIVE case for ROSA Classic, ROSA HCP, and OSD Clusters.
- For ARO, refer also to how to open a PROACTIVE case for ARO Clusters, and provide the ARO Cluster ResourceID and Region, which can be fetched using:

Resource ID:
$ az aro show -n <cluster_name> -g <resource_group> --subscription <subscription_name> --query id

Region:
$ az aro show -n <cluster_name> -g <resource_group> --subscription <subscription_name> --query location
Cluster Pre-Checks
Checking Operators
Check that the versions of the operators running on the cluster are compatible with the desired OpenShift version. For Red Hat supported operators in OpenShift, refer to OpenShift Operator Life Cycles and the Red Hat OpenShift Container Platform Operator Update Information Checker, and search for each specific operator. This is especially important when upgrading to a new OpenShift minor version (the y in 4.y.z).
IMPORTANT NOTE: If Red Hat OpenShift Data Foundation (RHODF) is installed on the cluster, in addition to checking that the RHODF version is compatible with the desired RHOCP version, refer also to OpenShift Data Foundations (RHODF) Operator Upgrade Pre-Checks: if the ODF cluster is not healthy, it will not be possible to drain the ODF nodes, causing the OpenShift upgrade to hang. Check also implications to consider when upgrading OpenShift Data Foundation.
Checking the Cluster Update Path
It is important to check the available update path in advance, and also just before starting the upgrade (note that the update path can change if a specific release is identified as being affected by specific issues), using the Red Hat OpenShift Container Platform Update Path:
- If OCP or ARO, please follow the Standard Update Path tooling.
- If OSD or ROSA (Classic or HCP), please follow the ROSA Update Path tooling.
If a cluster upgrade is delayed or scheduled for a later date, please check the Update Paths again before running the upgrade steps.
- It is possible that the supported path has changed, or that upgrades to certain versions are blocked due to Development Engineering intervention to allow for patching of a newly discovered bug/issue.
- If there is a version block, the details for why this block is occurring will be provided in the results after running the above mentioned Update Path tooling.
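In addition to the Update Path tooling above, the recommended updates visible to the cluster itself can be cross-checked from the CLI. This is a supplementary check, not a replacement for the tooling:

$ oc adm upgrade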
Checking removed APIs
When upgrading to a new OpenShift minor version (the y in 4.y.z), check whether any API has been removed and whether any custom application is still using it. If custom applications use an API that is removed in the desired minor version, those applications will need to be updated to avoid issues after the upgrade. Refer to navigating Kubernetes API deprecations and removals for additional information.
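As an illustrative way to identify workloads that are still calling soon-to-be-removed APIs, the APIRequestCount resources can be queried. For example, the following command (described in the OpenShift documentation about reviewing removed APIs) lists the APIs flagged for removal and the release in which they are removed:

$ oc get apirequestcounts -o jsonpath='{range .items[?(@.status.removedInRelease!="")]}{.status.removedInRelease}{"\t"}{.metadata.name}{"\n"}{end}'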
If upgrading a cluster installed in vSphere to OpenShift 4.13 or 4.14
Check the known Issues with OpenShift 4.12 to 4.13 or 4.13 to 4.14 vSphere CSI Storage Migration.
If upgrading to OpenShift 4.14
Checking for duplicated headers in requests
When upgrading to OpenShift 4.14, HAProxy is upgraded from version 2.2 in previous releases to version 2.6 in 4.14. This upgrade introduces new HAProxy behavior when duplicated headers are found, as explained in Pods returns a 502 or 400 error when accessing via the application route after upgrading the RHOCP cluster to version 4.14, which includes information to identify the usage of duplicated headers before upgrading.
If upgrading to OpenShift 4.15
Checking the usage of ServiceAccount token secrets
When upgrading to OpenShift 4.15, the ServiceAccount token secrets automatically created in previous releases are removed if the Internal Image Registry is configured as Removed. Refer to ServiceAccount token secrets missing after upgrading to OpenShift 4.15 for additional information before upgrading.
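As a quick related check (an illustrative command, not part of the referenced article), the management state of the Internal Image Registry can be queried to confirm whether it is configured as Removed:

$ oc get configs.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.managementState}{"\n"}'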
Checking if IPsec is configured in the cluster
There is a known bug when upgrading clusters with IPsec enabled to OpenShift 4.15. Refer to how to upgrade to or between 4.15 releases and above when IPsec is enabled for additional information about the issue, and check whether IPsec is enabled in the network.operator cluster resource:
$ oc get network.operator cluster -o yaml
[...]
    ipsecConfig:
      mode: Full
[...]
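As an optional narrower query (assuming OVN-Kubernetes is the default network plugin), the IPsec configuration can also be extracted directly:

$ oc get network.operator cluster -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig}{"\n"}'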
If upgrading to OpenShift 4.17
Do not perform the network plugin migration at the same time that an upgrade
OpenShift SDN is no longer supported in OpenShift 4.17, and a migration to OVN-Kubernetes is required. The network plugin migration should never be done at the same time as the upgrade, as explained in is it supported to upgrade the cluster to 4.17 at the same time than performing the network plugin migration? (note that the same applies to any version previous to 4.17).
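As a quick supplementary check before planning the migration, the network plugin currently in use can be confirmed with:

$ oc get network.config cluster -o jsonpath='{.status.networkType}{"\n"}'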
When using LDAP, ensure it supports TLS 1.3 or ECDHE ciphers
Due to the underlying Go version used, cipher suites without ECDHE support are no longer offered by either clients or servers during pre-TLS 1.3 handshakes. Refer to LDAP authentication fails with TLS handshake failure in OpenShift 4.17 or newer for additional information.
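One possible way to verify the LDAP server capabilities from a host with network access to it is an openssl check (an illustrative example; <ldap_host> is a placeholder for the LDAP server hostname):

$ openssl s_client -connect <ldap_host>:636 -tls1_3 < /dev/null
# If the TLS 1.3 handshake fails, verify that the server offers ECDHE cipher suites for TLS 1.2.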
If upgrading to OpenShift 4.19
Ensure control plane nodes have the label node-role.kubernetes.io/control-plane
The label node-role.kubernetes.io/control-plane could be missing in clusters installed with older versions that did not include it at installation time, which can cause issues during upgrades such as machine-config-nodes-crd-cleanup pod in Pending state during upgrade from OpenShift 4.18 to 4.19. Refer to inconsistency of node-role between newly created vs. long running OpenShift 4 clusters for additional information.
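One way to verify that the label is present is to compare the nodes returned by both label selectors; the two lists below should contain the same nodes (a supplementary check, not from the referenced article):

$ oc get nodes -l node-role.kubernetes.io/master -o name
$ oc get nodes -l node-role.kubernetes.io/control-plane -o name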
Ensure only cgroup v2 is used in the cluster
As cgroup v1 has been removed in OpenShift 4.19 (it was deprecated in OpenShift 4.16), ensure the cluster is using cgroup v2 before upgrading.
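The cgroup mode can be checked, for example, in the nodes.config cluster resource (an empty value means the release default is in use), or directly on a node, where cgroup2fs indicates cgroup v2 (<node_name> is a placeholder):

$ oc get nodes.config cluster -o jsonpath='{.spec.cgroupMode}{"\n"}'
$ oc debug node/<node_name> -- chroot /host stat -fc %T /sys/fs/cgroup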
If upgrading to OpenShift 4.20
Red Hat Marketplace is deprecated
In OpenShift 4.20, the Red Hat Marketplace is deprecated and it will be removed in an upcoming release. Refer to Red Hat Marketplace is deprecated for additional information.
Checking the Cluster Objects
- Check the status of the nodes to ensure that none of the nodes are in a NotReady or SchedulingDisabled state:

$ oc get nodes
NAME                       STATUS   ROLES    AGE     VERSION
master-0.lab.example.com   Ready    master   3d18h   v1.23.12+8a6bfe4
master-1.lab.example.com   Ready    master   3d18h   v1.23.12+8a6bfe4
master-2.lab.example.com   Ready    master   3d18h   v1.23.12+8a6bfe4
worker-0.lab.example.com   Ready    worker   3d17h   v1.23.12+8a6bfe4
worker-1.lab.example.com   Ready    worker   3d17h   v1.23.12+8a6bfe4
worker-2.lab.example.com   Ready    worker   3d17h   v1.23.12+8a6bfe4
- Check the status of the cluster operators to ensure that all the cluster operators are Available and are not in a Degraded state:

$ oc get co
NAME                            VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
[...]
etcd                            4.10.54   True        False         False      3d18h
image-registry                  4.10.54   True        False         False      3d9h
ingress                         4.10.54   True        False         False      3d17h
insights                        4.10.54   True        False         False      3d18h
kube-apiserver                  4.10.54   True        False         False      3d18h
kube-controller-manager         4.10.54   True        False         False      3d18h
kube-scheduler                  4.10.54   True        False         False      3d18h
kube-storage-version-migrator   4.10.54   True        False         False      2d3h
machine-api                     4.10.54   True        False         False      3d18h
machine-approver                4.10.54   True        False         False      3d18h
machine-config                  4.10.54   True        False         False      2d2h
[...]
- Check the health of the PVs and PVCs to ensure that:
  - All the PVs and PVCs are mounted.
  - None of the PVs and PVCs are unmounted.
  - None of the PVs and PVCs are stuck in the Terminating state.
  - No abnormal configurations exist.

$ oc get pv,pvc -A
NAMESPACE              NAME                                          STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
[...]
openshift-compliance   persistentvolumeclaim/ocp4-cis                Active            6d10h
openshift-compliance   persistentvolumeclaim/ocp4-cis-node-master    Active            6d10h
openshift-compliance   persistentvolumeclaim/ocp4-cis-node-worker    Active            6d10h
[...]
- Checks regarding machineConfigPools:
  - Check that every node on the cluster is associated with at least one machineConfigPool: every node must have a label that matches the nodeSelector of one of the existing machineConfigPools. After starting a cluster upgrade, new rendered configs will be created, and the machineConfigPools will apply the new rendered configs to the nodes. If a node is not associated with a machineConfigPool, the MachineConfigController will not update this node. Once the node is associated with a machineConfigPool, it will synchronize its configuration with the corresponding rendered config.
  - Check that all machineConfigPools have the paused: false parameter. If a machineConfigPool is in a paused state, the nodes associated with this machineConfigPool will not be updated. More information in the Red Hat Solution MachineConfigPools are paused, preventing the Machine Config Operator to push out updates in OpenShift 4.
  - Check the health of the machineConfigPools and make sure that MACHINECOUNT is equal to READYMACHINECOUNT and that no machine is stuck in UPDATEDMACHINECOUNT or DEGRADEDMACHINECOUNT:

$ oc get mcp
NAME     CONFIG                                            UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX   True      False      False      3              3                   3                     0                      4d7h
worker   rendered-worker-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX   True      False      False      3              3                   3                     0                      4d7h
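An optional supplementary command to review the paused state of all machineConfigPools at once:

$ oc get mcp -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused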
Checking the Cluster Node Allocation
Resource allocation can be checked in either of the following two ways:
Using $ oc describe
$ oc describe node worker-0.lab.example.com
[...]
Conditions: <====
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 14 Jun 2023 15:24:09 -0400 Tue, 13 Jun 2023 02:59:26 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 14 Jun 2023 15:24:09 -0400 Tue, 13 Jun 2023 02:59:26 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 14 Jun 2023 15:24:09 -0400 Tue, 13 Jun 2023 02:59:26 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 14 Jun 2023 15:24:09 -0400 Tue, 13 Jun 2023 02:59:36 -0400 KubeletReady kubelet is posting ready status
Capacity: <====
cpu: 4
ephemeral-storage: 41407468Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8146240Ki
pods: 250
Allocatable: <====
cpu: 3500m
ephemeral-storage: 37087380622
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 6995264Ki
pods: 250
System Info: <====
Machine ID: bc21e1755a9142238b04129b97e118c0
System UUID: bc21e175-5a91-4223-8b04-129b97e118c0
Boot ID: b0520f6a-09e7-4bf8-8e0c-9aa749fd14bc
Kernel Version: 4.18.0-372.58.1.el8_6.x86_64
OS Image: Red Hat Enterprise Linux CoreOS 412.86.202305230130-0 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.25.3-4.rhaos4.12.git76ceef4.el8
Kubelet Version: v1.25.8+37a9a08
Kube-Proxy Version: v1.25.8+37a9a08
Non-terminated Pods: (33 in total) <====
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
new-test httpd-675fd5bfdd-9s4pr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34h
openshift-cluster-node-tuning-operator tuned-wfrxw 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 37h
openshift-cnv cdi-operator-6ffbc46886-rfb97 10m (0%) 0 (0%) 150Mi (2%) 0 (0%) 27h
openshift-cnv cluster-network-addons-operator-675b769f6f-954wl 60m (1%) 0 (0%) 50Mi (0%) 0 (0%) 27h
openshift-cnv hco-operator-7f8c48598d-c6vh5 10m (0%) 0 (0%) 96Mi (1%) 0 (0%) 27h
openshift-cnv hco-webhook-fc6b4c4b5-7zrdb 5m (0%) 0 (0%) 48Mi (0%) 0 (0%) 27h
openshift-cnv hostpath-provisioner-operator-6b6bc8bf8-6mxkp 10m (0%) 0 (0%) 150Mi (2%) 0 (0%) 27h
openshift-cnv hyperconverged-cluster-cli-download-7f5844cb77-ftjbz 10m (0%) 0 (0%) 96Mi (1%) 0 (0%) 27h
Allocated resources: <====
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 959m (27%) 1700m (48%)
memory 2968Mi (43%) 1800Mi (26%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none> <====
[...]
The description of all the nodes at once can also be obtained using:
$ oc describe nodes > nodes_description.yaml
Using YAML Output
$ oc get node worker.lab.example.com -oyaml
To get the resource allocation of all the nodes at once, the following command can be used:
$ for i in $(oc get nodes | awk '{print $1}'); do echo "==== $i ====";oc describe node $i 2> /dev/null | grep -A10 Allocated; echo; done
[...]
==== master-0.lab.example.com ====
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1970m (56%) 400m (11%)
memory 8022Mi (53%) 900Mi (6%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
==== master-1.lab.example.com ====
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1935m (55%) 0 (0%)
memory 8357Mi (56%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
==== master-2.lab.example.com ====
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1579m (45%) 0 (0%)
memory 6282Mi (42%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
[...]
Refer to the documentation below for more details regarding Requests/Limits and Node Overcommitment.
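In addition to the allocated requests and limits shown above, the current utilization of the nodes can be reviewed as an optional supplementary check (this requires the cluster metrics to be available):

$ oc adm top nodes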
Checking the Pods' Health and Status
- Check for pods whose status is not Running, Completed, or Succeeded:

$ oc get pods --all-namespaces | egrep -v 'Running|Completed|Succeeded'
- Check the status of all pods within each namespace:

$ for i in `oc adm top pods -A | awk '{print $1}' | uniq`; do echo $i; oc get pods -owide -n $i; done

# Using grep against the node name will limit the search and give more accurate results:
$ for i in `oc adm top pods -A | awk '{print $1}' | uniq`; do echo $i; oc get pods -owide -n $i | grep <node_name>; echo '---------------------'; done
- Check pod logs using:

$ oc logs pod/<pod_name> -n <namespace_name>

- Use the -c parameter to fetch the logs from a particular container:

$ oc logs pod/<pod_name> -c <container_name> -n <namespace_name>
Other Pre-checks:
- Check the health of the etcd cluster (see the example commands after this list).
- Check the network health using Network Observability.
- Check for pending certificate signing requests:

$ oc get csr
- For Pod Disruption Budgets, check the output of the command below to see whether there are pods that may block the node draining process during upgrades. Usually, the allowed disruptions are set to 1 to allow nodes to be drained properly. The must-gather does not capture this properly, so this output needs to be checked manually:

$ oc get pdb -A
- Check the firing alerts in Alertmanager via Web Console -> Observe -> Alerting and make sure there is no Warning or Critical alert firing, and that you are aware of the existing Info ones.
- Look for Warning events in all namespaces and check if there is anything that might be concerning:

$ oc get events -A --field-selector type=Warning --sort-by=".lastTimestamp"
- Ensure that any third-party software running on the cluster is compatible with the target OpenShift version prior to upgrading. Please note that Red Hat does not verify third-party compatibility with any version of OCP; this is the sole responsibility of the vendor. Please check our Third-Party Support documentation for further information.
  - Third-party applications/operators of note are the following:
    - TwistLock - Compatibility Matrix [External Link]
    - Dynatrace - Compatibility Matrix [External Link]
- Review the release notes for the target OpenShift version to identify any platform changes that could impact your applications. This allows you to assess potential risks and implement necessary adjustments before proceeding with the upgrade.
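As referenced in the etcd item above, the etcd health can be checked, for example, by running etcdctl from one of the etcd pods, following the standard etcd troubleshooting steps (<etcd_pod_name> is a placeholder):

$ oc get pods -n openshift-etcd -l app=etcd
$ oc rsh -n openshift-etcd <etcd_pod_name>
sh-4.4# etcdctl endpoint health --cluster
sh-4.4# etcdctl endpoint status --cluster -w table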
Root Cause
Red Hat OpenShift Container Platform 4 upgrades imply the upgrade of several different components, so it is required to check the overall status of the cluster and the compatibility of any additional operators and third-party components before starting an upgrade.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.