Cluster upgrade fails because one of the nodes failed to update its OS in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- The cluster upgrade fails because one of the nodes fails to update its operating system (OS).
Resolution
This issue has been reported to Red Hat Engineering and is being tracked in Bug. For more information, open a new support case with Red Hat Support.
Workaround
- To work around this issue, log in to the affected node (ocp-lab-example-infra-node in this example) over SSH, become root, and pull the target OS image manually:
  sh-5.1# podman pull --tls-verify=false --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e
- If the pull succeeds, restart the machine-config-daemon pods:
  $ oc delete pod -n openshift-machine-config-operator -l k8s-app=machine-config-daemon
- If the pull fails, run the following instead, as root over SSH on the affected node. This procedure requires two reboots. Run the rebase and reboot the node; after the node comes back up, log in again and create the force file (files under /run do not persist across reboots):
  sh-5.1# rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e
  sh-5.1# systemctl reboot
  sh-5.1# touch /run/machine-config-daemon-force
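The workaround steps can be wrapped in a few shell functions, sketched below under some assumptions: bash is available, the function names (valid_pullspec, pull_os_image, restart_mcd, booted_on_digest) are hypothetical, and booted_on_digest assumes that the output of rpm-ostree status --booted contains the image digest after the rebase. The podman and oc delete flags are the ones used in this article; oc rollout status is added to wait for the daemonset. This is an illustration, not a supported Red Hat tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the workaround steps; a sketch, not a Red Hat tool.

# Succeed if $1 looks like "<repository>@sha256:<64 lowercase hex chars>"
# with no whitespace anywhere in it.
valid_pullspec() {
  [[ "$1" =~ ^[^[:space:]@]+@sha256:[0-9a-f]{64}$ ]]
}

# Workaround step 1, run as root on the affected node: pull the OS image
# with the kubelet pull secret, using the flags from this article.
pull_os_image() {
  local spec="$1"
  if ! valid_pullspec "$spec"; then
    echo "malformed pullspec: $spec" >&2
    return 2
  fi
  podman pull --tls-verify=false --authfile /var/lib/kubelet/config.json "$spec"
}

# Workaround step 2, run with cluster-admin access: delete the MCD pods and
# wait until the machine-config-daemon daemonset has rolled them out again.
restart_mcd() {
  local ns=openshift-machine-config-operator
  oc delete pod -n "$ns" -l k8s-app=machine-config-daemon &&
    oc rollout status daemonset/machine-config-daemon -n "$ns" --timeout=10m
}

# After the rpm-ostree rebase and reboot: read rpm-ostree status output on
# stdin and succeed if it contains the expected digest. Accepts either a full
# pullspec or a bare "sha256:..." digest.
# ASSUMPTION: the status output shows the digest of the booted image.
booted_on_digest() {
  local digest="${1#*@}"
  grep -qF -- "$digest"
}
```

For example: pull_os_image <pullspec> on the node, restart_mcd from a workstation, and rpm-ostree status --booted | booted_on_digest <pullspec> after the rebase and reboot. Validating the pullspec first is a cheap guard, because a digest copied from a log or a web page can pick up stray whitespace.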
Diagnostic Steps
- Check the machine-config cluster operator for similar error messages:
  $ oc get co machine-config -o yaml
  - lastTransitionTime: "2025-05-14T14:07:36Z"
    message: One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
    reason: DegradedPool
    status: "False"
    type: Upgradeable
- Review the status of the MachineConfigPools (MCPs) and confirm their current state:
  $ oc get mcp
  NAME     CONFIG                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  infra    rendered-infra-xxx    False     True       True       6              4                   4                     1                      3y
  master   rendered-master-xxx   True      False      False      3              3                   3                     0                      4y
  worker   rendered-worker-xxx   False     False      False      2              0                   0                     0                      4y
- Inspect the node status to see whether any node is marked SchedulingDisabled:
  $ oc get nodes
  NAME                         STATUS                     ROLES          AGE   VERSION
  ocp-lab-example-infra-node   Ready,SchedulingDisabled   infra,worker   2y    v1.27.16+03a907c
- Review the node YAML for similar error messages:
  $ oc get node ocp-lab-example-infra-node -oyaml
    machineconfiguration.openshift.io/reason: |-
      failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on [::1]:53: read udp [::1]:43165->[::1]:53: read: connection refused : exit status 1
    machineconfiguration.openshift.io/state: Degraded
- Verify whether the machine-config-daemon (MCD) pod running on the affected node reports a similar error:
  $ oc logs -n openshift-machine-config-operator machine-config-daemon-xxx -c machine-config-daemon
  2025-05-15T19:49:07.812893595Z E0515 19:49:07.812878 3429607 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on [::1]:53: read udp [::1]:54775->[::1]:53: read: connection refused
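The diagnostic checks can also be scripted. The bash sketch below assumes the default column layout of oc get mcp and oc get nodes shown in this article (the parsers skip the header line and rely on field positions) and access to the openshift-machine-config-operator namespace. All function names are hypothetical. The is_dns_refused check encodes the pattern seen in the errors above, where the lookup of quay.io on [::1]:53 is refused, which points at name resolution failing on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the diagnostic steps; a sketch, not a Red Hat tool.
# The parsers read saved or piped command output on stdin.

MCO_NS=openshift-machine-config-operator

# Print the message of the Upgradeable condition of a ClusterOperator
# (default: machine-config).
co_upgradeable_message() {
  oc get clusteroperator "${1:-machine-config}" \
    -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].message}'
}

# Read `oc get mcp` output (including its header line) and print the pools
# whose DEGRADED column, the 5th field in the default layout, is True.
degraded_pools() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Read `oc get nodes` output (including its header line) and print the nodes
# whose STATUS column contains SchedulingDisabled.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Print the machineconfiguration.openshift.io/reason annotation of node $1.
# Dots inside the annotation key must be escaped in JSONPath.
node_mcd_reason() {
  oc get node "$1" \
    -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/reason}'
}

# Succeed if the text on stdin contains a DNS lookup refused by the resolver,
# e.g. "lookup quay.io on [::1]:53: read udp ...: connection refused".
is_dns_refused() {
  grep -Eq 'lookup [^ ]+ on [^ ]+:53: .*connection refused'
}

# Find the MCD pod scheduled on node $1 and print its "Marking Degraded" lines.
mcd_degraded_logs() {
  local pod
  pod=$(oc get pod -n "$MCO_NS" -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" -o name) || return
  oc logs -n "$MCO_NS" "$pod" -c machine-config-daemon | grep -F 'Marking Degraded'
}
```

For example: oc get mcp | degraded_pools, oc get nodes | cordoned_nodes, and node_mcd_reason ocp-lab-example-infra-node | is_dns_refused && echo "DNS lookup is failing on the node". Reading output on stdin keeps the parsers usable on output saved to a file.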
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.