Cluster upgrade fails because one of the nodes failed to update its OS in RHOCP 4

Environment

Red Hat OpenShift Container Platform
- 4

Issue

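A cluster upgrade fails because one of the nodes fails to update its operating system (OS). The node's machine-config-daemon reports that rpm-ostree could not pull the new OS image.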

Resolution

This issue has been reported to Red Hat engineering. It is being tracked in Bug. For more information, please open a new support case with Red Hat Support.

Workaround

To work around this issue, connect to the affected node (ocp-lab-example-infra-node in this example) via SSH, become root, and manually pull the target OS image:

podman pull --tls-verify=false --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e

If the pull succeeds, restart the machine-config-daemon pods. They are managed by a DaemonSet and are recreated automatically:

oc delete po -n openshift-machine-config-operator -l k8s-app=machine-config-daemon

If the pull does not succeed, run the following commands as root on the affected node (via SSH). This procedure requires two reboots.

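sh-5.1# rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e
sh-5.1# systemctl reboot

After the node comes back up, log in again as root and create the force file so that the machine-config-daemon re-applies the desired configuration, which triggers the second reboot. Because /run is not persistent across reboots, this file must be created after the first reboot:

sh-5.1# touch /run/machine-config-daemon-force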
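The branching in the workaround (pull first, then either restart the daemons or fall back to the rebase) can be sketched as a small shell function to run as root on the affected node. `try_os_image_pull` is a hypothetical helper, not Red Hat tooling: it runs the same `podman pull` command shown above for the image you pass in and prints which branch applies next. Pass the image reference reported in the node's `machineconfiguration.openshift.io/reason` annotation, because the digest differs between releases.

```shell
# Hypothetical helper (not Red Hat tooling). Run as root on the affected node.
# Retries the manual OS image pull and reports which workaround branch applies.
try_os_image_pull() {
  local image="$1"
  if podman pull --tls-verify=false \
      --authfile /var/lib/kubelet/config.json "$image" >/dev/null; then
    echo "pull succeeded: restart the machine-config-daemon pods"
    return 0
  fi
  echo "pull failed: use the rpm-ostree rebase procedure"
  return 1
}

# Example:
#   try_os_image_pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e
```

The function only reports the next step; restarting the pods still has to be done from a host with `oc` access.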

Diagnostic Steps

Check the machine-config cluster operator for a condition similar to the following:

$ oc get clusteroperator machine-config -o yaml

- lastTransitionTime: "2025-05-14T14:07:36Z"
  message: One or more machine config pools are degraded, please see `oc get mcp`
    for further details and resolve before upgrading
  reason: DegradedPool
  status: "False"
  type: Upgradeable
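To print only the relevant condition instead of the full operator YAML, a jsonpath query such as the following can be used. The function name `mco_upgradeable` is hypothetical, and the sketch assumes an authenticated `oc` session. For the cluster state shown above it would print `False DegradedPool`.

```shell
# Hypothetical helper: print "<status> <reason>" for the Upgradeable condition
# of the machine-config ClusterOperator. Assumes `oc` is logged in to the cluster.
mco_upgradeable() {
  oc get clusteroperator machine-config \
    -o jsonpath='{range .status.conditions[?(@.type=="Upgradeable")]}{.status}{" "}{.reason}{"\n"}{end}'
}

# Usage:
#   mco_upgradeable
```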

Review the status of the MachineConfigPools (MCPs). In this example, the infra pool is updating and degraded, with one degraded machine:

$ oc get mcp
NAME     CONFIG                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
infra    rendered-infra-xxx    False     True       True       6              4                   4                     1                      3y
master   rendered-master-xxx   True      False      False      3              3                   3                     0                      4y
worker   rendered-worker-xxx   False     False      False      2              0                   0                     0                      4y
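To list only the degraded pools, the table can be filtered on its DEGRADED column. This is a minimal sketch with a hypothetical function name; it assumes the default `oc get mcp` column order shown above, where DEGRADED is the fifth field.

```shell
# Hypothetical helper: read `oc get mcp` output on stdin and print the names of
# pools whose DEGRADED column (5th field) is True. The header row never matches.
degraded_pools() {
  awk '$5 == "True" { print $1 }'
}

# Usage (assumes an authenticated oc session):
#   oc get mcp | degraded_pools
```

For the output above this prints only `infra`.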

Check whether any node is marked as SchedulingDisabled (cordoned):

$ oc get nodes | grep SchedulingDisabled
ocp-lab-example-infra-node   Ready,SchedulingDisabled   infra,worker     2y    v1.27.16+03a907c
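The same check can be written as a filter that prints only the node names, which is convenient for feeding into the next step. A minimal sketch with a hypothetical function name, assuming STATUS is the second column as in the default `oc get nodes` output.

```shell
# Hypothetical helper: read `oc get nodes` output on stdin and print the names
# of nodes whose STATUS (2nd field) contains SchedulingDisabled.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage (assumes an authenticated oc session):
#   oc get nodes | cordoned_nodes
```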

Review the YAML of the affected node and check its machineconfiguration.openshift.io annotations for an error similar to the following:

$ oc get node ocp-lab-example-infra-node -o yaml

machineconfiguration.openshift.io/reason: |-
  failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on [::1]:53: read udp [::1]:43165->[::1]:53: read: connection refused
  : exit status 1
machineconfiguration.openshift.io/state: Degraded
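The image in this message is the one the workaround pulls, and its digest differs between releases, so it is safer to extract it from the annotation than to copy it from this article. A minimal sketch follows: `failed_os_image` is a hypothetical helper that reads the reason text on stdin and assumes the `failed to update OS to <image> : ...` wording shown above.

```shell
# Hypothetical helper: read a machine-config-daemon reason text on stdin and
# print the target OS image from "failed to update OS to <image> : ...".
failed_os_image() {
  sed -n 's/.*failed to update OS to \([^ ]*\) :.*/\1/p'
}

# Usage (assumes an authenticated oc session; dots in the annotation key are
# escaped for jsonpath):
#   oc get node ocp-lab-example-infra-node \
#     -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/reason}' \
#     | failed_os_image
```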

Check whether the machine-config-daemon pod running on the affected node logs the same error:

$ oc logs -n openshift-machine-config-operator machine-config-daemon-xxx -c machine-config-daemon
2025-05-15T19:49:07.812893595Z E0515 19:49:07.812878 3429607 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31dfb8492f2b5eefd675ee32b8e38ee4b5823a23261fdacb6ba2fd7263258b6e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on [::1]:53: read udp [::1]:54775->[::1]:53: read: connection refused
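In both messages the pull fails at name resolution: the lookup of quay.io is sent to [::1]:53 and the connection is refused, so the request never reaches the registry. The sketch below uses hypothetical function names and assumes an authenticated `oc` session. It locates the machine-config-daemon pod scheduled on a given node, so the `-xxx` suffix does not have to be looked up by hand, and checks log text for this lookup failure.

```shell
MCO_NS=openshift-machine-config-operator

# Hypothetical helper: print the machine-config-daemon pod (as pod/<name>)
# scheduled on the node given as $1.
mcd_pod_for_node() {
  oc get pods -n "$MCO_NS" -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" -o name
}

# Hypothetical helper: succeed if the log text on stdin shows the quay.io
# DNS lookup being refused, as in the message above.
is_dns_failure() {
  grep -q 'lookup quay.io on .*connection refused'
}

# Usage:
#   pod=$(mcd_pod_for_node ocp-lab-example-infra-node)
#   oc logs -n "$MCO_NS" "$pod" -c machine-config-daemon | is_dns_failure \
#     && echo "DNS lookup of quay.io refused"
```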

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
