Error when upgrading the cluster: "ImagePullBackOff: error creating read-write layer with ID * no such file or directory"

Solution Verified - Updated 2025-01-03T05:39:49+00:00

Environment

Red Hat OpenShift Container Platform 4.12 and earlier versions.

Issue

While upgrading a cluster, a node fails to pull images. Its Machine Config Daemon pod shows an error like the following:

error creating read-write layer with ID "4372a0c382584d7752da058c5267d1d652d727585457a71c5ef3d4d17a951719": Stat /var/lib/containers/storage/overlay/27abd31f77c1e21b8897140edb61d1e52d48a3cd287c03796dbecb54684871d3: no such file or directory

As a result, the upgrade is blocked.

The problem can affect several nodes one after another.

Resolution

The issue is caused by a problem in how CRI-O handles image layers, documented in bug OCPBUGS-16874.
To avoid this issue, upgrade to OCP 4.13, or to OCP 4.12.45 or later. Note that if a node has already experienced this issue, its CRI-O storage must be wiped at least once using the steps below to clear the condition, regardless of whether the node was successfully upgraded.
If a cluster is affected, the following workaround can be applied to solve the problem:

Drain the node:

$ oc adm drain --ignore-daemonsets --delete-emptydir-data ${NODE}

On the affected node, run the following commands as root:

$ systemctl disable kubelet
$ systemctl disable crio
$ reboot

After the reboot, run the following commands, also as root:

$ rm -rf /var/lib/containers/*
$ crio wipe -f
$ systemctl enable --now crio
$ systemctl enable --now kubelet

Uncordon the node:
```
$ oc adm uncordon $NODE
```
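
The Resolution names OCP 4.13, or OCP 4.12.45 or later, as the releases that avoid this issue. As a minimal sketch of that version check, the function below compares a version string against those thresholds. The name `is_fixed_version` is hypothetical, and the function assumes a plain `4.<minor>.<patch>` version string. A node that has already hit the error still needs its storage wiped once, whatever this check reports.

```shell
# Hypothetical helper: succeeds (exit 0) if the given OCP 4.x version is
# 4.13 or later, or 4.12.45 or later; fails (exit 1) otherwise.
# Returns 2 if the input is not of the form 4.<minor>.<patch>.
# Note: sets the shell variables rest, minor and patch as a side effect.
is_fixed_version() {
  case $1 in
    4.*.*) ;;
    *) return 2 ;;
  esac
  rest=${1#*.}               # e.g. "12.45" from "4.12.45"
  minor=${rest%%.*}          # e.g. "12"
  patch=${rest#*.}           # e.g. "45"
  patch=${patch%%[!0-9]*}    # drop any non-numeric suffix
  if [ "$minor" -gt 12 ]; then
    return 0
  fi
  [ "$minor" -eq 12 ] && [ "${patch:-0}" -ge 45 ]
}

# Example usage against a live cluster:
#   is_fixed_version "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" \
#     && echo "release includes the fix" || echo "release may be affected"
```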

Root Cause

The problem lies in how the shared containers/storage package, which CRI-O uses, handles layers. The fixes were made in containers/storage and imported into CRI-O as of OCP 4.12.45:
- https://github.com/containers/storage/pull/1138
- https://github.com/containers/storage/pull/1407
Prior to these versions, layers could be mishandled, causing errors when reading or accessing container image layer directories.

Diagnostic Steps

If a cluster upgrade becomes blocked, check whether any pod whose name starts with machine-config-daemon is still in an unhealthy status such as ContainerCreating around 5 minutes after it was created. To do so, run the following command:

$ oc get pod -n openshift-machine-config-operator
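
On a cluster with many nodes the listing can be long. As a minimal sketch, the filter below prints only the machine-config-daemon pods whose STATUS column is not Running. The function name `unhealthy_mcd_pods` is hypothetical, and the filter assumes the default column layout of `oc get pod` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not part of oc): reads `oc get pod` output on stdin and
# prints the names of machine-config-daemon pods whose STATUS is not "Running".
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_mcd_pods() {
  awk '$1 ~ /^machine-config-daemon/ && $3 != "Running" { print $1 }'
}

# Example usage against a live cluster:
#   oc get pod -n openshift-machine-config-operator | unhealthy_mcd_pods
```

A pod that was created only moments ago may legitimately still be in ContainerCreating, so check the AGE column before drawing conclusions.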

If a pod is in an unhealthy status, run the following command:

$ oc logs <pod_name> -n openshift-machine-config-operator

The cluster is likely affected by this bug if the output contains the error shown in the "Issue" section, repeated below:

error creating read-write layer with ID "4372a0c382584d7752da058c5267d1d652d727585457a71c5ef3d4d17a951719": Stat /var/lib/containers/storage/overlay/27abd31f77c1e21b8897140edb61d1e52d48a3cd287c03796dbecb54684871d3: no such file or directory
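
Because the layer IDs differ on every node, it can help to match on the fixed parts of the message instead of comparing it by eye. The sketch below does that with a regular expression. The function name `has_layer_error` is hypothetical, and the pattern assumes the ID is a hexadecimal string, as in the example above.

```shell
# Hypothetical helper: reads log text on stdin and succeeds (exit 0) if any
# line contains the "error creating read-write layer ... no such file or
# directory" message. Assumes the layer ID is a hexadecimal string.
has_layer_error() {
  grep -Eq 'error creating read-write layer with ID "[0-9a-f]+": .*: no such file or directory'
}

# Example usage against a live cluster (replace <pod_name> as above):
#   oc logs <pod_name> -n openshift-machine-config-operator | has_layer_error \
#     && echo "layer error found"
```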
