Why are pods mounting volumes with a huge number of files failing to start after upgrading to OpenShift Data Foundation 4.12?

Solution In Progress

Issue

  • After upgrading from ODF 4.11 to 4.12, pods attaching volumes that contain a very large number of files (on the order of millions) fail to start with a timeout. Some sample events are shown below. In this example, the affected pod is named simple-app-67dfcff4c8-v7gxv; note the "timed out waiting for the condition" messages:

    $ oc describe pod simple-app-67dfcff4c8-v7gxv
    
        Type     Reason                  Age                    From                                                   Message
        ----     ------                  ----                   ----                                                   -------
        Normal   Scheduled               <unknown>                                                                     Successfully assigned test/simple-app-67dfcff4c8-v7gxv to worker-2.example.com
        Normal   SuccessfulAttachVolume  30m                    attachdetach-controller                                AttachVolume.Attach succeeded for volume "pvc-0ea2d69a-8e9d-41b2-bfa5-85d0ede6211b"
        Warning  FailedMount             23m (x2 over 25m)      kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[volume-wz7wf kube-api-access-hmlzb]: timed out waiting for the condition
        Warning  FailedMount             19m (x3 over 28m)      kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[kube-api-access-hmlzb volume-wz7wf]: timed out waiting for the condition
        Warning  FailedMount             4m32s (x5 over 15m)    kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[volume-wz7wf kube-api-access-hmlzb]: timed out waiting for the condition
        Warning  FailedMount             2m15s (x2 over 6m46s)  kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[kube-api-access-hmlzb volume-wz7wf]: timed out waiting for the condition
    
  • The volume is correctly mounted on the node hosting the pod. In this example, it is a CephFS volume:

    $ oc debug node/worker-2.example.com
    # chroot /host
    sh-4.4# mount -l | grep cephfs
      <mon-ip-1>:6789,<mon-ip-2>:6789,<mon-ip-3>:6789,<mon-ip-4>:6789,<mon-ip-5>:6789:/volumes/csi/csi-vol-5c8b10be-6760-11ee-ada8-0a580a800215/c7feed46-8956-45c2-b71f-906ae4ad4718 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/c7dcb34afe060d6cd58e994fc5c10868624970393d6415e2c085b8c6630532b0/globalmount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=my-filesystem)
      <mon-ip-1>:6789,<mon-ip-2>:6789,<mon-ip-3>:6789,<mon-ip-4>:6789,<mon-ip-5>:6789:/volumes/csi/csi-vol-5c8b10be-6760-11ee-ada8-0a580a800215/c7feed46-8956-45c2-b71f-906ae4ad4718 on /host/var/lib/kubelet/pods/029e640c-2db9-49dd-ae9e-5215de6b11f7/volumes/kubernetes.io~csi/pvc-0ea2d69a-8e9d-41b2-bfa5-85d0ede6211b/mount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=my-filesystem)
    

    This mount point is also writable (a verification sketch is included below).

  • Why is this issue occurring? How can this problem be prevented?
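
  • To confirm from the node itself that the volume really contains a very large number of files and that the mount is writable, a debug shell on the node can be used. The sketch below reuses the node name, pod UID, and PVC name from the example output above; adjust them to the affected environment before running it:

    $ oc debug node/worker-2.example.com
    # chroot /host
    sh-4.4# VOL=/var/lib/kubelet/pods/029e640c-2db9-49dd-ae9e-5215de6b11f7/volumes/kubernetes.io~csi/pvc-0ea2d69a-8e9d-41b2-bfa5-85d0ede6211b/mount
    sh-4.4# # Counting the entries can itself take a long time on a volume with millions of files
    sh-4.4# find "$VOL" | wc -l
    sh-4.4# # Verify the mount is writable by creating and then removing a test file
    sh-4.4# touch "$VOL/.write-test" && rm "$VOL/.write-test"

    If both checks succeed while the pod still reports "timed out waiting for the condition", the behavior matches the issue described in this article.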

Environment

  • Red Hat OpenShift Data Foundation, versions:
    • v4.12
    • v4.13
