Why are pods mounting volumes with a huge number of files failing to start after upgrading to OpenShift Data Foundation 4.12?
Issue
- After upgrading from ODF 4.11 to 4.12, pods attaching volumes that contain a huge number of files (on the order of millions) fail to start with a timeout. Below are some sample events. In this example, the affected pod is named simple-app-67dfcff4c8-v7gxv; note the timed out waiting for the condition events:
$ oc describe pod simple-app-67dfcff4c8-v7gxv
Type     Reason                  Age                    From                           Message
----     ------                  ----                   ----                           -------
Normal   Scheduled               <unknown>                                             Successfully assigned test/simple-app-67dfcff4c8-v7gxv to worker-2.example.com
Normal   SuccessfulAttachVolume  30m                    attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-0ea2d69a-8e9d-41b2-bfa5-85d0ede6211b"
Warning  FailedMount             23m (x2 over 25m)      kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[volume-wz7wf kube-api-access-hmlzb]: timed out waiting for the condition
Warning  FailedMount             19m (x3 over 28m)      kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[kube-api-access-hmlzb volume-wz7wf]: timed out waiting for the condition
Warning  FailedMount             4m32s (x5 over 15m)    kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[volume-wz7wf kube-api-access-hmlzb]: timed out waiting for the condition
Warning  FailedMount             2m15s (x2 over 6m46s)  kubelet, worker-2.example.com  Unable to attach or mount volumes: unmounted volumes=[volume-wz7wf], unattached volumes=[kube-api-access-hmlzb volume-wz7wf]: timed out waiting for the condition
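To check whether other pods on the cluster are hitting the same symptom, the events can be filtered by reason. This is a generic oc invocation using standard Kubernetes field selectors, not specific to ODF; the test namespace is the one from the example above:
$ oc get events -n test --field-selector type=Warning,reason=FailedMount
# Or across all namespaces:
$ oc get events -A --field-selector type=Warning,reason=FailedMount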
- The volume is correctly mounted on the node hosting the pod. In this example, it is a CephFS volume:
$ oc debug node/worker-2.example.com
# chroot /host
sh-4.4# mount -l | grep cephfs
<mon-ip-1>:6789,<mon-ip-2>:6789,<mon-ip-3>:6789,<mon-ip-4>:6789,<mon-ip-5>:6789:/volumes/csi/csi-vol-5c8b10be-6760-11ee-ada8-0a580a800215/c7feed46-8956-45c2-b71f-906ae4ad4718 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/c7dcb34afe060d6cd58e994fc5c10868624970393d6415e2c085b8c6630532b0/globalmount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=my-filesystem)
<mon-ip-1>:6789,<mon-ip-2>:6789,<mon-ip-3>:6789,<mon-ip-4>:6789,<mon-ip-5>:6789:/volumes/csi/csi-vol-5c8b10be-6760-11ee-ada8-0a580a800215/c7feed46-8956-45c2-b71f-906ae4ad4718 on /host/var/lib/kubelet/pods/029e640c-2db9-49dd-ae9e-5215de6b11f7/volumes/kubernetes.io~csi/pvc-0ea2d69a-8e9d-41b2-bfa5-85d0ede6211b/mount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=my-filesystem)
This mount point is also writable.
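As a sketch of how both observations can be verified: the directory under /var/lib/kubelet/pods/ is named after the pod's UID, and writability can be confirmed from the node debug shell with a simple write test. The paths and pod UID below come from the outputs above; the temporary file name .rhkb-write-test is an arbitrary choice for illustration:
# Correlate the pod-scoped mount path with the pod via its UID.
$ oc get pod simple-app-67dfcff4c8-v7gxv -n test -o jsonpath='{.metadata.uid}{"\n"}'
029e640c-2db9-49dd-ae9e-5215de6b11f7
# From the node debug shell, verify that the global mount accepts writes.
sh-4.4# GM=/host/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/c7dcb34afe060d6cd58e994fc5c10868624970393d6415e2c085b8c6630532b0/globalmount
sh-4.4# touch "$GM/.rhkb-write-test" && echo "mount is writable" && rm -f "$GM/.rhkb-write-test"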
- Why is this issue occurring, and how can it be prevented?
Environment
- Red Hat OpenShift Data Foundation, versions:
- v4.12
- v4.13