Network disconnection causes CephFS mounts to fail with 'Permission denied' errors


Environment

  • OpenShift Container Storage (OCS) version 4.8 and below
  • OpenShift Data Foundation (ODF) version 4.9 and above

Issue

  • Pods mounting CephFS volumes get Permission denied errors
  • A network disruption is followed by a client eviction, which leaves the mount point inaccessible with 'Permission denied' errors.
  • A pod using a CephFS PV fails to start with CreateContainerError: failed to resolve symlink: lstat: permission denied
Error: failed to resolve symlink "/var/lib/kubelet/pods/44xxxxxb-8xx3-4xx3-8xx1-cxxxxxxx2/volumes/kubernetes.io~csi/pvc-axxxxb-6xx1-4xx4-bxxa-0xxxxxxxxxf/mount": lstat /var/lib/kubelet/pods/44xxxxxb-8xx3-4xx3-8xx1-cxxxxxxxx92/volumes/kubernetes.io~csi/pvc-abxxxxxb-6xx1-4xx4-bxxa-0xxxxxxf/mount: permission denied
  • This issue affects only CephFS volumes, not Ceph RBD (ceph-rbd) volumes.
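A quick way to spot affected pods across the cluster (a sketch; the grep simply matches the STATUS column):
# List pods whose containers are stuck in CreateContainerError
oc get pods -A -o wide | grep CreateContainerError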

Resolution

  • Identify the PVC from the error log: in this example pvc-axxxxxxb-6xxx1-4xx4-bxxa-01xxxxxxxf
  • Identify the node where the pod is in CreateContainerError state: oc -n app_namespace get pod -o wide
  • Identify the list of pods using the PVC on that node.
  • Scale down all the pods using the PVC. Example commands for these steps are shown below.
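A sketch with placeholder names (<pv_name>, <app_namespace>, <node_name> and the workload names are hypothetical and must be replaced with the real values):
# Find the namespace and PVC name bound to the PV taken from the error message
oc get pv <pv_name> -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\n"}'
# List the pods running on the affected node and check which of them use that PVC
oc -n <app_namespace> get pod -o wide --field-selector spec.nodeName=<node_name>
oc -n <app_namespace> describe pvc <pvc_name>    # the 'Used By:' field lists the consuming pods
# Scale down the workloads that own those pods
oc -n <app_namespace> scale deployment/<deployment_name> --replicas=0
oc -n <app_namespace> scale statefulset/<statefulset_name> --replicas=0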
  • Check that no stale mounts (bind mounts and global mounts) exist for that PVC on the node. On the OCS node run: mount | grep cephfs
Example of global mount: 
172.30.72.203:3300,172.30.25.152:3300,172.30.171.68:3300:/volumes/csi/csi-vol-84xxxxxd-1xx5-1xxe-9xx4-0a5xxxxxxb/fxxxxxx9-8xx9-4xx9-9xx5-exxxxxxx3 on /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/axxxxxxxxxxxxf51fb9ad188f0730xxxxxxxxxxxxxb/globalmount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,ms_mode=prefer-crc,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=ocs-storagecluster-cephfilesystem,_netdev)

Example of bind mount:
172.30.72.203:3300,172.30.25.152:3300,172.30.171.68:3300:/volumes/csi/csi-vol-8xxxxxd-1xx5-1xxe-9xx4-0a5xxxxxxb/fxxxx29-8xx9-4xx9-9xx5-e5xxxxxx3 on /var/lib/kubelet/pods/4xxxxb-8xx3-4xx3-8xx1-cxxxxxx2/volumes/kubernetes.io~csi/pvc-axxxxxxxb-6xx1-4xx4-bxxa-0xxxxxxf/mount type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,ms_mode=prefer-crc,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=ocs-storagecluster-cephfilesystem,_netdev)
  • If there are still CephFS mounts present for that PVC, manually unmount the CSI CephFS bind and global mounts (for that PV) on the worker node: umount /mount_path
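For example, from a debug shell on the node (the paths below are placeholders; use the exact paths reported by mount | grep cephfs):
oc debug node/<node_name>
chroot /host
# Remove the bind mount(s) for the PV first, then the global mount
umount /var/lib/kubelet/pods/<pod_uid>/volumes/kubernetes.io~csi/<pv_name>/mount
umount /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/<volume_handle_hash>/globalmount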

  • Now scale up all the pods that were identified and scaled down before: new mounts are initiated automatically when a new pod starts. For the first pod, the kubelet sends a NodeStage request to the Ceph CSI driver (to mount the CephFS PVC at the staging path), followed by a NodePublish request (to bind mount the staging path to the target path).
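For example (workload name and replica count are placeholders and should match what was scaled down earlier):
oc -n <app_namespace> scale deployment/<deployment_name> --replicas=<original_replicas>
# Once the first pod is running again, confirm the global and bind mounts were re-created on the node
oc debug node/<node_name> -- chroot /host mount | grep <pv_name>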

  • As a last resort, reboot the node on which the pod is scheduled and stuck in 'CreateContainerError' state.

  • In the example below, pod 'prometheus-k8s-1' is in 'CreateContainerError' state on node dell-r740xd-2.gsslab.pnq2.redhat.com, so the node to reboot is 'dell-r740xd-2.gsslab.pnq2.redhat.com'.
prometheus-k8s-0                                         6/6     Terminating            0               2d11h   10.131.0.153    dell-r740xd-3.gsslab.pnq2.redhat.com   <none>           <none>
prometheus-k8s-1                                         6/6     CreateContainerError   0               148d    10.129.0.58     dell-r740xd-2.gsslab.pnq2.redhat.com   <none>           <none>
prometheus-operator-admission-webhook-5d8ff8b977-2bgdw   1/1     CreateContainerError   0               2d11h   10.131.0.160    dell-r740xd-3.gsslab.pnq2.redhat.com   <none>           <none>
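A possible reboot sequence for that node, assuming it can be safely drained first:
# Cordon and drain the node, reboot it, then make it schedulable again
oc adm cordon dell-r740xd-2.gsslab.pnq2.redhat.com
oc adm drain dell-r740xd-2.gsslab.pnq2.redhat.com --ignore-daemonsets --delete-emptydir-data
oc debug node/dell-r740xd-2.gsslab.pnq2.redhat.com -- chroot /host systemctl reboot
# After the node is Ready again
oc adm uncordon dell-r740xd-2.gsslab.pnq2.redhat.com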

Root Cause

  • CephFS clients observe network disconnections and the affected client is added to the blocklist. Even after the blocklist entry times out, access to the file system is not recovered.
  • 'Permission denied' errors are expected after a CephFS client is evicted. This is by design of the kernel CephFS client (the driver that mounts a CephFS subvolume lives inside the RHEL kernel, in the ceph kernel module). Once the eviction occurs, no CephFS re-connection is possible; the client needs to unmount and mount the file system again to recover access.
  • This issue is tracked in bug BZ 2119852. A detailed RCA is in BZ 2119852, comment 38.
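The eviction can be confirmed from the Ceph side; a sketch assuming the rook-ceph-tools toolbox pod is deployed in the openshift-storage namespace:
# List currently blocklisted client addresses (on OCS 4.8 and older the command is 'ceph osd blacklist ls')
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd blocklist ls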

Diagnostic Steps

  • The pod events report the error 'failed to resolve symlink ...'
Events:
  Type     Reason          Age                   From               Message

  Normal   Scheduled       3m33s                 default-scheduler  Successfully assigned cpxxxxxx/cpxxxxxx-eventprocessor-eve-29ee-ep-taskmanager-1 to ocp-d01-wor01.example.com
  Normal   AddedInterface  3m31s                 multus             Add eth0 [10.x.x.75/23] from openshift-sdn
  Warning  Failed          74s (x12 over 3m31s)  kubelet            Error: failed to resolve symlink "/var/lib/kubelet/pods/c4xxxx7-0xx4-4xx5-8xxa-4xxxxxxxxdd/volumes/kubernetes.io~csi/pvc-6xxxxxx2-0b7a-4xx8-xxxx-xxxxxx/mount": lstat /var/lib/kubelet/pods/c4xxxxx7-0xx4-4xx5-8xxa-4xxxxdd/volumes/kubernetes.io~csi/pvc-6fbf71e2-0b7a-4fd8-94f8-e683ea90531a/mount: permission denied
  Normal   Pulled          63s (x13 over 3m31s)  kubelet            Container image "cp.icr.io/cp/iaf-flink@sha256:706xxxxxxxxxxxxxxxxx4f2a43eea9f5d0c6312782e5af15" already present on machine

  • Run ls on the problematic directory on the node where the pod was running:

$ oc debug node/<node name>
$ chroot /host
$ ls -lah /var/lib/kubelet/pods/e7xxxxx9-6xx8-4xx2-bxx2-6xxxxxxxx1d/volumes/kubernetes.io~csi/pvc-3xxxxxxe-8xx9-4xx7-8xxc-6xxxxxx173/
ls: cannot access '/var/lib/kubelet/pods/e7xxxxx9-6xx8-4xx2-bxx2-6xxxxxxxx1d/volumes/kubernetes.io~csi/pvc-3xxxxxxe-8xx9-4xx7-8xxc-6xxxxxx173/mount': Permission denied
total 4.0K
drwxr-x---.  3 root root  40 Aug 28 14:26 .
drwxr-x---.  3 root root  54 Aug 28 14:26 ..
d??????????  ? ?    ?      ?            ? mount
-rw-r--r--.  1 root root 369 Aug 28 14:26 vol_data.json
$ cat /var/lib/kubelet/pods/e70da9d9-64b8-46b2-bfd2-6ad9e2fb391d/volumes/kubernetes.io~csi/pvc-30xxxxx7e-8xx9-4xx7-8xxc-6xxxxxxxxx3/vol_data.json
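
The eviction usually leaves traces in the node's kernel log as well; a sketch for checking it from a debug shell on the node (<node_name> is a placeholder):
oc debug node/<node_name>
chroot /host
# Look for kernel CephFS client messages (modules 'ceph' / 'libceph') around the time of the network disruption
dmesg -T | grep -iE 'libceph|ceph'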

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
