Pod has an EFS mount not responding

Solution Verified - Updated -

Environment

  • Azure Red Hat OpenShift (ARO)
    • v4.10.50

Issue

  • The EFS stunnel process is in a hung state on a node
  • The EFS mount is not responding

Sample Errors:

[ERROR]  POD grafana-xxx with UID xxxxxxxx has an EFS mount not responding. Mount details : /var/lib/kubelet/pods/xxxxx-xxx-xxxx-xxxx-xxxx-xxxxx/mount.
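
When many such messages appear, it can help to extract the pod name, UID, and mount path from each one. The following is a minimal Python sketch. The regular expression is inferred from the single sample line above rather than from a documented log format, and the function name `parse_efs_error` is hypothetical, so adjust both if the real messages differ.

```python
import re

# Pattern inferred from the one sample error line in this article
# (an assumption, not a documented log format).
EFS_ERROR = re.compile(
    r"POD (?P<pod>\S+) with UID (?P<uid>\S+) has an EFS mount not responding\. "
    r"Mount details : (?P<mount>\S+?)\.?$"
)


def parse_efs_error(line):
    """Return pod name, UID, and mount path from an error line, or None."""
    match = EFS_ERROR.search(line.strip())
    return match.groupdict() if match else None
```

Lines that do not match the pattern return `None`, so the function can be applied to a whole log file without pre-filtering.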

Resolution

  • This is a known bug, OCPBUGS-7815, which is resolved by upgrading the cluster to version 4.10.54.

  • Upgrade the cluster to OCP v4.10.54. Later versions also contain the fix.

  • The fix is delivered in advisory RHBA-2023:1154.
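
To check a list of cluster versions against the fixed release, a small comparison helper may be useful. This is a sketch only: the function name `has_fix` is hypothetical, it assumes plain `x.y.z` version strings (optionally prefixed with `v`), and it answers only for the 4.10 stream, because this article does not state which z-stream releases of later minor versions first include the fix.

```python
FIXED_4_10 = (4, 10, 54)  # first 4.10.z release with the fix, per this article


def has_fix(version):
    """Return True/False for 4.10.z versions, None for other minor streams."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if (major, minor) != FIXED_4_10[:2]:
        return None  # not stated in this article; check the advisory
    return patch >= FIXED_4_10[2]
```

For the version listed under Environment, `has_fix("4.10.50")` returns `False`, while `has_fix("v4.10.54")` returns `True`.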

Root Cause

  • The aws-efs-csi-driver-operator appears to lose connectivity to the underlying EFS file system. This may be related to a memory leak in stunnel that causes the stunnel process to die. The problem is addressed in efs-utils v1.34.2, which uses stunnel v5.58.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
