Pods using Persistent Volumes with high file counts fail to start or take an excessive amount of time in OpenShift

Solution Verified - Updated -

Issue

  • Pod deployments are failing with the following message:

    Error: Failed to create pod sandbox: rpc error: code = Unknown desc = Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded
    
  • Pods fail to start and remain in CreateContainerError status:

    mypod-5-1111a           0/1     CreateContainerError   0          7m29s
    
  • When attaching volumes to pods in Red Hat OpenShift Container Platform, why do pods sometimes fail to start, or take an excessive amount of time to start?
  • The volumes themselves have very high file counts, often measured in the tens of thousands of files and directories (or higher).
  • Starting a pod without the high file count volumes allows it to become Ready quickly (but without access to the data the volumes provide).
  • Entire nodes are sometimes marked NotReady because of this issue, since the container runtime (docker or cri-o) becomes unresponsive (as seen when docker ps or crictl ps commands hang).
  • When using Persistent Volumes with high file counts in OpenShift, why do pods fail to start or take an excessive amount of time to achieve Ready state?
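
To judge whether a volume falls into the "tens of thousands of files and directories" range described above, it can help to count its entries directly. The following is a minimal sketch, not part of the original solution: the helper name count_entries is hypothetical, and it assumes the volume's contents are reachable as an ordinary directory path (for example, from a host or debug session where the volume is mounted).

```python
import os


def count_entries(root):
    """Return (files, directories) found below root, excluding root itself.

    Hypothetical helper for illustration. os.walk does not follow
    symbolic links to directories by default, so a linked directory is
    counted once as a directory and its contents are not descended into.
    """
    files = 0
    dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)    # subdirectories directly under this level
        files += len(filenames)  # non-directory entries at this level
    return files, dirs


# Example call; the path is an assumption and must be replaced with the
# directory where the volume is actually mounted:
# files, dirs = count_entries("/path/to/mounted/volume")
# print(f"{files} files, {dirs} directories")
```

On a very large volume the walk itself can take a while, which is consistent with the slow start-up symptom described in this section.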

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 3
    • 4
  • Docker Container Engine
  • CRI-O Container Engine
  • SELinux
