Pods using Persistent Volumes with high file counts fail to start or take an excessive amount of time in OpenShift
Issue
- Pod deployments are failing with the following message:
  Error: Failed to create pod sandbox: rpc error: code = Unknown desc = Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded
- Pods are unable to start and fall into CreateContainerError status:
  mypod-5-1111a 0/1 CreateContainerError 0 7m29s
- When attaching volumes to pods in Red Hat OpenShift Container Platform, why do pods sometimes not start, or otherwise take an excessive amount of time to start?
- The volumes themselves have very high file counts, often measured in the tens of thousands of files and directories (or more).
- Starting the pods without the high file count volumes allows them to become Ready quickly (but without access to the data the volume provides).
- It is possible that entire nodes are sometimes marked as NotReady due to this issue, as the container runtime (docker or cri-o) is unresponsive (as seen with hung docker ps or crictl ps commands).
- When using Persistent Volumes with high file counts in OpenShift, why do pods fail to start or take an excessive amount of time to achieve Ready state?
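To check whether a volume falls into the "high file count" range described above, the files and directories under its mount point can be counted. The following is a minimal sketch, not a Red Hat supplied tool: the function name count_entries is hypothetical, and the path to inspect is assumed to be passed on the command line (for example, the directory where the volume is mounted inside a running pod or on a host that has the volume mounted).

```python
import os
import sys


def count_entries(root):
    """Return (files, directories) found beneath root, not counting root itself."""
    files = 0
    dirs = 0
    # os.walk does not descend into symlinked directories by default,
    # so a symlink to a directory is counted once and not followed.
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    # Hypothetical usage: pass the volume mount path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, dirs = count_entries(root)
    print(f"{root}: {files} files, {dirs} directories")
```

Walking a very large volume can itself take a long time and generate significant I/O, so it may be preferable to run this outside of peak hours.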
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 3
  - 4
- Docker Container Engine
- CRI-O Container Engine
- SELinux