Pods evicted with error "The node was low on resource: ephemeral-storage"

Solution Verified - Updated -

Environment

  • OpenShift Container Platform
    • 3.11

Issue

  • Pods are being evicted with an ephemeral-storage error:
# oc get pods
NAME                             READY     STATUS      RESTARTS   AGE
application-name-ID-5-3-mggtz    1/1       Running     0          1h
application-name-ID-5-3-pvrwt    1/1       Running     0          1h
application-name-ID-5-1-build    0/1       Completed   0          21d
application-name-ID-5-2-build    0/1       Completed   0          21d
application-name-ID-5-1-build    0/1       Completed   0          4h
application-name-ID-5-1-deploy   0/1       Error       0          4h
application-name-ID-5-5-29qsv    0/1       Evicted     0          1h
application-name-ID-5-5-2pm5x    1/1       Running     0          1h
application-name-ID-5-4mgpt      0/1       Evicted     0          1h
application-name-ID-5-6zx5k      1/1       Running     0          2h
application-name-ID-5-74fw8      1/1       Running     0          1h
application-name-ID-5-7zrd5      0/1       Evicted     0          1h
application-name-ID-5-8n9p9      0/1       Evicted     0          1h
application-name-ID-5-dm5cg      0/1       Evicted     0          1h
application-name-ID-5-dvkdc      0/1       Evicted     0          1h
application-name-ID-5-g7q5n      1/1       Running     0          1h
application-name-ID-5-grzbl      0/1       Evicted     0          1h
application-name-ID-5-hrf5f      0/1       Evicted     0          1h
application-name-ID-5-j5z4r      0/1       Evicted     0          1h
application-name-ID-5-j69ht      1/1       Running     0          1h
application-name-ID-5-jpmdr      0/1       Evicted     0          1h
application-name-ID-5-lxcsr      0/1       Evicted     0          1h

# oc describe pod application-name-ID-5-lxcsr
...
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: ephemeral-storage. Container ts-airsearch-eburst was using 8369512Ki, which exceeds its request of 0.
  • Garbage collection tuning, docker pruning, oc adm prune, and editing the eviction thresholds have all been attempted without effect.
  • Disk pressure errors are not present.
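
To see at a glance which pods were evicted, the listing above can be filtered on its STATUS column. The following is a minimal sketch: the function name evicted_pods is only illustrative, and it assumes the default column order of oc get pods shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the NAME of every pod whose STATUS column reads "Evicted".
# Reads `oc get pods` output on stdin; assumes the default column order.
evicted_pods() {
  awk '$3 == "Evicted" { print $1 }'
}

# Example on a live cluster (needs a logged-in oc client):
#   oc get pods | evicted_pods
#   oc get pods | evicted_pods | wc -l    # number of evicted pods
```

Running oc describe pod on any of the printed names shows the eviction message and the container that was using the storage.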

Resolution

  • In some cases, an excess of log messages is consuming the storage. Configure the Docker logging driver to limit the amount of stored logs:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
  • In other cases, pods that use emptyDir volumes without storage quotas fill up the node's ephemeral storage. When this happens, the following message is present in the node logs:
eviction manager: attempting to reclaim ephemeral-storage
  • Set a quota to limit this; otherwise any container can write an unlimited amount of data to its node's filesystem.
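
One way to bound emptyDir and container storage is to set ephemeral-storage requests and limits on the container and a sizeLimit on the emptyDir volume. The following is a sketch only: the pod name, image, mount path, and sizes are placeholders, not values from this article. It also assumes ephemeral-storage management is enabled on the cluster; on 3.11 it may need to be turned on explicitly, so check the documentation for your release.

```shell
# Print a pod manifest that bounds ephemeral storage in two places:
#   - an ephemeral-storage request and limit on the container
#   - a sizeLimit on the emptyDir volume
# All names and sizes below are placeholders, not values from this article.
render_limited_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: registry.example.com/example/app:latest
    resources:
      requests:
        ephemeral-storage: 1Gi
      limits:
        ephemeral-storage: 2Gi
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi
EOF
}

# Review the output, then apply it with a logged-in oc client:
#   render_limited_pod | oc apply -f -
```

With a non-zero request in place, the eviction message no longer reports usage that "exceeds its request of 0". A project-wide ResourceQuota on requests.ephemeral-storage and limits.ephemeral-storage is another option where a per-project cap is preferred.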

Root Cause

  • Pod logs or emptyDir usage are filling up the node's ephemeral storage.
  • This KCS addresses the quota and the /var filesystem more directly; simply growing the /var filesystem is also an option to fix this.
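
To judge whether logs alone can fill the filesystem, it helps to work out the ceiling that the json-file settings from the Resolution impose: each container keeps at most max-file files of roughly max-size each. The sketch below does that arithmetic; log_budget is a hypothetical helper, and the container count of 40 is an assumed figure for illustration.

```shell
# Approximate worst-case json-file log usage, in the same unit as max-size.
# Arguments: <max-size> <max-file> <number of containers on the node>
log_budget() {
  echo $(( $1 * $2 * $3 ))
}

log_budget 100 5 1     # one container: prints 500
log_budget 100 5 40    # 40 containers (assumed count): prints 20000
```

With max-size 100m and max-file 5, a single container can therefore hold about 500 MB of logs. According to Docker's documentation, changed log options apply only to containers created after the daemon is restarted, which is consistent with the comment below about restarting docker and redeploying the affected pods.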

Diagnostic Steps

List the containers running on the node:
# docker ps | cut -f1 -d ' '

Find where each container's log data is stored:
# docker inspect <IDofContainer> --format='{{.LogPath}}'
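
The two steps above can be combined to find which container logs take the most space. This is a minimal sketch: rank_by_size is a hypothetical helper name, and the docker pipeline in the trailing comment assumes root access on the node and GNU xargs.

```shell
# Read file paths on stdin and print "<bytes> <path>" lines, largest first.
# Paths that do not exist (for example, logs of removed containers) are skipped.
rank_by_size() {
  while IFS= read -r path; do
    [ -f "$path" ] || continue
    printf '%d %s\n' "$(( $(wc -c < "$path") ))" "$path"
  done | sort -rn
}

# On the node, as root:
#   docker ps -q | xargs -r docker inspect --format='{{.LogPath}}' | rank_by_size | head
```

The container ID is part of each log path, so the largest entries can be traced back to their pods with docker ps.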

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

1 Comment

Hi.

I found that when there are many evicted pods on a node, you must apply this solution, drain the node, restart the docker and atomic-openshift-node services, and redeploy the affected pods.

Thanks.