How to detect pods which have overstepped their ephemeral-storage limit
Environment
- Red Hat OpenShift Container Platform 4.
Issue
At the time of writing, no specific pod status indicates that a pod has exceeded its ephemeral-storage limit. When that happens, the pod is evicted and its status becomes ContainerStatusUnknown.
Resolution
A pod in ContainerStatusUnknown status has exceeded its ephemeral-storage limit if the project contains an event for that pod with the following text (see the message field):
- apiVersion: v1
  count: 1
  eventTime: null
  firstTimestamp: "<date>"
  involvedObject:
    apiVersion: v1
    kind: Pod
    name: <pod_name>
    namespace: sre-ci-test
    resourceVersion: "1359040712"
    uid: 425703de-3fdd-42b7-a34c-858ddb2aed8a
  kind: Event
  lastTimestamp: "2022-07-19T09:10:13Z"
  message: 'Pod ephemeral local storage usage exceeds the total limit of containers 0. '
  [...]
Root Cause
There is currently no dedicated pod status for this condition. SRVKP-2552 was created to request that pods killed for this reason get a specific status instead of the generic ContainerStatusUnknown.
Diagnostic Steps
The following command searches for these messages in a project:
oc -n <project_name> get events | grep -F 'Pod ephemeral local storage usage exceeds the total limit of containers'
Output example:
1h47m Warning Evicted pod/<pod_name> Pod ephemeral local storage usage exceeds the total limit of containers 0.
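When a project has many events, it can help to reduce the matching lines to a list of affected pod names. The following is a minimal sketch, not a supported tool: the function name evicted_pods is hypothetical, and it assumes the events arrive on standard input in the default text layout shown in the output example, where the pod appears as a pod/<pod_name> column.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): reads event lines on stdin and prints
# the unique names of pods evicted for exceeding their ephemeral-storage limit.
# Assumes the default text layout, where the object column looks like pod/<name>.
evicted_pods() {
    # Keep only the eviction messages described in this article.
    grep -F 'Pod ephemeral local storage usage exceeds the total limit of containers' |
        # Find the pod/<name> column and strip the "pod/" prefix.
        awk '{
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^pod\//) {
                    sub(/^pod\//, "", $i)
                    print $i
                }
            }
        }' |
        # Report each pod once, even if it has several matching events.
        sort -u
}

# Usage, assuming a logged-in oc session:
#   oc -n <project_name> get events | evicted_pods
```

The filter works only on text, so it can also be applied to saved event output, for example from a support archive. If the column layout differs in a given environment, the awk pattern would need adjusting.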