Chapter 42. Application ephemeral storage sizing

42.1. Overview

Note

This section applies only if you enabled the ephemeral storage technology preview. This feature is disabled by default. To enable this feature, see Configuring for Ephemeral Storage.

Note

Technology Preview releases are not supported with Red Hat production service-level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information, see Red Hat Technology Preview Features Support Scope (https://access.redhat.com/support/offerings/techpreview/).

You can use ephemeral storage to:

  • Determine the ephemeral storage and risk requirements of a containerized application component and configure the container ephemeral storage parameters to suit those requirements.
  • Configure containerized application runtimes, for example, OpenJDK, to adhere to the configured container ephemeral storage parameters.
  • Diagnose and resolve ephemeral storage-related error conditions that are associated with running an application in a container.

42.2. Background

Note

Before you use ephemeral storage, review how OpenShift Container Platform uses compute resources.

For the purposes of sizing application ephemeral storage, the key points are:

  • For each kind of resource, including memory, CPU, storage, and ephemeral storage, OpenShift Container Platform allows optional request and limit values to be placed on each container in a pod.

    Ephemeral storage request
  • The ephemeral storage request value, if specified, influences the OpenShift Container Platform scheduler. The scheduler considers the ephemeral storage request when scheduling a container to a node, then fences off the requested ephemeral storage on the chosen node for the use of the container.

    Ephemeral storage limit
  • The ephemeral storage limit value, if specified, provides a hard limit on the ephemeral storage that can be allocated across all the processes in a container.
  • If both ephemeral storage request and limit are specified, the ephemeral storage limit value must be greater than or equal to the ephemeral storage request. A sketch showing both values in a pod specification follows this list.

    Administration
  • The cluster administrator may assign quota against the ephemeral storage request value, limit value, both, or neither.
  • The cluster administrator may assign default values for the ephemeral storage request value, limit value, both, or neither.
  • The cluster administrator may override the ephemeral storage request values that a developer specifies, in order to manage cluster overcommit. This occurs on OpenShift Online, for example.
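
For example, the following pod specification sets both values for a container under its resources field. This is a minimal sketch: the pod name, container name, image, and quantities are placeholders, not recommendations.

apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo                        # example name
spec:
  containers:
  - name: app                                 # example container
    image: registry.example.com/app:latest    # example image
    resources:
      requests:
        ephemeral-storage: 1Gi                # considered by the scheduler when placing the pod
      limits:
        ephemeral-storage: 2Gi                # hard cap across all processes in the container

Here the limit is greater than or equal to the request, and the cluster administrator can further constrain or default either value as described above.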

42.3. Strategy

To size application ephemeral storage on OpenShift Container Platform:

  1. Determine expected container ephemeral storage usage.

    If your administrator enabled the ephemeral storage technology preview, determine the expected mean and peak container ephemeral storage usage, empirically if necessary, for example, by separate extended load testing. Remember to consider all the processes that might potentially run in parallel in the container. For example, does the main application spawn any ancillary scripts that may require local storage for work files or logging?

  2. Assess risk for eviction.

    Determine the risk appetite for eviction. If the risk appetite is low, set the container to request ephemeral storage according to the expected peak usage plus a percentage safety margin. If the risk appetite is higher, set the container to request ephemeral storage according to the expected mean usage.

  3. Set container ephemeral storage request.

    Set the container ephemeral storage request based on your risk assessment. The more accurately the request represents the application ephemeral storage usage, the better. If the request is too high, cluster and quota usage will be inefficient. If the request is too low, the chances of application eviction increase.

  4. Set container ephemeral storage limits, if required.

    Set container ephemeral storage limits, if required. Setting a limit has the effect of immediately stopping a container process if the combined ephemeral storage usage of all processes in the container exceeds the limit, so it is a mixed blessing: it can make unanticipated excess ephemeral storage usage obvious early, that is, fail fast, but it also stops processes abruptly. A worked sizing example follows these steps.

    Note

    Some OpenShift Container Platform clusters may require a limit value to be set; some may override the request based on the limit; and some application images rely on a limit value being set as this is easier to detect than a request value.

    If these limits are set, they should not be set to less than the expected peak container resource usage plus a percentage safety margin.

  5. Tune the application.

    Ensure that the application is tuned with respect to configured request and limit values, if appropriate. This step is particularly relevant to applications that pool ephemeral storage.
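
As a worked example of steps 2 to 4 (the numbers are illustrative only, not recommendations): suppose testing shows a mean ephemeral storage usage of roughly 500Mi and a peak of roughly 800Mi. With a low risk appetite, you might set the request to the peak plus a safety margin of about 25%, rounded to 1Gi; with a higher risk appetite, you might set the request to the mean. The limit, if set, should be no less than the peak plus the margin. The container's resources stanza might then look like this:

resources:
  requests:
    ephemeral-storage: 500Mi   # expected mean usage (higher risk appetite)
  limits:
    ephemeral-storage: 1Gi     # expected peak (800Mi) plus ~25% safety margin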

42.4. Diagnosing an evicted pod

OpenShift Container Platform might evict a pod from its node when the node’s ephemeral storage is exhausted. Depending on the extent of ephemeral storage exhaustion, the eviction might or might not be graceful. In graceful eviction, the main process, PID 1, of each container receives a SIGTERM signal, then some time later, a SIGKILL signal if the process is still running. In non-graceful eviction, the main process of each container immediately receives a SIGKILL signal.

To check the status of a pod and the reason for its eviction:

$ oc get pod test
NAME      READY     STATUS    RESTARTS   AGE
test      0/1       Evicted   0          1m

$ oc get pod test -o yaml
...
status:
  message: 'Pod The node was low on resource: [DiskPressure].'
  phase: Failed
  reason: Evicted

An evicted pod has phase Failed and reason Evicted. The evicted pod is not restarted, regardless of the value of restartPolicy. However, controllers such as the ReplicationController notice the pod’s failed status and create a new pod to replace the old one.
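
To confirm that the eviction was caused by disk pressure on the node, you can also review the node’s conditions. This is a sketch: the node name is a placeholder and the output is abridged.

$ oc describe node node1.example.com
...
Conditions:
  Type            Status    Reason                    Message
  ----            ------    ------                    -------
  DiskPressure    True      KubeletHasDiskPressure    kubelet has disk pressure
...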