Changing the maximum number of process IDs per pod (podPidsLimit) for ROSA


Understanding process ID limits in Managed OpenShift

A process identifier (PID) is a unique identifier assigned by the Linux kernel to each process or thread currently running on a system. The number of processes that can run simultaneously on a system is limited to 4,194,304 by the Linux kernel, and may additionally be affected when PIDs are reserved by the system for stability purposes, or by limited access to other system resources such as memory, CPU, and disk space.
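The kernel exposes this ceiling through procfs. As a quick sanity check on any Linux host (a generic sketch, not ROSA-specific — the value printed depends on how the host is configured):

```shell
# Read the kernel's PID ceiling. On 64-bit Linux the default is 4,194,304,
# but administrators can lower it with sysctl, so treat the value as host-specific.
pid_max=$(cat /proc/sys/kernel/pid_max)
echo "kernel pid_max: $pid_max"
```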

In Managed OpenShift, there are two supported limits for PID usage to consider when you schedule work on your cluster:

  • the maximum number of PIDs per pod (default: 4,096 in OpenShift 4.11 and higher)
    • This is controlled by the podPidsLimit parameter set on the node.
  • the maximum number of PIDs per node (default depends on node resources)
    • In OpenShift, this is controlled by the --system-reserved parameter, which reserves PIDs on each node based on the total resources of the node.

When a pod exceeds the allowed maximum number of PIDs per pod, the pod may stop functioning correctly and may be evicted from the node. See the Kubernetes documentation for more information on eviction: Eviction signals and thresholds.

When a node exceeds the allowed maximum number of PIDs per node, the node can become unstable because new processes cannot be assigned PIDs. If existing processes cannot complete without creating additional processes, the entire node can become unusable and require a reboot. This may result in data loss, depending on the processes and applications being run. Customer administrators and Red Hat Site Reliability Engineering are notified when this threshold is reached, and a "Worker node is experiencing PIDPressure" warning appears in the cluster logs.
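One way to watch for this condition is to read each node's PIDPressure status from `oc get nodes -o json`. The sketch below runs that filter against an embedded sample document, since live output requires a cluster; the node name `worker-0` and the single-node sample are hypothetical placeholders for what your cluster would return:

```shell
# Stand-in for real `oc get nodes -o json` output (one item per node).
sample='{"items":[{"metadata":{"name":"worker-0"},"status":{"conditions":[{"type":"PIDPressure","status":"False"}]}}]}'

# Print "<node> <PIDPressure status>" for every node in the document.
# Against a real cluster, replace `echo "$sample"` with `oc get nodes -o json`.
pressure=$(echo "$sample" | python3 -c '
import json, sys
for node in json.load(sys.stdin)["items"]:
    status = next(c["status"] for c in node["status"]["conditions"]
                  if c["type"] == "PIDPressure")
    print(node["metadata"]["name"], status)
')
echo "$pressure"
```

A status of `True` for any node means the kubelet has begun reclaiming resources and pod evictions may follow.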

Risks of setting higher process ID limits for ROSA pods

The podPidsLimit parameter for a pod controls the maximum number of processes and threads that can run simultaneously in that pod.

You can increase the value for podPidsLimit from the default of 4,096 to a maximum of 16,384. Changing this value may incur downtime for applications, because changing the podPidsLimit triggers a rolling reboot of the affected nodes.

If you are running a large number of pods per node, and you have a high podPidsLimit value on your nodes, you risk exceeding the PID maximum for the node.

To find the maximum number of pods that you can run simultaneously on a single node without exceeding the PID maximum, divide 3,650,000 by your podPidsLimit value. For example, if your podPidsLimit value is 1,048,576, and you expect the pods to use close to that number of process IDs, you can safely run three pods on a single node.
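The division above is plain integer arithmetic; a small helper (the `max_pods` name is ours, not part of the rosa CLI) makes the budget easy to check for other limits:

```shell
# Safe pod count per node for a given podPidsLimit, assuming every pod
# approaches its limit. 3,650,000 is the per-node PID budget cited above.
max_pods() { echo $(( 3650000 / $1 )); }

max_pods 4096      # default limit      -> prints 891
max_pods 16384     # maximum limit      -> prints 222
max_pods 1048576   # the example above  -> prints 3
```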

Note that memory, CPU, and available storage can also limit the maximum number of pods that can run simultaneously, even when podPidsLimit is set appropriately. For more information, see Planning your environment and Limits and scalability in the ROSA documentation.

Setting a higher podPidsLimit on an existing ROSA cluster

You can set a higher podPidsLimit on an existing ROSA cluster by creating or editing a kubeletconfig that changes the --pod-pids-limit parameter.

IMPORTANT:
Changing the podPidsLimit on an existing cluster will trigger non-control plane nodes in the cluster to reboot one at a time. Red Hat recommends making this change outside of peak usage hours for your cluster, and avoiding upgrading or hibernating your cluster until all nodes have rebooted.

The first time you change a default value, you also need to create the kubeletconfig:

$ rosa create kubeletconfig -c <cluster_name> --pod-pids-limit=<value>

When you make subsequent changes, you can edit the existing kubeletconfig:

$ rosa edit kubeletconfig -c <cluster_name> --pod-pids-limit=<value>

For example, to set a maximum of 16,384 PIDs per pod when you have never changed the default value:

$ rosa create kubeletconfig -c <cluster_name> --pod-pids-limit=16384

This triggers a rolling reboot of worker nodes in the cluster.

When each node in the cluster has been rebooted, you can verify that the new setting is in place by checking the node's kubelet.conf file:

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-5.1# grep podPidsLimit /etc/kubernetes/kubelet.conf

You should see the new podPidsLimit in the output, for example:

sh-5.1# grep podPidsLimit /etc/kubernetes/kubelet.conf
"podPidsLimit": 16384

Migrating to a higher podPidsLimit on an existing ROSA cluster

In Red Hat OpenShift Service on AWS 4.11 and later, you can set a custom PIDs limit for your ROSA cluster by using the ROSA CLI to create a kubeletconfig and set the podPidsLimit parameter for each worker and infrastructure node in the cluster. If you previously added a custom PIDs limit through a support exception, you should instead set the PIDs limit by using the ROSA CLI and then remove your earlier customization.

IMPORTANT:
To change the podPidsLimit in an existing cluster, it must be running Red Hat OpenShift Service on AWS 4.11 or later. Changing the podPidsLimit on an existing cluster will trigger non-control plane nodes in the cluster to reboot one at a time. Red Hat recommends making this change outside of peak usage hours for your cluster, and avoiding upgrading or hibernating your cluster until all nodes have rebooted.

Determine your current, custom PIDs limit:

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-5.1# grep pod /etc/kubernetes/kubelet.conf
  "podPidsLimit": 4096,

Use the ROSA CLI to create a new kubeletconfig with a podPidsLimit that matches your current, customized limit:

$ rosa create kubeletconfig -c <cluster_name> --pod-pids-limit=<value>

Note: If you set the podPidsLimit value to be different from your previous, customized limit, then this change will trigger a rolling reboot of the worker nodes in the cluster.

Verify that the PIDs limit is the same as your previous customization by checking the node's kubelet.conf file:

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-5.1# grep podPidsLimit /etc/kubernetes/kubelet.conf

Remove the customization that you added previously to set a custom PIDs limit. For example, if you previously created a custom kubeletconfig to set the PIDs limit, then you should remove that kubeletconfig.
