Pods stuck in "Pending" state due to volume node affinity conflict
Environment
- Red Hat OpenShift Service on AWS (ROSA) 4
- Red Hat OpenShift Dedicated (OSD) on AWS 4
- Red Hat OpenShift Container Platform (OCP) on AWS 4
Issue
- Pods could not be scheduled on any node due to the following error:
0/14 nodes are available: 1 Insufficient cpu, 3 Insufficient memory, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) had volume node affinity conflict.
- Pods stuck in the "Pending" state due to volume node affinity conflict.
- Pods are in "Pending" state after cluster upgrade.
Resolution
EBS volumes in AWS are tied to a specific Availability Zone (AZ). When a Persistent Volume (PV) is created and backed by an EBS volume, it is inherently created within a specific AZ. OpenShift uses the concept of volume node affinity to ensure that Pods using a given PV are scheduled in the same AZ where the EBS volume exists.
In a multi-AZ OpenShift cluster, if a Pod is scheduled in an AZ different from the one where the EBS volume exists, the Pod won't start because EBS volumes are not accessible across AZs. This is known as a "volume node affinity conflict".
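For reference, the node affinity on a PV provisioned by the AWS EBS CSI driver typically looks like the following sketch (the PV name and zone value are illustrative):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0f6f1e42   # example name; yours will differ
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.ebs.csi.aws.com/zone
          operator: In
          values:
          - us-east-1b   # the AZ the EBS volume was created in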
This can be resolved by choosing one of the following options:
Option 1:
Increase the number of nodes in the AZ where the EBS volume is located, which allows the Pods to be scheduled in the same AZ as the EBS volume, for example by scaling up a MachineSet in that AZ as sketched below. (Recommended)
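A minimal sketch, assuming the cluster manages workers through MachineSets and that <machineset-in-az> stands in for the MachineSet that provisions nodes into the required AZ:
# List MachineSets; on AWS the name usually ends with the AZ it provisions into
oc get machinesets -n openshift-machine-api

# Scale up the MachineSet for the AZ the EBS volume is in
# (<machineset-in-az> and the replica count are illustrative)
oc scale machineset <machineset-in-az> --replicas=3 -n openshift-machine-api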
Option 2:
Identify the currently available nodes in the AZ where the EBS volume is located, and free up resources on those nodes by moving some Pods to nodes in a different AZ (see the commands after the note below).
Note: Choose Option 2 only when other nodes (in a different AZ) with free resources are available in the cluster and some workloads can be shifted to them, freeing up resources on the nodes in the required AZ; otherwise, go with Option 1.
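A short sketch for sizing up where capacity is available (the zone label value is illustrative, and oc adm top requires cluster metrics to be available):
# Nodes in the AZ the volume is pinned to
oc get nodes -l topology.kubernetes.io/zone=us-east-1b

# Current CPU/memory usage per node, to decide which Pods can be moved
oc adm top nodes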
If you need help from Red Hat, please open a support case with us in the Red Hat Customer Portal.
Root Cause
In a multi-AZ OpenShift cluster, a Pod scheduled in an AZ different from the one where its EBS volume exists won't start, because EBS volumes are not accessible across AZs; this is the "volume node affinity conflict". When a node is currently unused in another AZ, it's important to know that you can't just "move" the PV (and hence the EBS volume) to that AZ: the EBS volume is tied to the AZ where it was originally created and cannot be moved. Therefore, even though an available node exists in a different AZ, the Pod cannot be scheduled there because its EBS-backed PV is tied to a different AZ. This is due to the nature of EBS itself, which is inherently bound to a specific AZ for reliability and performance reasons.
Diagnostic Steps
- Check the impacted Pod's YAML to find the name of the PVC being used:
oc get pod <pod-name> -o yaml
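Alternatively, a jsonpath query can print just the claim name(s); this sketch assumes the Pod mounts its volumes through PVCs:
oc get pod <pod-name> -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'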
- Check the YAML output of the PVC to get the associated PV name. The volume name is present under spec.volumeName:
oc get pvc <pvc-name> -o yaml
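The same value can be pulled directly with jsonpath:
oc get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'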
- Check the PV's volume node affinity:
oc describe pv <pv-name>
...output omitted...
Node Affinity:
Required Terms:
Term 0: topology.ebs.csi.aws.com/zone in [us-east-1b]
In the above example, the AZ is "us-east-1b", which means that Pods using this PV can only be scheduled on a node in the "us-east-1b" zone.
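To confirm which nodes satisfy that affinity, list the nodes carrying the matching zone label (the zone value comes from the example above; the standard topology.kubernetes.io/zone label can be used as well):
oc get nodes -l topology.ebs.csi.aws.com/zone=us-east-1b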