Infra node workloads are unschedulable due to volume node affinity conflict
Environment
- Azure Red Hat OpenShift (ARO)
- 4.x
Issue
- Infra node workloads are unschedulable due to node affinity and taint conflicts.
- When migrating logging stack or monitoring stack deployments to new infra nodes while retaining their existing storage volumes, the following event is seen in the respective namespace:
0/18 nodes are available: 10 node(s) didn't match Pod's node affinity/selector, 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had volume node affinity conflict. preemption: 0/18 nodes are available: 16 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod.
Resolution
Ensure that the machineset for the infrastructure nodes matches the Azure availability zones of any existing storage volumes you wish to retain. The .spec.template.spec.providerSpec.value.zone field must contain the appropriate Azure zone numeral and must not be empty.
Refer to the Deploy infrastructure nodes section of the Azure Red Hat OpenShift documentation for detailed guidance.
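For illustration only, a minimal sketch of the relevant fields in a zonal infra machineset is shown below. The machineset name and zone value are placeholders and must be adapted to match the cluster and the zones of the existing volumes:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-infra-eastus1         # example name only
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          location: eastus               # region containing the existing volumes
          zone: "1"                      # zone numeral, for example "1" for eastus-1; must not be ""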
Root Cause
If infrastructure nodes are created with a machineset specification that does NOT include a valid zone: in the providerSpec:, Azure may place the machines in any availability zone in the region. This is known as a nonzonal deployment.
When relocating qualified workloads to newly created infra nodes, it is important that the new infra nodes are located in the same availability zones as any storage volumes associated with those workloads.
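As an illustrative check (not part of the original diagnostic output), the zones of the current infra nodes can be compared with the zones of the volumes by listing the topology.kubernetes.io/zone label on the infra nodes:
oc get nodes -l node-role.kubernetes.io/infra -L topology.kubernetes.io/zone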
Diagnostic Steps
Identify the pods that are not running. In this example, these are Loki logging stack pods:
oc get po -n openshift-logging | grep Pending
NAME                           READY   STATUS    RESTARTS   AGE
logging-loki-index-gateway-1   0/1     Pending   0          44h
logging-loki-ingester-1        0/1     Pending   0          44h
logging-loki-ruler-1           0/1     Pending   0          38h
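Optionally, the full scheduling failure message can be retrieved from the namespace events; this is an illustrative command rather than original diagnostic output:
oc get events -n openshift-logging --field-selector reason=FailedScheduling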
Identify the zones where the logging stack PVC and PV reside:
oc get pods -n openshift-logging logging-loki-ingester-1 -o jsonpath='{.spec.volumes[?(@.persistentVolumeClaim)]}' | jq .
{
  "name": "storage",
  "persistentVolumeClaim": {
    "claimName": "storage-logging-loki-ingester-1"
  }
}
oc get pv $(oc get pvc -n openshift-logging storage-logging-loki-ingester-1 -o json | jq -r .spec.volumeName) -o json | jq .spec.nodeAffinity
{
  "required": {
    "nodeSelectorTerms": [
      {
        "matchExpressions": [
          {
            "key": "topology.disk.csi.azure.com/zone",
            "operator": "In",
            "values": [
              "eastus-1"          <== Note zone
            ]
          },
          {
            "key": "topology.kubernetes.io/region",
            "operator": "In",
            "values": [
              "eastus"
            ]
            ...
}
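As a sketch, the zone of every PV in the cluster can be listed in one pass with jq; this one-liner is an illustrative example (not from the original article) and simply skips PVs that have no node affinity:
oc get pv -o json | jq -r '.items[] | [.metadata.name, (.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[]? | select(.key | endswith("/zone")) | .values[0])] | @tsv'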
Check the machineset specification:
oc get machineset -n openshift-machine-api
NAME                     DESIRED   CURRENT   READY   AVAILABLE   AGE
mycluster-infra-eastus   2         2         2       2           5d
oc get machineset mycluster-infra-eastus -n openshift-machine-api -oyaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: mycluster
    machine.openshift.io/cluster-api-machine-role: infra
    machine.openshift.io/cluster-api-machine-type: infra
  name: mycluster-infra-eastus
  namespace: openshift-machine-api
spec:
  ...
  template:
    ...
    spec:
      ...
      providerSpec:
        ...
        value:
          ...
          zone: ""          <== Note empty zone
          ...
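For comparison, the zone of each machine created from the machineset can also be checked; for a nonzonal deployment the ZONE column in the default output is expected to be empty:
oc get machines -n openshift-machine-api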
In this example, the storage volume for the logging-loki-ingester-1 pod resides in zone eastus-1, but no infra nodes are present in that availability zone, so the scheduler reports the volume node affinity conflict error.