Infra node workloads are Unschedulable due to Volume Node Affinity Conflict

Solution Verified

Environment

  • Azure Red Hat OpenShift
    • 4.x

Issue

  • Infra node workloads are unschedulable due to node affinity and taint conflicts.
  • When migrating logging stack or monitoring stack deployments to new infra nodes while retaining their existing storage volumes, the following event is seen in the respective namespace:
0/18 nodes are available: 10 node(s) didn't match Pod's node affinity/selector, 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had volume node affinity conflict. preemption: 0/18 nodes are available: 16 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod.
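
To find every pod affected by this conflict, the scheduler events in a namespace can be filtered by reason. The following is a minimal sketch, assuming the FailedScheduling events are still retained in the namespace. The function names are hypothetical helpers for illustration and are not part of oc.

```shell
# Print "<pod> <scheduler message>" for each FailedScheduling event in namespace $1.
failed_scheduling_events() {
  oc get events -n "$1" --field-selector reason=FailedScheduling \
    -o custom-columns='POD:.involvedObject.name,MESSAGE:.message' --no-headers
}

# Print the unique names of pods whose scheduling failure mentions
# a volume node affinity conflict.
pods_with_volume_affinity_conflict() {
  failed_scheduling_events "$1" |
    awk '/volume node affinity conflict/ { print $1 }' |
    sort -u
}
```

For example, pods_with_volume_affinity_conflict openshift-logging would print one affected pod name per line.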

Resolution

Ensure that the machineset for the infrastructure nodes matches the Azure availability zones for any existing storage volumes you wish to retain.

.spec.template.spec.providerSpec.value.zone must contain the appropriate Azure zone numeral and must not be null.

Refer to the "Deploy infrastructure nodes" article in the Azure Red Hat OpenShift documentation for explicit guidance.
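
As a quick check of the zone field described above, the configured zone of every machineset can be listed in one pass. This is a sketch only: it assumes the machinesets live in the openshift-machine-api namespace, as in this article, and the function names are hypothetical. It relies on custom-columns output rendering a missing field as <none> and an empty string as a blank column.

```shell
# Print "<machineset> <zone>" for every MachineSet in openshift-machine-api.
list_machineset_zones() {
  oc get machineset -n openshift-machine-api --no-headers \
    -o custom-columns='NAME:.metadata.name,ZONE:.spec.template.spec.providerSpec.value.zone'
}

# Print the names of MachineSets whose zone is empty or missing.
# A blank zone leaves only one field on the line; a missing one shows "<none>".
find_nonzonal_machinesets() {
  list_machineset_zones | awk 'NF == 1 || $2 == "<none>" { print $1 }'
}
```

Any machineset printed by find_nonzonal_machinesets is a candidate for review before workloads with existing volumes are moved onto its nodes.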

Root Cause

If infrastructure nodes are created from a machineset whose providerSpec does not include a valid zone value, Azure may place the machines in any availability zone in the region. This is known as a nonzonal deployment.

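When relocating qualified workloads to newly created infra nodes, it is important that the new infra nodes are located in the same availability zones as any storage volumes associated with those workloads. A zonal Azure disk can only be attached to a node in the same availability zone, so the node affinity on the PV restricts the pod to nodes in that zone.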
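
To see where the infra nodes actually landed, the zone label on each node can be summarized. This is a minimal sketch, assuming the infra nodes carry the node-role.kubernetes.io/infra label (adjust the selector if your cluster labels them differently). The function name is hypothetical.

```shell
# Print the distinct availability zones of all infra nodes, one per line.
# Assumption: infra nodes are labeled node-role.kubernetes.io/infra.
infra_node_zones() {
  oc get nodes -l node-role.kubernetes.io/infra --no-headers \
    -o custom-columns='ZONE:.metadata.labels.topology\.kubernetes\.io/zone' |
    sort -u
}
```

If the zone of a retained volume does not appear in this list, pods using that volume cannot be scheduled on any infra node.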

Diagnostic Steps

Identify the pods that are not running. In this example, they are Loki logging stack pods:

oc get po -n openshift-logging | grep -E 'NAME|Pending'
NAME                                            READY   STATUS      RESTARTS   AGE
logging-loki-index-gateway-1                    0/1     Pending     0          44h
logging-loki-ingester-1                         0/1     Pending     0          44h
logging-loki-ruler-1                            0/1     Pending     0          38h

Identify the zones where the logging stack PVC and PV reside:

oc get pods -n openshift-logging logging-loki-ingester-1 -o jsonpath='{.spec.volumes[?(@.persistentVolumeClaim)]}' | jq .
{
  "name": "storage",
  "persistentVolumeClaim": {
    "claimName": "storage-logging-loki-ingester-1"
  }
}

oc get pv "$(oc get pvc -n openshift-logging storage-logging-loki-ingester-1 -o jsonpath='{.spec.volumeName}')" -o json | jq .spec.nodeAffinity
{
  "required": {
    "nodeSelectorTerms": [
      {
        "matchExpressions": [
          {
            "key": "topology.disk.csi.azure.com/zone",
            "operator": "In",
            "values": [
              "eastus-1"                <== Note zone
            ]
          },
          {
            "key": "topology.kubernetes.io/region",
            "operator": "In",
            "values": [
              "eastus"
            ]
...
}

Check the machineset specification:

oc get machineset -n openshift-machine-api
NAME                      DESIRED    CURRENT    READY    AVAILABLE    AGE
mycluster-infra-eastus    2          2          2        2            5d

oc get machineset mycluster-infra-eastus -n openshift-machine-api -oyaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: mycluster
    machine.openshift.io/cluster-api-machine-role: infra
    machine.openshift.io/cluster-api-machine-type: infra
  name: mycluster-infra-eastus
  namespace: openshift-machine-api
spec:
...
  template:
  ...
    spec:
      ...
      providerSpec:
        ...
        zone: ""                <== Note empty zone
...

In this example, the storage volume for the logging-loki-ingester-1 pod resides in zone eastus-1, but no infra nodes are present in that availability zone, so the scheduler reports the volume node affinity conflict error.
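
The comparison made in this example can be scripted. The following is a sketch under stated assumptions, not a definitive check: it assumes jq is installed, that the PV uses the topology.disk.csi.azure.com/zone affinity key shown in the output above, and that nodes carry a label with the same key. All function names are hypothetical.

```shell
# Succeed if zone $1 appears in the newline-separated zone list $2.
zone_is_covered() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Print the zone required by the node affinity of PV $1.
pv_required_zone() {
  oc get pv "$1" -o json |
    jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[]
           | select(.key == "topology.disk.csi.azure.com/zone") | .values[]'
}

# Print the distinct zones of nodes matching label selector $1.
# Assumption: nodes expose the same topology key as the PV affinity.
node_zones() {
  oc get nodes -l "$1" --no-headers \
    -o custom-columns='ZONE:.metadata.labels.topology\.disk\.csi\.azure\.com/zone' |
    sort -u
}

# Report whether any node matching selector $2 is in the zone required by PV $1.
report_pv_zone_coverage() {
  zone=$(pv_required_zone "$1")
  if zone_is_covered "$zone" "$(node_zones "$2")"; then
    echo "OK: zone $zone has matching nodes"
  else
    echo "CONFLICT: no matching nodes in zone $zone"
  fi
}
```

A call such as report_pv_zone_coverage <pv-name> node-role.kubernetes.io/infra, with the PV name taken from the PVC lookup above, would report a conflict when no infra node sits in the zone of the volume.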
