The Ingress operator in a degraded state

Solution Verified

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4.x

Issue

  • The ingress cluster operator is in a degraded state with the following error:


    Operator: 'ingress' Issue : Degraded Reason : IngressDegraded Message : The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-855b6c99cb-b5zq9" cannot be scheduled: 0/25 nodes are available: 19 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 2/3 of replicas are available)
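A quick way to confirm this condition from the command line is to read the DEGRADED column of oc get clusteroperator ingress. The following is a minimal bash sketch. The function name ingress_degraded is hypothetical, and the sketch assumes the default table layout in which DEGRADED is the fifth column (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): reads `oc get clusteroperator` table
# output on stdin and exits 0 only if the "ingress" row reports DEGRADED=True.
# ASSUMPTION: DEGRADED is the 5th column
# (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE ...).
ingress_degraded() {
  awk 'NR > 1 && $1 == "ingress" { found = 1; degraded = ($5 == "True") }
       END { exit !(found && degraded) }'
}
```

For example: oc get clusteroperator ingress | ingress_degraded && echo "ingress is degraded". Parsing the table keeps the sketch short; a jsonpath query on .status.conditions would be more robust if the column layout changes.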

Resolution

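  • Force delete the router-default pod that is stuck in the Terminating state (not the Pending pod), then kill the leftover router process on the corresponding infra node:

    $ oc delete pod <router-default-XXXX> -n openshift-ingress --grace-period=0 --force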
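The resolution step can be scripted so that only pods actually stuck in Terminating are force deleted. This is a minimal bash sketch, assuming an oc session with cluster-admin rights and the pod label used in the Diagnostic Steps; the function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: force delete router-default pods stuck in Terminating.
# ASSUMPTIONS: `oc` is logged in with cluster-admin rights and the pods belong
# to the "default" ingress controller (label taken from the Diagnostic Steps).
ROUTER_NS='openshift-ingress'
ROUTER_SELECTOR='ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default'

# Reads `oc get pod` table output (with header) on stdin and prints the NAME
# of every pod whose STATUS (3rd column) is Terminating.
terminating_router_pods() {
  awk 'NR > 1 && $3 == "Terminating" { print $1 }'
}

# Force delete each Terminating router pod. --grace-period=0 --force removes
# the API object immediately without waiting for the kubelet to confirm that
# the container stopped, so the old router process may still run on the node.
force_delete_terminating_routers() {
  local pod
  for pod in $(oc get pod -n "$ROUTER_NS" -l "$ROUTER_SELECTOR" | terminating_router_pods); do
    echo "Force deleting ${pod}"
    oc delete pod "$pod" -n "$ROUTER_NS" --grace-period=0 --force
  done
}
```

Run force_delete_terminating_routers, then list the pods again to confirm that the Pending pod gets scheduled. Because a forced delete does not wait for the kubelet, the old router process can survive on the node, which is why the Root Cause also calls for killing it.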
    

Root Cause

  • The degradation is caused by a router-default pod stuck in the Pending state. The router-default deployment is rolling out, but one of its old pods is stuck in the Terminating state and still occupies the ports the router needs on its node, so the new pod cannot be scheduled (the scheduler reports that nodes "didn't have free ports for the requested pod ports"). To work around this issue, force delete the terminating router-default pod and kill the leftover router process on the corresponding infra node.
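The node-side part of the workaround (killing the leftover process) starts with finding which process still listens on the router ports. This is a minimal bash sketch, assuming the router uses host ports 80, 443 and 1936 (as with the HostNetwork endpoint publishing strategy) and that ss is available on the node's host filesystem; all function names are hypothetical. Record the node name before force deleting the pod, because the API no longer reports it afterwards.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: router host ports are 80/443/1936 and `ss` exists
# under /host on the node. Function names are hypothetical.

# Print the node a router pod runs on; call this BEFORE force deleting it.
router_node() {
  oc get pod "$1" -n openshift-ingress -o jsonpath='{.spec.nodeName}'
}

# List listening TCP sockets and their owning processes on a node, using a
# debug pod that chroots into the host filesystem.
node_listeners() {
  oc debug "node/$1" -- chroot /host ss -ltnp
}

# Read `ss -ltnp` output on stdin and print the unique PIDs of processes whose
# local address (4th column) ends in ":<port>".
pids_listening_on_port() {
  local port="$1"
  awk -v suffix=":${port}" '$4 ~ (suffix "$") { print }' \
    | grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -u
}
```

For example, with the terminating pod from the Diagnostic Steps:

    $ node=$(router_node router-default-855b6c99cb-56mxh)
    $ node_listeners "$node" | pids_listening_on_port 443

Confirm that the returned PID belongs to the stale router process, and not to a healthy one, before killing it, for example with oc debug node/<nodename> -- chroot /host kill <pid>.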

Diagnostic Steps

  • Collect the following output from the openshift-ingress namespace:

    $ oc get pod -n openshift-ingress -o wide -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
    
      NAME                                  READY  STATUS       RESTARTS  AGE   IP             NODE
      router-default-855b6c99cb-56mxh       0/1    Terminating  1         12h   10.xx.xx.xxx   <nodename>
      router-default-855b6c99cb-5xnfs       1/1    Running      0         165d  10.xx.xx.xxx   <nodename>
      router-default-855b6c99cb-b5zq9       0/1    Pending      0         36m
      router-default-855b6c99cb-nzn49       1/1    Running      0         165d  10.xx.xx.xxx   <nodename>
    
  • Check the openshift-ingress namespace events:

    $ oc get event -n openshift-ingress
    
      Unknown    Warning  FailedScheduling      pod/router-default-855b6c99cb-b5zq9   0/25 nodes are available:
      19 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't have free ports for the requested
      pod ports, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
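The two checks above can be combined into a quick triage. This is a minimal bash sketch with hypothetical helper names: router_rollout_stuck reads the oc get pod listing and reports whether a Terminating router pod coexists with a Pending one (the pattern described in the Root Cause), and has_port_conflict reads scheduler messages and succeeds if any of them reports a host port conflict.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers for the diagnostic outputs above.

# Reads `oc get pod` output (with header) on stdin. Prints MATCH plus the pod
# names when a Terminating router pod coexists with a Pending one, otherwise
# prints NO_MATCH. STATUS is assumed to be the 3rd column.
router_rollout_stuck() {
  awk 'NR > 1 { pods[$3] = pods[$3] " " $1 }
       END {
         if (("Terminating" in pods) && ("Pending" in pods)) {
           print "MATCH terminating:" pods["Terminating"] " pending:" pods["Pending"]
         } else {
           print "NO_MATCH"
         }
       }'
}

# Reads scheduler messages (for example `oc get event` output) on stdin and
# exits 0 if any of them reports that nodes had no free host ports.
has_port_conflict() {
  grep -q "didn't have free ports for the requested pod ports"
}
```

For example:

    $ oc get pod -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default | router_rollout_stuck
    $ oc get event -n openshift-ingress --field-selector reason=FailedScheduling | has_port_conflict && echo "host port conflict"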
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
