The Ingress operator is in a degraded state
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.x
Issue
- The cluster operator ingress is in a degraded state with the below error:

Operator: 'ingress'
Issue: Degraded
Reason: IngressDegraded
Message: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-855b6c99cb-b5zq9" cannot be scheduled: 0/25 nodes are available: 19 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 2/3 of replicas are available)
Resolution
- Force delete the pending router-default ingress pod:
$ oc delete pod <router-default-XXXX> -n openshift-ingress
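If the delete hangs because the old replica is stuck in Terminating, it can be forced. A minimal sketch (the pod-name placeholder must be replaced with the actual pod name; `--grace-period=0 --force` removes the API object without waiting for the kubelet to confirm):

```shell
# List the router pods to find the one stuck in Terminating or Pending.
oc get pod -n openshift-ingress \
  -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

# Force-delete the stuck pod: --grace-period=0 with --force removes the
# API object immediately instead of waiting for graceful termination.
oc delete pod <router-default-XXXX> -n openshift-ingress --grace-period=0 --force
```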
Root Cause
- The degradation is due to the router pod being stuck in the Pending state. The router-default pod is trying to roll out, but because one of the pods is stuck in the Terminating state, the new pod cannot start (the host ports it needs are still occupied). To work around this issue, force delete the terminating router-default pod and kill the leftover process on the corresponding infra node.
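To locate that leftover process, one approach (a sketch, assuming the router's default host ports 80/443; `<nodename>` is the NODE column of the Terminating pod in the diagnostic output) is a debug shell on the affected node:

```shell
# Open a debug pod on the infra node and list which process still holds
# the router host ports. The ss filter expression limits output to
# listeners on ports 80 and 443.
oc debug node/<nodename> -- chroot /host ss -tlnp '( sport = :80 or sport = :443 )'

# If an orphaned router process is still listening, note its PID from the
# ss output and kill it from the same debug shell, e.g.:
#   oc debug node/<nodename> -- chroot /host kill <PID>
```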
Diagnostic Steps
- Get the below outputs for the openshift-ingress namespace:

$ oc get pod -n openshift-ingress -o wide -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
NAME                              READY   STATUS        RESTARTS   AGE    IP             NODE
router-default-855b6c99cb-56mxh   0/1     Terminating   1          12h    10.xx.xx.xxx   <nodename>
router-default-855b6c99cb-5xnfs   1/1     Running       0          165d   10.xx.xx.xxx   <nodename>
router-default-855b6c99cb-b5zq9   0/1     Pending       0          36m
router-default-855b6c99cb-nzn49   1/1     Running       0          165d   10.xx.xx.xxx   <nodename>
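To pull only the unhealthy replicas out of output like the above, the STATUS column can be filtered. A minimal local sketch using the sample rows from this article (in practice, pipe the live `oc get pod` output into the same awk filter):

```shell
# Keep only pods whose STATUS (column 3) is neither Running nor Completed;
# the heredoc stands in for live `oc get pod -n openshift-ingress` output.
awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }' <<'EOF'
NAME                              READY   STATUS        RESTARTS   AGE
router-default-855b6c99cb-56mxh   0/1     Terminating   1          12h
router-default-855b6c99cb-5xnfs   1/1     Running       0          165d
router-default-855b6c99cb-b5zq9   0/1     Pending       0          36m
router-default-855b6c99cb-nzn49   1/1     Running       0          165d
EOF
# prints only the Terminating and Pending router pods
```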
- Check the openshift-ingress namespace events:

$ oc get event -n openshift-ingress
Unknown   Warning   FailedScheduling   pod/router-default-855b6c99cb-b5zq9   0/25 nodes are available: 19 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't have free ports for the requested pod ports, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
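The same FailedScheduling reason can also be read from the pending pod directly; a sketch using the pod name from the event above:

```shell
# The Events section at the end of the describe output repeats the
# scheduler's message, including how many nodes lacked free host ports.
oc describe pod router-default-855b6c99cb-b5zq9 -n openshift-ingress
```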
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.