AlertManager is firing an alert named: "KubeJobCompletion"
Environment
- Red Hat OpenShift Container Platform 4.5
Issue
- What does the alert KubeJobCompletion mean?
- Is this alert critical?
Resolution
- The KubeJobCompletion alert means that a job pod in the cluster did not complete successfully for some reason.
- Although the alert's severity is shown as Warning, it is good practice to run the Diagnostic Steps and look more closely at the job pods to confirm that the cluster is healthy.
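One way to see which jobs may be behind the alert is to compare each job's desired completions with its succeeded count. The sketch below assumes the alert follows the upstream kubernetes-mixin rule of the same name, which compares those two values; this has not been confirmed against this cluster version. The helper name incomplete_jobs is hypothetical and is not part of oc.

```shell
# Hypothetical helper (not part of oc): reads lines of the form
#   NAMESPACE NAME DESIRED SUCCEEDED
# on stdin and prints the jobs whose succeeded count is below the desired count.
# oc prints <none> for unset fields; awk's "+ 0" turns that into 0, so jobs
# without .spec.completions are skipped and a missing .status.succeeded counts as 0.
incomplete_jobs() {
  awk '($4 + 0) < ($3 + 0)'
}

# Usage (requires a logged-in oc session):
#   oc get jobs -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,DESIRED:.spec.completions,SUCCEEDED:.status.succeeded \
#     | incomplete_jobs
```

Any job printed by this filter is a candidate for the Diagnostic Steps; an empty result suggests every job with a fixed completion count has finished.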
Diagnostic Steps
- Check the state of all job pods. They are expected to be in the Completed or Succeeded state.
$ oc get pods -A | grep -vi completed
openshift-kube-apiserver revision-pruner-42-hostname-q9zcj-master-0 0/1 Succeeded 0 11d
openshift-kube-apiserver revision-pruner-43-hostname-q9zcj-master-0 0/1 Succeeded 0 11d
openshift-kube-apiserver revision-pruner-43-hostname-q9zcj-master-1 0/1 Succeeded 0 11d
openshift-kube-apiserver revision-pruner-43-hostname-q9zcj-master-2 0/1 Succeeded 0 11d
openshift-etcd revision-pruner-6-hostname-q9zcj-master-0 0/1 Succeeded 0 11d
openshift-etcd revision-pruner-6-hostname-q9zcj-master-1 0/1 Succeeded 0 11d
openshift-kube-scheduler revision-pruner-26-hostname-q9zcj-master-0 0/1 Succeeded 0 11d
openshift-kube-scheduler revision-pruner-26-hostname-master-q9zcj-1 0/1 Succeeded 0 11d
openshift-kube-scheduler revision-pruner-26-hostname-q9zcj-master-2 0/1 Succeeded 0 11d
openshift-kube-apiserver revision-pruner-44-hostname-q9zcj-master-0 0/1 Succeeded 0 2d
openshift-kube-apiserver revision-pruner-44-hostname-q9zcj-master-1 0/1 Succeeded 0 2d
openshift-kube-apiserver revision-pruner-44-hostname-q9zcj-master-2 0/1 Succeeded 0 2d
openshift-kube-apiserver installer-44-hostname-q9zcj-master-2 0/1 Succeeded 0 2d
openshift-image-registry image-pruner-1607040000-pqrcr 0/1 Succeeded 0 2d
openshift-image-registry image-pruner-1607126400-s7w8c 0/1 Succeeded 0 1d
openshift-kube-apiserver revision-pruner-45-hostname-q9zcj-master-0 0/1 Succeeded 0 5h53m
openshift-kube-apiserver installer-45-hostname-q9zcj-master-1 0/1 Succeeded 0 6h6m
openshift-kube-apiserver installer-45-hostname-q9zcj-master-2 0/1 Succeeded 0 6h2m
openshift-kube-apiserver revision-pruner-45-hostname-q9zcj-master-1 0/1 Succeeded 0 6h2m
openshift-kube-apiserver revision-pruner-45-hostname-q9zcj-master-2 0/1 Succeeded 0 5h58m
openshift-kube-apiserver installer-45-hostname-q9zcj-master-0 0/1 Succeeded 0 5h57m
openshift-image-registry image-pruner-1607212800-cgcnr 0/1 Succeeded 0 3h44m
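Because grep matches text anywhere on a line, a pod whose name happens to contain the search word would also be hidden. A minimal sketch of a stricter filter, assuming the default oc get pods -A column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE), compares only the STATUS column. The function name unfinished_pods is hypothetical.

```shell
# Hypothetical helper: reads `oc get pods -A` output on stdin and keeps the
# header line plus every pod whose STATUS (4th column) is neither
# Completed nor Succeeded.
unfinished_pods() {
  awk 'NR == 1 || ($4 != "Completed" && $4 != "Succeeded")'
}

# Usage (requires a logged-in oc session):
#   oc get pods -A | unfinished_pods
```

Long-running pods in the Running state are still listed by this filter, so the job pods of interest are the ones showing states such as Error or Pending.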
- If any job pods are not in the Completed or Succeeded state, troubleshoot them further by checking their status.conditions using the command below. The image-pruner job pod is used as an example here.
$ oc get pod image-pruner-1607212800-cgcnr -n openshift-image-registry -o yaml
-------------------------------------8<---------------------------------------
status:
  conditions:
  - lastTransitionTime: '2020-11-24T14:31:00Z'
    message: 'NodeControllerDegraded: All master nodes are ready'
    reason: AsExpected
    status: 'False'
    type: Degraded
  - lastTransitionTime: '2020-12-05T21:51:05Z'
    message: 'NodeInstallerProgressing: 3 nodes are at revision 45'
    reason: AsExpected
    status: 'True'
    type: Available
  - lastTransitionTime: '2020-08-20T14:46:03Z'
    message: 'StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 45'
    reason: AsExpected
    status: 'True'
    type: Available
  - lastTransitionTime: '2020-08-20T14:43:27Z'
    reason: AsExpected
    status: 'True'
    type: Upgradeable
------------------------------------->8---------------------------------------
- From the output above, it can be seen that the image-pruner job pod completed successfully, and its current conditions show that it is functional and available in the cluster.
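To read only the relevant fields rather than the full YAML, the pod's phase and conditions can be extracted with a jsonpath template. This is a minimal sketch that assumes a logged-in oc session; the function name pod_phase_and_conditions is hypothetical.

```shell
# Hypothetical helper: prints a pod's phase on the first line, then one
# TYPE=STATUS line per entry in .status.conditions.
# Arguments: <namespace> <pod-name>
pod_phase_and_conditions() {
  oc get pod "$2" -n "$1" \
    -o jsonpath='{.status.phase}{"\n"}{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Usage:
#   pod_phase_and_conditions openshift-image-registry image-pruner-1607212800-cgcnr
```

A phase other than Succeeded, or a condition whose status is unexpected, points to the pod that needs a closer look with oc describe pod or oc logs.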
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.