Cluster-logging pods with tolerations getting frequently evicted in OCP 4.x
Issue
- Elasticsearch pods restart roughly every 2 hours, even though the ES cluster health is green and the cluster-logging components are working as expected.
- Events in the openshift-logging namespace show the taint manager evicting the Elasticsearch pod:
9m13s Normal TaintManagerEviction pod/elasticsearch-cdm-xxx Marking for deletion Pod openshift-logging/elasticsearch-cdm-xxx
- Some pods in the openshift-logging namespace are recreated after a certain amount of time and show a restartCount of zero:
$ oc get pods -n openshift-logging -owide
NAME                           READY   STATUS    RESTARTS   AGE   IP         NODE
cluster-logging-operator-xxx   1/1     Running   0          10d   10.1.1.1   worker-1
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.2   logging-1
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.3   logging-2
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.4   logging-3
fluentd-xxx                    1/1     Running   0          52m   10.1.1.5   master-0
fluentd-xxx                    1/1     Running   0          52m   10.1.1.5   master-1
kibana-xxx                     2/2     Running   0          52m   10.1.1.2   logging-1
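The symptoms above can be narrowed down with a few standard oc queries. The following is only a minimal sketch: the function names are hypothetical helpers introduced for illustration, the namespace is taken from the output above, and it assumes an authenticated oc session with permission to read events, pods, and nodes.

```shell
# Hypothetical helper names; only the oc subcommands and flags are standard.

# Show only the taint-manager eviction events in the logging namespace.
list_taint_evictions() {
  oc get events -n openshift-logging --field-selector reason=TaintManagerEviction
}

# Print the tolerations of one pod (pass the pod name as $1).
show_pod_tolerations() {
  oc get pod "$1" -n openshift-logging -o jsonpath='{.spec.tolerations}'
}

# Print each node name alongside the taints set on it.
show_node_taints() {
  oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
}
```

For example, `show_pod_tolerations elasticsearch-cdm-xxx` prints the tolerations of one of the evicted pods, which can then be compared with the output of `show_node_taints`. One thing worth checking is whether a toleration for a NoExecute taint sets tolerationSeconds, because such a toleration only delays eviction rather than preventing it.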
Environment
- Red Hat OpenShift Container Platform
- 4.x