Cluster-logging pods with tolerations getting frequently evicted in OCP 4.x

Solution Verified - Updated -

Issue

  • Elasticsearch pods restart roughly every 2 hours, even though the ES cluster health is green and the cluster-logging components are working as expected.
  • Events in the openshift-logging namespace show the taint manager evicting the Elasticsearch pod:
9m13s      Normal   TaintManagerEviction    pod/elasticsearch-cdm-xxx   Marking for deletion Pod openshift-logging/elasticsearch-cdm-xxx
  • Some pods in the openshift-logging namespace are recreated after a certain amount of time and show a restartCount of zero:
$ oc get pods -n openshift-logging -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP         NODE
cluster-logging-operator-xxx   1/1     Running   0          10d   10.1.1.1   worker-1
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.2   logging-1
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.3   logging-2
elasticsearch-cdm-xxx          2/2     Running   0          52m   10.1.1.4   logging-3
fluentd-xxx                    1/1     Running   0          52m   10.1.1.5   master-0
fluentd-xxx                    1/1     Running   0          52m   10.1.1.5   master-1
kibana-xxx                     2/2     Running   0          52m   10.1.1.2   logging-1
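
The symptoms above can be confirmed with a short shell sketch. The helper name evicted_pods is hypothetical and not part of oc; it only filters the event output format shown above. The commented commands assume a logged-in oc session, and <pod-name> and <node-name> are placeholders. In Kubernetes, a toleration for a NoExecute taint that sets tolerationSeconds lets the pod stay on the tainted node only for that long before the taint manager evicts it, so that field is one thing worth inspecting. This is a diagnostic aid under those assumptions, not the verified resolution.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `oc get events` output on stdin and prints the
# name of every pod the taint manager marked for deletion.
# Assumes the default column order: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
evicted_pods() {
  awk '$3 == "TaintManagerEviction" { sub(/^pod\//, "", $4); print $4 }'
}

# Against a live cluster (assumption: `oc` is logged in with read access):
#
#   # Count evictions per pod
#   oc get events -n openshift-logging | evicted_pods | sort | uniq -c
#
#   # Show the tolerations of an affected pod; look for tolerationSeconds
#   oc get pod <pod-name> -n openshift-logging -o jsonpath='{.spec.tolerations}'
#
#   # Show the taints on the node that pod runs on
#   oc get node <node-name> -o jsonpath='{.spec.taints}'
```

A pod that is evicted and recreated gets a new pod object, which is consistent with the low AGE and zero RESTARTS in the listing above.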

Environment

  • Red Hat OpenShift Container Platform
    • 4.x
