Elasticsearch shows duplicated logs and poor performance in OpenShift Container Platform
Issue
- In Kibana, an unusually high volume of logs is sometimes visible for a specific period of time, especially in the .operations indices
- Duplicated logs are visible over time
- After the cluster has gone through a period of stress, Elasticsearch takes a long time to recover and the indices grow considerably in size
- In the fluentd pod logs, any of these timeout messages appear frequently (see the configuration sketch after the messages):
2017-09-25 16:07:41 +0200 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:13c4370" elapsed_time=36.999487413 slow_flush_log_threshold=20.0
2017-09-25 16:07:38 +0200 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2017-09-25 16:23:59 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-25 16:20:37 +0200 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Could not push logs to Elasticsearch after 2 retries. read timeout reached" plugin_id="object:13c4370"
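The warnings above reference fluentd's buffer flush threshold and the Elasticsearch output's read timeout. The likely link to the duplicated logs: when a flush times out after Elasticsearch has already indexed part of the bulk request, fluentd retries the whole chunk and the already-indexed documents are written again. For orientation only, the following is a minimal sketch of where these settings live in a fluent-plugin-elasticsearch match block, assuming the flat fluentd 0.12-style buffer syntax used in these image versions; the host, buffer path, and all values are illustrative assumptions, not Red Hat-recommended settings:

<match **>
  @type elasticsearch
  host logging-es        # hypothetical Elasticsearch service name
  port 9200
  # Read timeout for requests to Elasticsearch (plugin default is 5s);
  # the "read timeout reached" warnings fire when this is exceeded.
  request_timeout 30s
  # Threshold for the "buffer flush took longer time than
  # slow_flush_log_threshold" warning (20.0s in the logs above).
  slow_flush_log_threshold 40.0
  # Flat 0.12-style buffer settings; smaller chunks flush faster.
  buffer_type file
  buffer_path /var/lib/fluentd/buffer-output-es   # hypothetical path
  buffer_chunk_limit 8m
  buffer_queue_limit 32
  flush_interval 5s
</match>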
Environment
- OpenShift Container Platform
- logging-fluentd images:
  - 3.4.1
  - 3.5.0