Elasticsearch shows duplicated logs and poor performance in OpenShift Container Platform
Issue
- In Kibana, a high volume of logs is sometimes visible for a specific period of time, especially in the .operations indices.
- Duplicated logs are visible over time.
- After the cluster has gone through a period of stress, Elasticsearch takes a long time to recover and index sizes grow considerably.
- In the fluentd pod logs, timeout messages such as the following appear frequently (the configuration sketch after these messages shows the settings they refer to):
2017-09-25 16:07:41 +0200 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:13c4370" elapsed_time=36.999487413 slow_flush_log_threshold=20.0
2017-09-25 16:07:38 +0200 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2017-09-25 16:23:59 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-25 16:20:37 +0200 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Could not push logs to Elasticsearch after 2 retries. read timeout reached" plugin_id="object:13c4370"
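The warnings above reference buffer and timeout settings from fluentd and the fluent-plugin-elasticsearch output used by the logging-fluentd image. The following is a minimal sketch of the relevant parts of such a <match> block; the host name, buffer path, and concrete values are illustrative assumptions, not the exact configuration shipped in the image:

<match **>
  @type elasticsearch
  host logging-es                  # assumed Elasticsearch service name
  port 9200
  scheme https

  # "read timeout reached" is raised when a request to Elasticsearch exceeds
  # this client-side timeout.
  request_timeout 30s

  # "buffer flush took longer time than slow_flush_log_threshold" only warns
  # that a single flush exceeded this threshold (20.0 seconds in the messages
  # above); it does not by itself drop logs.
  slow_flush_log_threshold 20.0

  # Buffering and retry settings that control how chunks queue and are
  # re-sent while Elasticsearch is slow or unreachable; re-sending a chunk
  # that Elasticsearch already partially indexed is one way duplicated
  # documents can appear.
  buffer_type file
  buffer_path /var/lib/fluentd/buffer-output-es   # assumed path
  buffer_chunk_limit 8m
  buffer_queue_limit 32
  flush_interval 5s
  retry_wait 1s
  num_threads 2
</match>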
Environment
- OpenShift Container Platform
- logging-fluentd images
  - 3.4.1
  - 3.5.0