Elasticsearch shows duplicated logs and poor performance in OpenShift Container Platform


Issue

  • In Kibana, an unusually high volume of logs is sometimes visible for a specific period of time, especially in the .operations indices.
  • Duplicated log entries are visible over time.
  • After the cluster has gone through a period of stress, Elasticsearch takes a long time to recover and index sizes grow considerably.
  • In the fluentd pod logs, timeout messages like the following appear frequently (a sketch of the relevant output settings follows the excerpt):
2017-09-25 16:07:41 +0200 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:13c4370" elapsed_time=36.999487413 slow_flush_log_threshold=20.0
2017-09-25 16:07:38 +0200 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2017-09-25 16:23:59 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2017-09-25 16:20:37 +0200 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Could not push logs to Elasticsearch after 2 retries. read timeout reached" plugin_id="object:13c4370"
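
The "read timeout reached" warnings come from the fluent-plugin-elasticsearch output: when a bulk request takes longer than the plugin's request timeout, fluentd resets the connection and resends the same buffer chunk, and any documents Elasticsearch had already indexed from that chunk are indexed again, which shows up as duplicated logs in Kibana. The following is an illustrative sketch only; the output type, host, port, and parameter values are assumptions, not the settings shipped in the logging-fluentd image:

# Illustrative fluentd (v0.12-style) output section; values are examples,
# not the defaults of the logging-fluentd image.
<match **>
  @type elasticsearch_dynamic
  host logging-es            # assumed Elasticsearch service name
  port 9200
  scheme https
  # How long a bulk request may take before fluentd logs
  # "read timeout reached", resets the connection and resends the chunk.
  request_timeout 2m
  # Smaller chunks keep individual bulk requests within the timeout.
  buffer_chunk_limit 8m
  flush_interval 5s
  # Threshold above which "buffer flush took longer time than
  # slow_flush_log_threshold" is logged (20 s is the value seen in the warning above).
  slow_flush_log_threshold 20.0
</match>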

Environment

  • OpenShift Container Platform
  • logging-fluentd images
    • 3.4.1
    • 3.5.0
