Fluentd collector pod takes a long time to complete start-up and have its metrics endpoint ready, triggering the CollectorNodeDown alert

Solution Verified - Updated -

Issue

  • Since Cluster Logging 5.8 on OpenShift Container Platform 4, we are observing many CollectorNodeDown alerts caused by the following error:

    "Get \"https://x.x.x.x:24231/metrics\": dial tcp x.x.x.x:24231: connect: connection refused"
    
  • CollectorNodeDown alerts fire more often when collector pods restart or new nodes are added to OpenShift. On investigation, we found that the metrics endpoint takes a long time to become available, so the scrape target is reported as DOWN in the meantime.
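To measure how long the metrics endpoint takes to come up after a collector pod restarts, you can poll it until it answers. The following is a minimal Python sketch, not part of the original solution. The helper names `probe_metrics` and `wait_until_ready` are hypothetical, and disabling TLS certificate verification is an assumption that is acceptable only for a one-off diagnostic run from inside the cluster network.

```python
import ssl
import time
import urllib.request
from typing import Callable, Optional


def probe_metrics(url: str, timeout: float = 2.0) -> bool:
    """Return True if the metrics URL answers with HTTP 200.

    Assumption (diagnostics only): certificate verification is disabled
    because the collector serves metrics with a cluster-internal certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except OSError:
        # URLError, "connection refused", and timeouts are all OSError
        # subclasses; they occur while the endpoint is not yet listening.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     deadline_s: float = 300.0,
                     interval_s: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll `probe` until it succeeds.

    Returns the elapsed seconds at the first successful probe,
    or None if `deadline_s` passes without success.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            return None
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: probe_metrics("https://x.x.x.x:24231/metrics"))` returns roughly how many seconds the endpoint refused connections, where `x.x.x.x` is the collector pod IP. The clock and sleep functions are injectable so the polling logic can be tested without a live cluster.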

Environment

  • Red Hat OpenShift Container Platform
    • 4
  • Red Hat OpenShift Logging
    • 5.8
