fluentd pods cannot be scraped by cluster monitoring after upgrade and node failure

Solution Verified - Updated 2024-06-13T23:20:56+00:00

Issue

The OCP console alerting screen shows that none of the fluentd pods can be scraped.
The fluentd target is shown as down in the Prometheus UI, with the following error:


"Get \"https://10.x.x.x:00000/metrics\": dial tcp 10.x.x.x:00000: connect: connection refused"
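A "connection refused" error generally means the TCP connection was actively rejected, typically because nothing is listening on that address and port, as opposed to a timeout, where packets are silently dropped. The following is a minimal Python sketch for telling these cases apart. The function name `probe_tcp` is hypothetical and not part of any OpenShift tooling, and the sketch assumes it is run from a location that can reach the pod network.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port.

    Hypothetical helper for illustration only. Returns one of
    "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection performs the TCP handshake only; no TLS, no HTTP.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing is accepting on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: traffic is likely dropped or filtered.
        return "timeout"
    except OSError as exc:
        # Anything else, for example "no route to host".
        return f"error: {exc}"
```

Calling `probe_tcp` with a fluentd pod IP and the metrics port from the Prometheus target would return "refused" for the failure shown above. The address and port are redacted in this article and must be taken from the affected cluster.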

fluentd metrics are not available and alerts are firing because the target is down. The expected behaviour is that Prometheus can scrape metrics from the fluentd pods.


The fluentd pods are running:

fluentd-2dbd7                                   1/1     Running     0          
fluentd-8rtxj                                   1/1     Running     0          
fluentd-bn9s5                                   1/1     Running     0          
fluentd-fxhsd                                   1/1     Running     0          
fluentd-kjn29                                   1/1     Running     0          
fluentd-x8cz6                                   1/1     Running     0

Environment

OpenShift Container Platform (OCP) 4.6

