Prometheus could not scrape fluentd collector metric in dualstack cluster

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4.12+
Red Hat OpenShift Logging (RHOL)
- 5.8+
Fluentd

Issue

Prometheus cannot scrape fluentd collector metrics in a dual-stack cluster.
The "CollectorNodeDown" alert fires continuously for all fluentd collectors in a dual-stack cluster.

Resolution

This issue was reported to the engineering team as a bug and is tracked in LOG-5106.
The following workarounds are available:
- Workaround 1:
  - Migrate from fluentd to vector collector, as the issue is specific to the fluentd collector.
- Workaround 2:
  - Downgrade RHOL to version 5.7, as the issue is encountered only in version 5.8.
- Workaround 3:
  
  This workaround requires setting the ClusterLogging resource to the Unmanaged state. Read the considerations about the Unmanaged state in the "Unsupported configurations" section of the documentation.
  - Step 1: Set clusterlogging to Unmanaged:
```
$ oc -n openshift-logging patch clusterlogging/instance -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge
```
  - Step 2: Take a backup of the collector-config configmap:
```
$ oc -n openshift-logging get cm collector-config -oyaml > collector-config.bkp
```
  - Step 3: In the collector-config configmap, change the line bind "#{ENV['PROM_BIND_IP']}" to bind "0.0.0.0":
```
$ oc edit cm collector-config -n openshift-logging
```
  - Step 4: Save the configmap and restart the collector pods:
```
$ oc delete pods -l component=collector -n openshift-logging
```
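Step 3 can also be done without an interactive editor. The following is a minimal sketch, not a supported procedure: `rewrite_prom_bind` is a hypothetical helper name, and the commented cluster pipeline assumes that the configmap YAML renders the fluentd configuration as a block scalar, so that the `bind` directive appears unescaped on its own line. Only the `bind` line that references `PROM_BIND_IP` is rewritten; the second sample line is a made-up control that must stay unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or RHOL): read fluentd configuration on
# stdin and replace the PROM_BIND_IP bind directive with a fixed 0.0.0.0 bind.
rewrite_prom_bind() {
  sed -E 's|(bind[[:space:]]+)"#[[:space:]]*\{ENV\[.PROM_BIND_IP.\]\}"|\1"0.0.0.0"|'
}

# Demonstration on sample lines; no cluster access is needed for this part.
printf '%s\n' \
  "  bind \"#{ENV['PROM_BIND_IP']}\"" \
  "  bind \"#{ENV['POD_IP']}\"" | rewrite_prom_bind

# Assumed cluster usage, after Steps 1 and 2 (inspect the output before replacing):
#   oc -n openshift-logging get cm collector-config -o yaml \
#     | rewrite_prom_bind | oc -n openshift-logging replace -f -
```

If the rewritten output does not show `bind "0.0.0.0"`, the configmap is probably serialized differently than assumed, and the interactive edit from Step 3 is the safer route.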
For more information, please open a new support case with Red Hat Support.

Diagnostic Steps

Use ss to check whether the collector pods listen on port 24231 over IPv6:

```
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -n1 -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -6 -lt | grep 24231;'
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
collector-6pd7j:LISTEN 0      4096            [::]:24231         [::]:*
collector-9wv7l:LISTEN 0      4096            [::]:24231         [::]:*
collector-czc5r:LISTEN 0      4096            [::]:24231         [::]:*
```

Use ss to check whether the collector pods listen on port 24231 over IPv4. In an affected cluster, no IPv4 listener is found and grep exits with code 1:

```
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -n1 -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -4 -lt | grep 24231;'
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
collector-6pd7j:command terminated with exit code 1
collector-9wv7l:command terminated with exit code 1
collector-czc5r:command terminated with exit code 1
```
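The IPv4 and IPv6 listener checks can be combined into one pass per pod. This is a sketch under stated assumptions: `has_listener` and `check_collectors` are hypothetical helper names, and the loop assumes a logged-in `oc` session and that `ss` is available in the `collector` container, as in the commands above. A plain `for` loop replaces `xargs`, which also avoids the `--max-args` warning.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `ss -lt` style output on stdin contains a
# LISTEN socket whose local address ends in the given port.
has_listener() {
  local port="$1"
  # Field 4 of `ss -lt` output is "Local Address:Port".
  awk -v p=":${port}" '
    $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit !found }'
}

# Assumed cluster loop; it is defined here but not called, because it needs oc.
check_collectors() {
  local pod family
  for pod in $(oc -n openshift-logging get pods -l component=collector -o name); do
    for family in 4 6; do
      if oc -n openshift-logging exec "$pod" -c collector -- ss "-$family" -ltn \
          | has_listener 24231; then
        echo "$pod IPv$family: listening"
      else
        echo "$pod IPv$family: not listening"
      fi
    done
  done
}
```

Calling `check_collectors` on an affected cluster would be expected to report the IPv6 listener only, matching the outputs shown above.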

Log collection itself works; the alert fires because Prometheus cannot connect to the collector metrics endpoint:

```
$ oc project openshift-monitoring
$ oc rsh prometheus-k8s-0
sh-4.4$ curl -kv https://x.x.x.x:24231/metrics
Trying x.x.x.x...
TCP_NODELAY set
connect to x.x.x.x port 24231 failed: Connection refused
Failed to connect to x.x.x.x port 24231: Connection refused
Closing connection 0
curl: (7) Failed to connect to x.x.x.x port 24231: Connection refused
```
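The connectivity test can be repeated for every address of every collector pod. The sketch below is illustrative only: `metrics_url` and `probe_collectors` are hypothetical helper names, and the loop assumes that the Prometheus container in `prometheus-k8s-0` is named `prometheus` and that each pod reports its addresses under `.status.podIPs`. IPv6 literals must be wrapped in brackets inside a URL, which the helper takes care of.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the collector metrics URL for a pod IP.
# An address containing ":" is treated as IPv6 and wrapped in brackets.
metrics_url() {
  local ip="$1"
  case "$ip" in
    *:*) printf 'https://[%s]:24231/metrics\n' "$ip" ;;
    *)   printf 'https://%s:24231/metrics\n' "$ip" ;;
  esac
}

# Assumed cluster loop; it is defined here but not called, because it needs oc.
probe_collectors() {
  local ip
  for ip in $(oc -n openshift-logging get pods -l component=collector \
      -o jsonpath='{.items[*].status.podIPs[*].ip}'); do
    # curl exits 0 once a response arrives, whatever the HTTP status code,
    # so this only tests that the connection is accepted.
    if oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
        curl -ks -o /dev/null --max-time 5 "$(metrics_url "$ip")"; then
      echo "$ip reachable"
    else
      echo "$ip unreachable"
    fi
  done
}
```

A refused connection makes curl exit with code 7, as in the output above, so the affected addresses would be reported as unreachable.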

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
