Prometheus could not scrape fluentd collector metric in dualstack cluster
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.12+
- Red Hat OpenShift Logging (RHOL)
- 5.8+
- Fluentd
Issue
- Prometheus could not scrape fluentd collector metric in dualstack cluster.
- Alert "CollectorNodeDown" continuously firing for all fluentd collectors in dualstack cluster.
Resolution
- The issue was reported to the engineering team as a bug and is tracked under LOG-5106.
- Below are the workarounds for this issue:
- Workaround 1:
  - Migrate from fluentd to the vector collector, as the issue is specific to the fluentd collector.
- Workaround 2:
  - Downgrade RHOL to version 5.7, as the issue is encountered only in version 5.8.
- Workaround 3:
This workaround requires setting clusterlogging to the Unmanaged state. Read the considerations about the Unmanaged state in the Documentation section "Unsupported configurations".
- Step 1: Put clusterlogging in Unmanaged:
$ oc -n openshift-logging patch clusterlogging/instance -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge
- Step 2: Take backup of collector-config configmap:
$ oc -n openshift-logging get cm collector-config -oyaml > collector-config.bkp
- Step 3: In the collector-config configmap, change the line:
bind "#{ENV['PROM_BIND_IP']}"
to:
bind "0.0.0.0"
$ oc edit cm collector-config -n openshift-logging
- Step 4: Save the configmap and restart the collector pods:
$ oc delete pods -l component=collector -n openshift-logging
- For more information, please open a new support case with Red Hat Support.
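Steps 2 to 4 of Workaround 3 can also be scripted instead of edited by hand. The following is a minimal sketch, not a supported procedure: the function names `rewrite_bind` and `apply_bind_workaround` are hypothetical, and it assumes that clusterlogging is already Unmanaged (Step 1) and that the bind line appears literally in the configmap YAML as shown in Step 3.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of any Red Hat tooling.

# Rewrite the Prometheus bind line read from stdin so that the metrics
# endpoint binds to 0.0.0.0. Optional whitespace between '#' and '{' is
# tolerated; all other lines pass through unchanged.
rewrite_bind() {
  sed 's/bind "#[[:space:]]*{ENV\[.PROM_BIND_IP.]}"/bind "0.0.0.0"/'
}

# Back up the configmap, apply the rewrite, and restart the collector pods.
# Assumption: clusterlogging/instance is already in the Unmanaged state.
apply_bind_workaround() {
  local ns=openshift-logging
  oc -n "$ns" get cm collector-config -o yaml > collector-config.bkp || return 1
  rewrite_bind < collector-config.bkp | oc -n "$ns" replace -f - || return 1
  oc -n "$ns" delete pods -l component=collector
}
```

Sourcing the file and running `apply_bind_workaround` performs the three steps in order. The rewrite is kept in its own function so it can be checked against a saved copy of the configmap before anything is changed on the cluster.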
Diagnostic Steps
- Check with ss whether the collector is listening on port 24231 over IPv6:
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -n1 -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -6 -lt | grep 24231;'
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
collector-6pd7j:LISTEN 0 4096 [::]:24231 [::]:*
collector-9wv7l:LISTEN 0 4096 [::]:24231 [::]:*
collector-czc5r:LISTEN 0 4096 [::]:24231 [::]:*
- Check with ss whether the collector is listening on port 24231 over IPv4:
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -n1 -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -4 -lt | grep 24231;'
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
collector-6pd7j:command terminated with exit code 1
collector-9wv7l:command terminated with exit code 1
collector-czc5r:command terminated with exit code 1
- Logging itself is working fine; the alert is firing because Prometheus is not able to connect to the collector:
$ oc project openshift-monitoring
$ oc rsh prometheus-k8s-0
sh-4.4$ curl -kv https://x.x.x.x:24231/metrics
Trying x.x.x.x...
TCP_NODELAY set
connect to x.x.x.x port 24231 failed: Connection refused
Failed to connect to x.x.x.x port 24231: Connection refused
Closing connection 0
curl: (7) Failed to connect to x.x.x.x port 24231: Connection refused
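The ss checks above rely on reading the grep output by eye. As a rough sketch, the same decision can be made by a small filter that reads `ss -lt` style output and reports through its exit status whether any socket is listening on a given port. The helper name `has_listener` is hypothetical, and the column layout is assumed to match the output shown in this article.

```shell
#!/usr/bin/env bash
# Sketch only: has_listener is a hypothetical helper name.

# Reads `ss -lt` style output on stdin and succeeds (exit 0) if some LISTEN
# socket has a local address ending in ":<port>"; otherwise exits 1.
has_listener() {
  local port=$1
  awk -v suffix=":$port" '
    {
      # Drop anything before "LISTEN", such as a "pod-name:" prefix.
      sub(/^.*LISTEN/, "LISTEN")
    }
    $1 == "LISTEN" {
      # Field 4 is the local address and port, e.g. [::]:24231
      start = length($4) - length(suffix) + 1
      if (start > 0 && substr($4, start) == suffix) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, piping the output of `ss -4 -lt` from a collector container into `has_listener 24231` would be expected to fail on an affected cluster, while the same check with `ss -6 -lt` succeeds.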