Unreliable status of containerized Collectd agent

Solution In Progress - Updated -

Issue

  • The data reported by collectd was inconsistent on several nodes.

  • Logs from one of the affected nodes:

[root@overcloud-controller-0 collectd]# tail collectd.log
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [7]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [7]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [7]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [2]: write to wake-up fd 8 failed: Broken pipe
[2021-01-22 15:23:43] [thrd:app]: nfv-collectd-events [7]: write to wake-up fd 8 failed: Broken pipe
  • The container health status changes to unhealthy:
[root@overcloud-controller-0 collectd]# docker ps
CONTAINER ID        IMAGE                                                                                                         COMMAND                  CREATED             STATUS                  PORTS               NAMES
e752df371daa        satellite.localdomain:5000/is_linux-production-openstack-osp13_containers-collectd:13.0-137.1608222731   "dumb-init --singl..."   6 days ago          Up 6 days (unhealthy)                       collectd
  • We then restarted the collectd container manually. At first, it reported that the plugins loaded successfully:
[root@overcloud-controller-0 collectd]# docker restart collectd
collectd
[root@overcloud-controller-0 collectd]# tail -20 collectd.log
[2021-01-22 15:25:28] plugin_load: plugin "cpu" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "df" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "disk" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "ethstat" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "hugepages" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "interface" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "load" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "memory" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "processes" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "sysevent" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "tcpconns" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "unixsock" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "uptime" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "write_http" successfully loaded.
[2021-01-22 15:25:28] plugin_load: plugin "write_kafka" successfully loaded.
[2021-01-22 15:25:28] unixsock plugin: Successfully deleted socket file "/var/run/collectd-socket".
[2021-01-22 15:25:28] Initialization complete, entering read-loop.
[2021-01-22 15:25:28] tcpconns plugin: Reading from netlink succeeded. Will use the netlink method from now on.
[2021-01-22 15:25:28] write_kafka plugin: created KAFKA handle : rdkafka#producer-1
[2021-01-22 15:25:28] write_kafka plugin: handle created for topic : nfv-collectd-events
  • However, after a few minutes, it threw the same errors as before.
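
To see how soon after a restart the errors return, the "Broken pipe" messages can be counted per minute. The following is a minimal sketch, not part of the original report: the function name count_broken_pipe is hypothetical, and it assumes every log line starts with a "[YYYY-MM-DD HH:MM:SS]" timestamp, as in the excerpts above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "write to wake-up fd ... Broken pipe" messages
# per minute in a collectd log file.
# Assumption: each line begins with "[YYYY-MM-DD HH:MM:SS]".
count_broken_pipe() {
    local logfile="$1"
    # Keep only the wake-up fd failures, cut out "YYYY-MM-DD HH:MM"
    # (characters 2-17), then count identical minutes.
    grep -F 'write to wake-up fd' "$logfile" \
        | grep -F 'Broken pipe' \
        | cut -c2-17 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $3, $1 }'   # date, minute, number of errors
}
```

Run from the directory that holds the log, for example `count_broken_pipe collectd.log`, this prints one line per minute with the date, the minute, and the number of errors. The health state shown by docker ps can also be read directly with `docker inspect --format '{{.State.Health.Status}}' collectd`, assuming the image defines a health check, which the "(unhealthy)" status above suggests.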

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
