Unable to log in to the cluster due to high resource consumption on master nodes

Solution Verified

Environment

  • Red Hat OpenShift Service on AWS [ROSA]
    • 4.x

Issue

Authentication to the cluster fails because of high resource consumption on the master nodes.

Resolution

  • Set a memory limit on the fluentd collectors in the cluster-logging-operator's configuration so that they cannot exhaust the memory on the master nodes. Refer to the official documentation on setting memory requests and limits for the fluentd log collector.
  • Optionally, resize the master nodes to give the cluster more headroom.
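
As a minimal sketch of the first step, the limit can be applied by patching the ClusterLogging custom resource. This assumes the resource is named instance in the openshift-logging namespace and that the collector settings live under spec.collection.logs.fluentd.resources; newer logging releases may use a different field layout, so check the documentation for the installed version. The helper names are hypothetical, and the 736Mi and 100m values in the usage line are illustrative, not a sizing recommendation.

```shell
# Hypothetical helper: print a JSON merge patch that sets the fluentd
# memory limit and matching requests.
# Assumption: the fields live under spec.collection.logs.fluentd.resources.
fluentd_limits_patch() {
  mem="$1"   # memory limit and request, for example 736Mi
  cpu="$2"   # CPU request, for example 100m
  printf '{"spec":{"collection":{"logs":{"fluentd":{"resources":{"limits":{"memory":"%s"},"requests":{"cpu":"%s","memory":"%s"}}}}}}}\n' \
    "$mem" "$cpu" "$mem"
}

# Hypothetical helper: apply the patch.
# Assumption: the ClusterLogging resource is "instance" in "openshift-logging".
apply_fluentd_limits() {
  oc -n openshift-logging patch clusterlogging instance \
    --type merge -p "$(fluentd_limits_patch "$1" "$2")"
}
```

With the functions loaded in the current shell, a call could look like this:
# apply_fluentd_limits 736Mi 100m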

Root Cause

  • The high resource usage was caused by the fluentd collector pods running on the master nodes. These pods were running without memory limits and gradually exhausted the memory on the master nodes.
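
To confirm that the collector pods have no memory limit, the configured limits can be listed per pod. The sketch below assumes the pods run in the openshift-logging namespace and carry the label component=fluentd (some logging releases use component=collector instead), and that oc prints <none> for an unset field in custom-columns output. Both helper names are hypothetical.

```shell
# Hypothetical helper: list collector pods with their node and memory limit.
# Assumption: pods are labelled component=fluentd; pass another selector as $1.
show_collector_limits() {
  oc -n openshift-logging get pods -l "${1:-component=fluentd}" \
    -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName,MEM_LIMIT:.spec.containers[*].resources.limits.memory'
}

# Hypothetical helper: read that table on stdin and print the pod and node
# of every row whose memory limit column is <none>.
without_limit() {
  awk 'NR > 1 && $3 == "<none>" { print $1, $2 }'
}
```

With the functions loaded in the current shell:
# show_collector_limits | without_limit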

Diagnostic Steps

  • Check the resource usage on the nodes.
# oc adm top nodes
  • Check which pods have high resource usage.
# oc adm top pods --all-namespaces | grep -i <node-name>
  • Check the resource usage in the Grafana console as well.
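
The output of oc adm top pods has no node column, so filtering it by node name only matches pods whose names contain that name. As a hedged alternative, the sketch below lists the pods scheduled on a node with a field selector and ranks the top output by memory. It assumes the columns are NAMESPACE, NAME, CPU(cores) and MEMORY(bytes), with memory reported in Mi. Both helper names are hypothetical.

```shell
# Hypothetical helper: list every pod scheduled on the given node.
pods_on_node() {
  oc get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# Hypothetical helper: read `oc adm top pods --all-namespaces` output on
# stdin and print "memory namespace pod", highest memory first.
# $1 is the number of rows to keep (default 10).
# Assumption: column 4 is memory with an "Mi" suffix.
sort_by_memory() {
  awk 'NR > 1 { mem = $4; sub(/Mi$/, "", mem); print mem, $1, $2 }' |
    sort -rn | head -n "${1:-10}"
}
```

With the functions loaded in the current shell:
# pods_on_node <node-name>
# oc adm top pods --all-namespaces | sort_by_memory 5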
