Unable to login to cluster due to high resource consumption on master nodes
Environment
- Red Hat OpenShift Service on AWS [ROSA]
- 4.x
Issue
Authentication to the cluster fails because of high resource consumption on the master nodes.
Resolution
- Set a resource limit on the fluentd collectors in the cluster-logging-operator's configuration so that they cannot exhaust the memory on the master nodes. Refer to the official documentation on setting memory requests and limits on the fluentd log collector resource.
- Optionally, resize the master nodes to give the cluster some breathing room.
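As a rough sketch of the first step, the memory limit could be set by patching the ClusterLogging custom resource. This assumes the resource is named instance in the openshift-logging namespace and that the collector settings live under spec.collection.logs.fluentd.resources. Other logging releases may use a different path, so check the documentation for the installed version. The helper names and the 736Mi value used below are illustrative and are not taken from this solution.

```shell
# Build a JSON merge patch that sets the same memory request and limit
# on the fluentd collector. The field path is an assumption; verify it
# against the ClusterLogging CR of the installed logging version.
fluentd_memory_patch() {
  printf '{"spec":{"collection":{"logs":{"fluentd":{"resources":{"limits":{"memory":"%s"},"requests":{"memory":"%s"}}}}}}}\n' "$1" "$1"
}

# Apply the patch to the ClusterLogging CR (assumed: "instance" in the
# openshift-logging namespace). Requires cluster-admin access.
set_fluentd_memory() {
  oc -n openshift-logging patch clusterlogging/instance \
    --type merge -p "$(fluentd_memory_patch "$1")"
}
```

Running fluentd_memory_patch 736Mi on its own prints the patch for review, and set_fluentd_memory 736Mi applies it. Size the value for the actual log volume of the cluster rather than copying it as-is.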
Root Cause
- The high resource usage was caused by the fluentd collector pods running on the master nodes. The fluentd pods were running without any memory limits and gradually exhausted the memory on the master nodes.
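To confirm whether the collector is running without memory limits, the daemonset's container resources can be inspected. The sketch below assumes the collector daemonset is named fluentd and lives in the openshift-logging namespace. Some logging releases name it differently, so pass the actual name as the first argument if needed. The function name is hypothetical.

```shell
# Print the memory request and limit configured on each container of
# the collector daemonset. Unset values are shown as <none>.
# Assumption: the daemonset is called "fluentd" unless a name is given.
show_collector_memory() {
  oc -n openshift-logging get daemonset "${1:-fluentd}" \
    -o custom-columns='NAME:.metadata.name,MEM_REQUEST:.spec.template.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.template.spec.containers[*].resources.limits.memory'
}
```

A MEM_LIMIT column showing <none> would be consistent with the root cause described above.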
Diagnostic Steps
- Check the resource usage on the nodes.
# oc adm top nodes
- Check which pods have high resource usage.
# oc adm top pods --all-namespaces | grep -i <node-name>
- Check the resource usage in the Grafana console as well.
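The output of oc adm top pods lists the namespace, pod name, CPU, and memory, but not the node, so filtering it by node name only matches pods whose names happen to contain the node name. One possible way to narrow the checks above is sketched here. The function names are hypothetical, the master label is the one normally present on OpenShift 4.x control plane nodes, and the --sort-by flag is assumed to be available in the installed oc client.

```shell
# Resource usage of the master nodes only.
top_masters() {
  oc adm top nodes -l node-role.kubernetes.io/master
}

# All pods scheduled on a given node, across all namespaces.
pods_on_node() {
  oc get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# Pods in a namespace, highest memory consumers first.
# Defaults to openshift-logging, where the collector pods run.
top_pods_by_memory() {
  oc adm top pods -n "${1:-openshift-logging}" --sort-by=memory
}
```

For example, pods_on_node <node-name> shows which collector pod runs on an affected master, and top_pods_by_memory then shows how much memory that pod is using.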