RHACS operator pod restarting due to OOM

Solution Verified

Environment

  • Red Hat Advanced Cluster Security 3

Issue

  • The RHACS operator pod is restarted repeatedly because it runs out of memory (OOM). Events such as the following appear in the rhacs-operator namespace:
Warning   ProbeError           pod/rhacs-operator-controller-manager-xxxx   Readiness probe error: Get "http://<IP>:8081/readyz": dial tcp <IP>:8081: connect: connection refused...

Describing the pod shows the following message:

message: back-off 2m40s restarting failed container=manager pod=rhacs-operator-controller-manager-xxxxxxx_openshift-rhacs-operator(zzzzzzzzzz)

Resolution

  • This issue is known to engineering and a fix is in progress. Until it is available, apply the workaround below.
  • Back up the operator's CSV, then edit it to increase the resource requests and limits of the manager container.
# oc get csv <rhacs-operator> -o yaml > rhacs-operator-csv.bak
# oc edit csv <rhacs-operator>
  • Update the values as follows:
spec:
  config:
    resources:
      limits:
        cpu: 200m
        memory: 2Gi
      requests:
        cpu: 200m
        memory: 2Gi
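
After saving the change, it can help to confirm that the manager container actually picked up the new memory limit. The following is a minimal sketch, not a verified procedure: it assumes the deployment is named rhacs-operator-controller-manager (inferred from the pod name in the events above), that it runs in the rhacs-operator namespace, and that the container is named manager. The function names manager_memory_limit and mem_to_mib are hypothetical helpers, not part of oc.

```shell
# Hypothetical helper: print the memory limit of the "manager" container.
# Assumes deployment "rhacs-operator-controller-manager" in namespace "rhacs-operator".
manager_memory_limit() {
  oc get deployment rhacs-operator-controller-manager -n rhacs-operator \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits.memory}'
}

# Hypothetical helper: convert a memory quantity with a binary suffix (Ki, Mi, Gi) to MiB.
# Returns a non-zero status for any other format.
mem_to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *)   return 1 ;;
  esac
}
```

For example, running `mem_to_mib "$(manager_memory_limit)"` against a cluster where the workaround took effect should print 2048, which corresponds to the 2Gi limit set above.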

Diagnostic Steps

  • Check the pod status. The operator pod is expected to be in the CrashLoopBackOff state.
# oc get pods -n rhacs-operator
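  • Describe the pod. The last state of the manager container should show exit code 137, which indicates that the container was killed for exceeding its memory limit (OOMKilled).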
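
The exit code check can also be scripted instead of reading the describe output by hand. The sketch below is illustrative only: it assumes the rhacs-operator namespace and a container named manager (as in the back-off message above), and the function names last_termination and signal_from_exit_code are hypothetical helpers.

```shell
# Hypothetical helper: print the last termination reason and exit code
# of the "manager" container for the given pod.
# Assumes namespace "rhacs-operator".
last_termination() {
  oc get pod "$1" -n rhacs-operator \
    -o jsonpath='{.status.containerStatuses[?(@.name=="manager")].lastState.terminated.reason}{" "}{.status.containerStatuses[?(@.name=="manager")].lastState.terminated.exitCode}{"\n"}'
}

# Hypothetical helper: an exit code above 128 means the process was
# terminated by signal number (code - 128); print that number, or 0 otherwise.
signal_from_exit_code() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}
```

Calling `last_termination <pod-name>` on an affected pod should report the reason OOMKilled together with exit code 137. `signal_from_exit_code 137` prints 9, the number of SIGKILL, which is the signal the kernel sends when it kills a process that is out of memory.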
