RHACS operator pod restarting due to OOM

Environment

Red Hat OpenShift Container Platform (RHOCP)
- 4+
Red Hat Advanced Cluster Security (RHACS)
- 3
- 4

Issue

The RHACS operator pod restarts repeatedly because its container is OOM-killed. Events like the following can be seen in the `rhacs-operator` namespace:

Warning   ProbeError           pod/rhacs-operator-controller-manager-xxxx   Readiness probe error: Get "http://<IP>:8081/readyz": dial tcp <IP>:8081: connect: connection refused...

Describing the pod shows the following message:

message: back-off 2m40s restarting failed container=manager pod=rhacs-operator-controller-manager-xxxxxxx_openshift-rhacs-operator(zzzzzzzzzz)

Resolution

The solution is to increase the memory limit value of the operator pod. There are two possible ways to do it:

Method	Advantage	Disadvantage
Set it through the ClusterServiceVersion resource	Only the memory limit of the `manager` container is tuned; the `kube-rbac-proxy` container is left unchanged.	The setting is lost after each RHACS Operator upgrade.
Set it through the Subscription resource	The memory setting is preserved across upgrades.	The change applies to both containers in the `rhacs-operator` pod, although only the `manager` container needs the extra resources.

Both methods are detailed below:

Set it through the ClusterServiceVersion

Get the name of the rhacs-operator CSV:

$ oc -n rhacs-operator get csv | grep rhacs

rhacs-operator.v4.3.4    Advanced Cluster Security for Kubernetes   4.3.4     rhacs-operator.v4.3.3   Succeeded

Take a backup of the rhacs-operator CSV:

$ oc -n rhacs-operator get csv <RHACS-OPERATOR-CSV-NAME> -o yaml > rhacs-operator-csv.yaml

Edit it:

$ oc -n rhacs-operator edit csv <RHACS-OPERATOR-CSV-NAME>

Increase the resources.limits.memory value of the manager container:

spec:
  containers:
    - args:
        - --health-probe-bind-address=:8081
        - --metrics-bind-address=127.0.0.1:8080
        - --leader-elect
      [...]
      name: manager
      resources:
        limits:
          cpu: 200m
          memory: 2Gi    <-- value to increase
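
After the CSV is saved, the new limit should show up on the operator Deployment. The following is a minimal sketch for checking it, not an official procedure. It assumes the Deployment is named `rhacs-operator-controller-manager` (inferred from the pod name in the events above); the function name `manager_memory_limit` is hypothetical.

```shell
# Hypothetical helper: print the memory limit currently set on the
# "manager" container of the operator Deployment.
# $1 = namespace (optional, defaults to rhacs-operator).
# Assumption: the Deployment is named rhacs-operator-controller-manager.
manager_memory_limit() {
  oc -n "${1:-rhacs-operator}" get deployment rhacs-operator-controller-manager \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits.memory}'
}
```

Calling `manager_memory_limit` with no arguments queries the `rhacs-operator` namespace and should print the value entered in the CSV.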

Set it through the Subscription

Edit the rhacs-operator Subscription and add this config block under spec:

$ oc -n rhacs-operator edit sub rhacs-operator

spec:
  config:
    resources:
      limits:
        cpu: 500m
        memory: 2Gi    <--- pod memory increased to 2Gi
      requests:
        cpu: 100m
        memory: 200Mi

Note: both limits and requests must be defined. Because this setting applies to every container in the pod, the values shown here are the higher of the two containers' values, to be on the safe side.
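
The same config block can also be applied without an interactive editor. The following is a sketch under stated assumptions rather than a required procedure: the helper names `build_sub_patch` and `patch_sub_memory` are hypothetical, and the CPU and request values are simply the ones from the example above.

```shell
# Hypothetical helper: build a JSON merge patch for spec.config.resources.
# $1 = memory limit (for example 2Gi). CPU and request values are copied
# from the example config block.
build_sub_patch() {
  printf '{"spec":{"config":{"resources":{"limits":{"cpu":"500m","memory":"%s"},"requests":{"cpu":"100m","memory":"200Mi"}}}}}\n' "$1"
}

# Hypothetical helper: apply the patch to the rhacs-operator Subscription.
patch_sub_memory() {
  oc -n rhacs-operator patch sub rhacs-operator --type=merge -p "$(build_sub_patch "$1")"
}
```

For example, `patch_sub_memory 2Gi` sends the same values as the manual edit. Running `build_sub_patch 2Gi` on its own prints the patch so it can be reviewed first.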

Diagnostic Steps

Check the pod status; the pod should be in the CrashLoopBackOff state:
```
$ oc get pods -n rhacs-operator
```
Describing the pod should show exit code 137 with the reason OOMKilled for the manager container.
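
The termination reason can also be read directly from the pod status. The sketch below is illustrative only: the function names `last_termination_reason` and `signal_from_exit_code` are hypothetical, and the pod name has to be supplied by the caller. The second function shows why 137 points to a kill signal: exit codes above 128 encode the signal number as the code minus 128, and 137 - 128 = 9 (SIGKILL), the signal sent by the kernel OOM killer.

```shell
# Hypothetical helper: print the reason for the last termination of the
# "manager" container. $1 = pod name.
last_termination_reason() {
  oc -n rhacs-operator get pod "$1" \
    -o jsonpath='{.status.containerStatuses[?(@.name=="manager")].lastState.terminated.reason}'
}

# Hypothetical helper: convert a container exit code to the signal that
# killed the process. Codes of 128 or below are not signal exits, so 0
# is printed for them.
signal_from_exit_code() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}
```

When the container was killed for exceeding its memory limit, `last_termination_reason <pod-name>` is expected to print `OOMKilled`.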

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
