RHACS operator pod restarting due to OOM
Environment
- Red Hat Advanced Cluster Security 3
Issue
- The RHACS operator pod keeps restarting because it runs out of memory (OOM). The following events are seen in the rhacs-operator namespace:
Warning ProbeError pod/rhacs-operator-controller-manager-xxxx Readiness probe error: Get "http://<IP>:8081/readyz": dial tcp <IP>:8081: connect: connection refused...
Describing the pod shows the following message:
message: back-off 2m40s restarting failed container=manager pod=rhacs-operator-controller-manager-xxxxxxx_openshift-rhacs-operator(zzzzzzzzzz)
Resolution
- This issue is known to engineering and is currently being worked on. Apply the workaround below to resolve it.
- Take a backup of the operator's CSV, then edit the CSV to increase the resource requests and limits of the manager container.
# oc get csv <rhacs-operator> -o yaml > rhacs-operator-csv.bak
# oc edit csv <rhacs-operator>
- Set the resource values as shown below:
spec:
  config:
    resources:
      limits:
        cpu: 200m
        memory: 2Gi
      requests:
        cpu: 200m
        memory: 2Gi
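After saving the change, it may help to confirm that the new values reached the running deployment. The following is a minimal sketch and not part of the original workaround. The deployment name rhacs-operator-controller-manager is an assumption inferred from the pod name in the events above, the namespace defaults to rhacs-operator, and the function name show_manager_resources is hypothetical.

```shell
# Hypothetical helper: print the resources stanza of the "manager" container.
# Assumptions: the deployment name is inferred from the pod name prefix, and
# the namespace defaults to rhacs-operator (override with the first argument).
show_manager_resources() {
  ns="${1:-rhacs-operator}"
  oc get deployment rhacs-operator-controller-manager -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources}'
}
```

Calling show_manager_resources with no arguments queries the rhacs-operator namespace. If the workaround took effect, the printed limits and requests are expected to match the values set above.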
Diagnostic Steps
- Check the pod status. The operator pod is expected to be in the CrashLoopBackOff state.
# oc get pods -n rhacs-operator
- Describing the pod should show that the container was last terminated with exit code 137, which indicates an OOM kill.
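Exit code 137 is 128 + 9, meaning the container was ended by signal 9 (SIGKILL), which is the signal delivered when a container exceeds its memory limit. The sketch below is illustrative only: the function name describe_exit_code is hypothetical, and the commented oc command assumes the container is named manager, as in the back-off message above.

```shell
# Hypothetical helper: interpret a container exit code.
# Exit codes above 128 mean "killed by signal (code - 128)".
describe_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo "exit code $code: killed by signal $((code - 128))"
  else
    echo "exit code $code: not killed by a signal"
  fi
}

# Usage against a live cluster (the pod name is a placeholder):
# code=$(oc get pod <pod-name> -n rhacs-operator \
#   -o jsonpath='{.status.containerStatuses[?(@.name=="manager")].lastState.terminated.exitCode}')
# describe_exit_code "$code"
```

The same jsonpath with .reason in place of .exitCode returns the termination reason recorded by Kubernetes, which is OOMKilled for an out-of-memory kill.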