RHPAM Kogito Operator 7.x installation is failing with OOMKilled and CrashLoopBackOff

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (OCP 4)
  • Red Hat OpenShift Service on AWS (ROSA 4)
  • Red Hat OpenShift Dedicated 4 (OSD 4)
  • Azure Red Hat OpenShift (ARO 4)

Issue

  • rhpam-kogito-operator-controller-manager pod with default memory limits is being OOMKilled and is stuck in a CrashLoopBackOff.
  • RHPAM Kogito Operator 7.x installation is stuck in the "Installing" state.
  • In clusters with a large number of namespaces, the rhpam-kogito-operator-controller-manager pod is OOMKilled very quickly.

Resolution

As a workaround, increase the memory limit of the rhpam-kogito-operator-controller-manager pod by editing the operator's Subscription (for example, from the default 200Mi to 2000Mi).
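
A minimal sketch of that edit, assuming the operator was installed through OLM into the openshift-operators namespace and that its Subscription is named rhpam-kogito-operator. That name is hypothetical; confirm the real one with `oc get subscriptions.operators.coreos.com -n openshift-operators`. The sketch relies on the Subscription's spec.config.resources field, which OLM applies to the containers of the operator deployment. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Assumed values; adjust them for your cluster.
NAMESPACE="openshift-operators"
SUBSCRIPTION="rhpam-kogito-operator"   # hypothetical name, confirm it before use
MEMORY_LIMIT="2000Mi"
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the command, 0 = run it

# Merge patch that sets spec.config.resources.limits.memory on the Subscription.
PATCH=$(printf '{"spec":{"config":{"resources":{"limits":{"memory":"%s"}}}}}' "$MEMORY_LIMIT")

if [ "$DRY_RUN" = "1" ]; then
  echo "oc patch subscriptions.operators.coreos.com $SUBSCRIPTION -n $NAMESPACE --type merge -p '$PATCH'"
else
  oc patch subscriptions.operators.coreos.com "$SUBSCRIPTION" -n "$NAMESPACE" --type merge -p "$PATCH"
fi
```

After the patch is applied, OLM should roll out the operator deployment again with the new limit; the pod status can then be rechecked with the commands in the Diagnostic Steps section.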

Root Cause

The default memory limit of 200Mi set on the rhpam-kogito-operator-controller-manager pod is not sufficient. After the memory limit is increased, the pod starts and stays running.

The root cause can be verified by checking the memory consumption of the rhpam-kogito-operator-controller-manager pod after increasing the memory limit:

oc adm top pod -n openshift-operators | grep rhpam
NAME                                                        CPU(cores)   MEMORY(bytes)
rhpam-kogito-operator-controller-manager-xxxxxxxxx-xxxxx    2m           250Mi   <-----
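
The comparison behind this check can be sketched as follows. The sample line is copied from the output above (on a live cluster it would come from `oc adm top pod`), and the variable names are illustrative only. Because the observed usage is above the default limit, the container is killed as soon as it reaches that limit.

```shell
# Sample line copied from the output above.
sample='rhpam-kogito-operator-controller-manager-xxxxxxxxx-xxxxx    2m           250Mi'
default_limit_mi=200

# The third column is MEMORY(bytes); strip the Mi suffix to compare as integers.
used_mi=$(printf '%s\n' "$sample" | awk '{ sub(/Mi$/, "", $3); print $3 }')

if [ "$used_mi" -gt "$default_limit_mi" ]; then
  verdict="usage ${used_mi}Mi exceeds the default ${default_limit_mi}Mi limit"
else
  verdict="usage ${used_mi}Mi fits within the default ${default_limit_mi}Mi limit"
fi
echo "$verdict"
```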

Diagnostic Steps

1. Check the status of the rhpam-kogito-operator-controller-manager pod:

oc get pods -n openshift-operators | grep rhpam
NAME                                                       READY   STATUS             RESTARTS          AGE
rhpam-kogito-operator-controller-manager-xxxxxxxx-xxxxx    1/2     CrashLoopBackOff   262 (4m14s ago)   24h

2. Describe the rhpam-kogito-operator-controller-manager pod and check the reason:

oc describe pod rhpam-kogito-operator-controller-manager-xxxxxxxx-xxxxx -n openshift-operators

[...]
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 20 Dec 2022 19:13:48 +0530
      Finished:     Tue, 20 Dec 2022 19:14:16 +0530
    Ready:          False
    Restart Count:  262
[...]
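
The check in step 2 can also be scripted. The helper below is a sketch with a hypothetical function name (last_termination_reason): it reads `oc describe pod` output on standard input and prints the first reason recorded under Last State. Exit code 137 corresponds to 128 + 9, meaning the container was terminated by SIGKILL, which is the signal sent by the kernel OOM killer.

```shell
# Hypothetical helper: print the Reason that follows the first "Last State:" line.
last_termination_reason() {
  awk '
    /Last State:/ { in_last = 1; next }        # entered the Last State block
    in_last && /Reason:/ { print $2; exit }    # first Reason after it
  '
}

# Sample input copied from the output above.
reason=$(last_termination_reason <<'EOF'
State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
EOF
)
echo "$reason"
```

On a cluster, the same helper could be fed directly, for example: oc describe pod <pod-name> -n openshift-operators | last_termination_reason. For a pod with several containers it reports only the first container that has a Last State entry.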
