RHACS Central pods do not schedule on a default ROSA cluster

Solution In Progress

Environment

Red Hat OpenShift Service on AWS (ROSA) Classic
Red Hat Advanced Cluster Security for Kubernetes (RHACS) 4.x

Issue

The Red Hat Advanced Cluster Security for Kubernetes (RHACS) Central components do not deploy correctly on a default Red Hat OpenShift Service on AWS (ROSA) cluster: some of the pods remain in the Pending state and are never scheduled.

Resolution

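Scale up the worker machine pool so that there is enough capacity for the RHACS components to schedule. For example, to increase the default worker machine pool from two to four replicas: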

$ rosa list machinepools --cluster=<cluster-name>
worker  No           2         m5.xlarge                          ap-southeast-1a                  No              300 GiB

$ rosa edit machinepool --cluster=<cluster-name> --replicas 4 worker
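New worker nodes can take several minutes to provision and join the cluster. The following is a minimal sketch for checking progress: the hypothetical helper count_ready_workers reads oc get nodes output and counts nodes that are Ready and whose ROLES column is exactly worker. It assumes the default ROSA Classic role names, where machine pool nodes show worker and infra nodes show infra,worker; adjust the match if your cluster uses additional machine pools or different labels.

```shell
# Hypothetical helper: count nodes that are Ready and carry only the
# "worker" role, reading "oc get nodes" table output on stdin.
# Assumes ROSA Classic role names: machine pool nodes show "worker",
# infra nodes show "infra,worker", control plane nodes show "control-plane,master".
# Cordoned nodes ("Ready,SchedulingDisabled") and NotReady nodes are not counted.
count_ready_workers() {
  awk 'NR > 1 && $2 == "Ready" && $3 == "worker" { n++ } END { print n + 0 }'
}
```

For example, run oc get nodes | count_ready_workers and wait until it reports the new replica count (4 in the example above) before checking the RHACS pods again.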

Alternatively, for a smaller RHACS deployment or a proof of concept, you can edit the RHACS Central custom resource (for Operator-based installations) to reduce the limits and requests of the RHACS Central components. This approach is not recommended for production:

apiVersion: platform.stackrox.io/v1alpha1
kind: Central
...
spec:
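  # Central (API server and web UI)
  central:
    resources:
      limits:
        cpu: '2'
        memory: 4Gi
      requests:
        cpu: 600m
        memory: 2Gi
    # Central database (PostgreSQL)
    db:
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
        requests:
          cpu: '2'
          memory: 4Gi
  # Scanner database; spec.scanner is a sibling of spec.central, not nested under it
  scanner:
    db:
      resources:
        limits:
          cpu: '1'
          memory: 2Gi
        requests:
          cpu: '1'
          memory: 1000Mi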
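Even with reduced values, the CPU requests in the example above take up a large share of a worker node. As a worked check, this sketch converts the Kubernetes CPU quantities from the example (600m for Central, 2 for the Central database, 1 for the Scanner database) to millicores and adds them. The helper name cpu_to_millicores is hypothetical, and the total does not include the Scanner analyzer pods or any other workloads on the node. For comparison, an m5.xlarge instance, the instance type in the machine pool listed earlier, has 4 vCPUs, and the allocatable CPU on a node is lower than that because of system reservations.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity ("600m", "2", "1.5")
# to whole millicores. Plain numbers are cores; a trailing "m" means millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    # Multiply cores by 1000 and round to the nearest integer.
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# CPU requests from the example Central custom resource:
# central 600m, central db 2, scanner db 1.
total=0
for q in 600m 2 1; do
  total=$((total + $(cpu_to_millicores "$q")))
done
echo "Total CPU requests: ${total}m"
```

Running the block prints "Total CPU requests: 3600m", that is, 3.6 cores for these three containers alone.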

Root Cause

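By default, a Red Hat OpenShift Service on AWS (ROSA) cluster has two infra nodes and two worker nodes. The infra nodes and the control plane nodes are tainted, so the Red Hat Advanced Cluster Security for Kubernetes (RHACS) Central pods can only be scheduled on the two worker nodes, which do not have enough allocatable CPU for the combined resource requests of the RHACS Central components.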

Diagnostic Steps

List the pods in the namespace where the RHACS Central components are deployed (acs-central in this example) and check whether any of them are stuck in the Pending state:

$ oc get pods -n acs-central

NAME                         READY   STATUS    RESTARTS   AGE
central-6666bdf54f-qnv2x     1/1     Running   0          72m
scanner-67d76779d-7bzdj      0/1     Pending   0          72m
scanner-67d76779d-8xqtr      1/1     Running   0          72m
scanner-db-85b59c7d8-mxmtf   1/1     Running   0          72m

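To script this check, the hypothetical helper pending_pods below reads oc get pods table output and prints the names of pods whose STATUS column is Pending. It is a minimal sketch that assumes the default column layout shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print the names of pods whose STATUS column is
# "Pending", reading "oc get pods" table output (with header) on stdin.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}
```

For example: oc get pods -n acs-central | pending_pods. The server-side filter oc get pods -n acs-central --field-selector=status.phase=Pending selects pods by phase instead of parsing the table; note that it also matches pods whose STATUS column shows states such as ContainerCreating, because those pods are still in the Pending phase.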
Verify that the pending pods are unable to schedule due to resource constraints:

$ oc describe pod/scanner-67d76779d-7bzdj -n acs-central
...
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  73m                 default-scheduler  0/7 nodes are available: 2 Insufficient cpu, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 2 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  15m (x40 over 72m)  default-scheduler  0/7 nodes are available: 2 Insufficient cpu, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 2 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
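When several pods are pending, a quick way to see which resources the scheduler reports as exhausted is to extract the "Insufficient <resource>" phrases from the FailedScheduling messages. This is a minimal sketch: the helper name insufficient_resources is hypothetical, and it relies on the message wording shown above.

```shell
# Hypothetical helper: print each distinct "Insufficient <resource>" phrase
# found in scheduler messages (for example, "oc describe pod" output) on stdin.
# The resource name must end in a letter or digit, so trailing "," or "."
# punctuation from the message is not included.
insufficient_resources() {
  grep -oE 'Insufficient [A-Za-z0-9./-]*[A-Za-z0-9]' | sort -u
}
```

For example: oc describe pod/scanner-67d76779d-7bzdj -n acs-central | insufficient_resources prints "Insufficient cpu" for the events above. To check the events for every pod in the namespace at once, the output of oc get events -n acs-central --field-selector reason=FailedScheduling can be piped into the same helper.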

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
