RHACS Central pods do not schedule on a default ROSA cluster
Environment
- Red Hat Advanced Cluster Security for Kubernetes (RHACS) 4.x
- Red Hat OpenShift Service on AWS (ROSA) Classic
- IBM Red Hat OpenShift Kubernetes Service/Red Hat OpenShift on IBM Cloud (ROKS/RHOIC)
Issue
Red Hat Advanced Cluster Security for Kubernetes (RHACS) Central components do not deploy correctly on a default Red Hat OpenShift Service on AWS (ROSA) cluster: some of the pods remain in the Pending state because they cannot be scheduled.
Resolution
Scale up the worker machine pool so that the RHACS components can schedule. The example below raises the worker machine pool from 2 to 4 replicas:
$ rosa list machinepools --cluster=<cluster-name>
worker No 2 m5.xlarge ap-southeast-1a No 300 GiB
$ rosa edit machinepool --cluster=<cluster-name> --replicas 4 worker
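New worker nodes take some time to join the cluster after the machine pool is edited. The following is a minimal sketch for counting the worker nodes that are ready, not an official procedure. It assumes that worker nodes carry the node-role.kubernetes.io/worker label and that infra nodes also carry node-role.kubernetes.io/infra, so they are excluded explicitly. The helper name ready_worker_count is hypothetical.

```shell
# Count worker nodes whose STATUS column is exactly "Ready".
# Assumption: infra nodes may also carry the worker label, so they are
# filtered out with the "!key" label selector.
ready_worker_count() {
  oc get nodes \
    -l 'node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra' \
    --no-headers \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

Run ready_worker_count repeatedly until it reports the replica count that was requested (4 in the example above). Nodes that are cordoned or NotReady are not counted.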
Alternatively, for a smaller RHACS deployment or a proof of concept, edit the RHACS Central custom resource (operator-deployed installations only) and lower the resource requests and limits for the RHACS Central components. This is not advised for production:
apiVersion: platform.stackrox.io/v1alpha1
kind: Central
...
spec:
  central:
    resources:
      limits:
        cpu: '2'
        memory: 4Gi
      requests:
        cpu: 600m
        memory: 2Gi
    db:
      resources:
        limits:
          cpu: '2'
          memory: 4Gi
        requests:
          cpu: '2'
          memory: 4Gi
  scanner:
    db:
      resources:
        limits:
          cpu: '1'
          memory: 2Gi
        requests:
          cpu: '1'
          memory: 1000Mi
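After editing the custom resource, it can help to confirm that the operator applied the new values to the workloads. The sketch below prints the resource requests of the first container in each named deployment. It assumes the deployments are called central and scanner-db (inferred from the pod names in Diagnostic Steps) and that the namespace is acs-central. The helper name show_requests is hypothetical, and the exact output format of the jsonpath expression can vary between client versions.

```shell
# Print "<deployment>: <requests>" for each deployment name given.
# Usage: show_requests <namespace> <deployment> [<deployment> ...]
show_requests() {
  local ns="$1"
  shift
  local deploy
  for deploy in "$@"; do
    printf '%s: ' "$deploy"
    # Only the first container of the pod template is inspected.
    oc get deployment "$deploy" -n "$ns" \
      -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
    printf '\n'
  done
}
```

For example, show_requests acs-central central scanner-db prints one line per deployment, which can be compared against the values set in the custom resource.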
Root Cause
- By default, Red Hat OpenShift Service on AWS (ROSA) is configured with two infra nodes and two worker nodes. The two worker nodes do not provide enough capacity to schedule all of the Central components for Red Hat Advanced Cluster Security for Kubernetes (RHACS) with their default resource requests.
- Any minimal or low-resource environment can encounter this scenario when using the default RHACS resource requirements.
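To see how much capacity is already requested on the worker nodes, the "Allocated resources" section of the node description can be inspected. The following is a rough sketch that trims the oc describe nodes output down to each node name and its allocation summary. The label selector is an assumption (infra nodes are excluded in case they also carry the worker label), and the helper name worker_allocation is hypothetical.

```shell
# For each worker node, print its Name line followed by the
# "Allocated resources" section (everything up to the Events section).
worker_allocation() {
  oc describe nodes \
    -l 'node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra' \
    | awk '
        /^Name:/                { print }
        /^Allocated resources:/ { show = 1 }
        /^Events:/              { show = 0 }
        show                    { print }
      '
}
```

If the CPU or memory requests on every worker node are already close to 100 percent, additional RHACS pods cannot be placed until more worker capacity is added or the requests are reduced.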
Diagnostic Steps
List the pods in the namespace where the RHACS Central components are deployed and check whether any of them remain in the Pending state:
$ oc get pods -n acs-central
NAME                         READY   STATUS    RESTARTS   AGE
central-6666bdf54f-qnv2x     1/1     Running   0          72m
scanner-67d76779d-7bzdj      0/1     Pending   0          72m
scanner-67d76779d-8xqtr      1/1     Running   0          72m
scanner-db-85b59c7d8-mxmtf   1/1     Running   0          72m
Verify that the pending pods are unable to schedule due to resource constraints:
$ oc describe pod/scanner-67d76779d-7bzdj -n acs-central
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 73m default-scheduler 0/7 nodes are available: 2 Insufficient cpu, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 2 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
Warning FailedScheduling 15m (x40 over 72m) default-scheduler 0/7 nodes are available: 2 Insufficient cpu, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 2 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
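The two checks above can also be scripted. The following is a minimal sketch, assuming the acs-central namespace from the example. The helper names pending_pods, scheduling_failures, and has_resource_shortage are hypothetical. It relies only on the standard --field-selector and jsonpath output options of oc get.

```shell
# Print the names of pods that are still Pending, one per line.
pending_pods() {
  oc get pods -n "$1" --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Print "<object>: <message>" for each FailedScheduling event in the namespace.
scheduling_failures() {
  oc get events -n "$1" --field-selector=reason=FailedScheduling \
    -o jsonpath='{range .items[*]}{.involvedObject.name}{": "}{.message}{"\n"}{end}'
}

# Read scheduler messages on stdin; succeed if any of them reports
# a CPU or memory shortage.
has_resource_shortage() {
  grep -Eq 'Insufficient (cpu|memory)'
}
```

For example, scheduling_failures acs-central | has_resource_shortage && echo "resource constraints detected" reports whether any scheduling failure in the namespace mentions insufficient CPU or memory, as in the events shown above.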