AMQ Streams minimum sizing guide for an OpenShift development environment
Environment
- Red Hat AMQ Streams (Streams)
Issue
- What is a good starting point for correctly sizing my environment?
- How can I set CPU, memory, and storage resources to have a stable development environment?
Resolution
Kafka is designed to leverage the operating system page cache, so proper sizing is important. Sizing also depends heavily on the system architecture and must take into account factors such as message count and size, the number of topics and partitions, the replication factor, disaster recovery requirements, the kind of keys used for messages, the number of producers and consumer groups, and data retention settings.
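As a concrete illustration of how retention settings bound disk usage, the broker-level defaults below cap each partition at roughly 1 GiB or 24 hours of data, whichever limit is reached first. The values are illustrative assumptions for a development cluster, not recommendations:

```yaml
# Illustrative broker defaults for a development cluster (assumed values).
# These settings go under spec.kafka.config in the Kafka custom resource.
config:
  log.retention.hours: 24         # delete segments older than 24 hours
  log.retention.bytes: 1073741824 # cap each partition at roughly 1GiB
  log.segment.bytes: 268435456    # roll segments at 256MiB so old data can be deleted
```

With a replication factor of 3, total disk usage across the cluster is roughly the per-partition cap multiplied by the partition count and the replication factor, which should fit within the per-broker storage size.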
For the stateful components (Kafka and ZooKeeper), we recommend setting a CPU request without a CPU limit, so that the pods can take advantage of excess CPU, and setting the memory request equal to the memory limit, so that memory is reserved upfront. Note that if a ResourceQuota is in place, every incoming container must specify explicit resource limits, while a LimitRange can apply default values when resources are not set. For any large Kafka cluster, we suggest increasing terminationGracePeriodSeconds (default 30s), so that the brokers have enough time to transfer their work to another broker before they are terminated, avoiding any chance of log corruption.
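As an example of applying defaults with a LimitRange, the manifest below gives containers that do not declare their own resources a default request and limit in the given namespace. The name and namespace are hypothetical, and the values are illustrative:

```yaml
# Hypothetical LimitRange for a development namespace.
# Containers created without explicit resources receive these defaults.
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults        # hypothetical name
  namespace: kafka-dev      # hypothetical namespace
spec:
  limits:
    - type: Container
      default:              # default limits applied when none are set
        memory: 1Gi
      defaultRequest:       # default requests applied when none are set
        cpu: 250m
        memory: 512Mi
```

Defaults from a LimitRange only apply to containers that omit the corresponding resource fields; the explicit requests and limits in the example Kafka resource below take precedence.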
Example configuration
This example is NOT suitable for production or even for a shared test environment. Nevertheless, it is useful as an initial sizing recommendation for a development environment.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    version: 3.2.3
    config:
      inter.broker.protocol.version: "3.2"
      num.partitions: 3
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: route
        tls: true
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 1000m
        memory: 2Gi
    storage:
      size: 10Gi
      type: persistent-claim
      deleteClaim: false
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    livenessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    template:
      pod:
        terminationGracePeriodSeconds: 60
  zookeeper:
    replicas: 3
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    storage:
      size: 5Gi
      type: persistent-claim
      deleteClaim: false
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    livenessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
  entityOperator:
    topicOperator:
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 256Mi
      readinessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
      livenessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
    userOperator:
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 256Mi
      readinessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
      livenessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
Diagnostic Steps
Insufficient resources may lead to one or more pods entering the CrashLoopBackOff state, or to broker slowness. Check pod status with oc get pods, and use oc describe pod on an affected pod to look for OOMKilled or failed-scheduling events.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.