AMQ Streams minimum sizing guide for an OpenShift development environment

Environment

  • Red Hat AMQ Streams (Streams)

Issue

  • What is a good starting point for right-sizing my environment?
  • How can I set CPU, memory and storage resources to have a stable development environment?

Resolution

Kafka is designed to make heavy use of the operating system page cache, so proper sizing is important. Sizing is also highly dependent on the system architecture and must take into account factors such as message count and size, the number of topics and partitions, the replication factor, disaster recovery requirements, the kind of keys used for messages, the number of producers and consumer groups, and the data retention settings.

For the stateful components (Kafka and ZooKeeper), we recommend setting a CPU request without a CPU limit, so that the brokers can take advantage of excess CPU, and setting the memory request equal to the memory limit, so that memory is reserved up front. If ResourceQuotas are defined, every incoming container must specify explicit resource limits, while LimitRanges can apply default values when resources are not set. For any large Kafka cluster, we also suggest increasing terminationGracePeriodSeconds (default 30s) so that the brokers have enough time to transfer their work to another broker before they are terminated, avoiding any chance of log corruption.
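If you rely on LimitRanges to provide those defaults, the following is a minimal sketch of a namespace-level LimitRange; the object name, namespace and values are illustrative assumptions and should be adapted to your cluster:

apiVersion: v1
kind: LimitRange
metadata:
  name: container-resource-defaults   # illustrative name
  namespace: my-kafka-project         # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:    # applied when a container sets no request
        cpu: 500m
        memory: 1Gi
      default:           # applied when a container sets no limit
        memory: 2Gi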

Example configuration

This example is NOT suitable for production or even for a shared test environment. Nevertheless, it is useful as an initial sizing recommendation for a development environment.

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    version: 3.2.3
    config:
      inter.broker.protocol.version: "3.2"
      num.partitions: 3
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: route
        tls: true
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 1000m
        memory: 2Gi
    storage:
      size: 10Gi
      type: persistent-claim
      deleteClaim: false
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    livenessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    template:
      pod:
        terminationGracePeriodSeconds: 60
  zookeeper:
    replicas: 3
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    storage:
      size: 5Gi
      type: persistent-claim
      deleteClaim: false
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
    livenessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 10
  entityOperator:
    topicOperator:
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 256Mi
      readinessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
      livenessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
    userOperator:
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 256Mi
      readinessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10
      livenessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 10

Diagnostic Steps

Insufficient resources may lead to one or more pods entering a CrashLoopBackOff state, or to broker slowness. Check pod status and events with oc describe pod, and actual resource consumption with oc adm top pods, to confirm whether requests and limits need to be raised.

