Chapter 6. Scaling Clusters

6.1. Scaling Kafka clusters

6.1.1. Adding brokers to a cluster

The primary way of increasing throughput for a topic is to increase the number of partitions for that topic. This works because the partitions allow the load for the topic to be shared between the brokers in the cluster. However, when the brokers are all constrained by some resource (typically I/O), using more partitions will not yield an increase in throughput. Instead, you must add brokers to the cluster.
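
For example, assuming a topic named my-topic and a ZooKeeper ensemble at localhost:2181 (both placeholders), you could increase the partition count to 6 using the kafka-topics.sh tool. Note that the number of partitions can only ever be increased, not decreased:

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --partitions 6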

When you add an extra broker to the cluster, AMQ Streams does not assign any partitions to it automatically. You have to decide which partitions to move from the existing brokers to the new broker.

Once the partitions have been redistributed between all brokers, each broker should have a lower resource utilization.

6.1.2. Removing brokers from the cluster

Before you remove a broker from a cluster, you must ensure that it is not assigned to any partitions. You should decide which remaining brokers will be responsible for each of the partitions on the broker being decommissioned. Once the broker has no assigned partitions, you can stop it.
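
To see which partitions are currently assigned to the broker you want to decommission, you can describe your topics and inspect the Replicas column of the output (the ZooKeeper address is a placeholder):

bin/kafka-topics.sh --zookeeper localhost:2181 --describe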

6.2. Reassignment of partitions

The kafka-reassign-partitions.sh utility is used to reassign partitions to different brokers.

It has three different modes:

--generate
Takes a set of topics and brokers and generates a reassignment JSON file which will result in the partitions of those topics being assigned to those brokers. It is an easy way to generate a reassignment JSON file, but it operates on whole topics, so its use is not always appropriate.
--execute
Takes a reassignment JSON file and applies it to the partitions and brokers in the cluster. Brokers that gain partitions become followers of the partition leader. For a given partition, once the new broker has caught up and joined the ISR, the old broker stops being a follower and deletes its replica.
--verify
Using the same reassignment JSON file as the --execute step, --verify checks whether all of the partitions in the file have been moved to their intended brokers. If the reassignment is complete it will also remove any throttles which are in effect. Unless removed, throttles will continue to affect the cluster even after the reassignment has finished.

It is only possible to have one reassignment running in the cluster at any given time, and it is not possible to cancel a running reassignment. If you need to cancel a reassignment, you have to wait for it to complete and then perform another reassignment to revert the effects of the first one. The kafka-reassign-partitions.sh tool will print the reassignment JSON for this reversion as part of its output. Very large reassignments should be broken down into a number of smaller reassignments in case there is a need to stop an in-progress reassignment.
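
For example, if you saved the current-assignment JSON printed by the tool to a file (the file name and ZooKeeper address here are illustrative), you can revert a completed reassignment by executing that file:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file revert-reassignment.json --execute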

6.2.1. Reassignment JSON file

The reassignment JSON file has a specific structure:

{
  "version": 1,
  "partitions": [
    <PartitionObjects>
  ]
}

Where <PartitionObjects> is a comma-separated list of objects like:

{
  "topic": <TopicName>,
  "partition": <Partition>,
  "replicas": [ <AssignedBrokerIds> ],
  "log_dirs": [<LogDirs>]
}

The "log_dirs" property is optional and is used to move the partition to a specific log directory.

The following is an example reassignment JSON file that assigns topic topic-a, partition 4 to brokers 2, 4 and 7, and topic topic-b partition 2 to brokers 1, 5 and 7:

{
  "version": 1,
  "partitions": [
    {
      "topic": "topic-a",
      "partition": 4,
      "replicas": [2,4,7]
    },
    {
      "topic": "topic-b",
      "partition": 2,
      "replicas": [1,5,7]
    }
  ]
}

Partitions not included in the JSON are not changed.

6.2.2. Generating reassignment JSON files

The easiest way to assign all the partitions for a given set of topics to a given set of brokers is to generate a reassignment JSON file using the kafka-reassign-partitions.sh --generate command.

bin/kafka-reassign-partitions.sh --zookeeper <ZooKeeper> --topics-to-move-json-file <TopicsFile> --broker-list <BrokerList> --generate

The <TopicsFile> is a JSON file which lists the topics to move. It has the following structure:

{
  "version": 1,
  "topics": [
    <TopicObjects>
  ]
}

where <TopicObjects> is a comma-separated list of objects like:

{
  "topic": <TopicName>
}

For example, to move all the partitions of topic-a and topic-b to brokers 4 and 7:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-be-moved.json --broker-list 4,7 --generate

where topics-to-be-moved.json has contents:

{
  "version": 1,
  "topics": [
    { "topic": "topic-a"},
    { "topic": "topic-b"}
  ]
}
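
The command prints two JSON objects on standard output: the current assignment, which you can save in case you need to revert, followed by the proposed reassignment. The output resembles the following; the broker IDs, replication factor, and partition counts are illustrative:

Current partition replica assignment
{"version":1,"partitions":[{"topic":"topic-a","partition":0,"replicas":[1,2]},{"topic":"topic-b","partition":0,"replicas":[2,3]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"topic-a","partition":0,"replicas":[4,7]},{"topic":"topic-b","partition":0,"replicas":[7,4]}]}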

6.2.3. Creating reassignment JSON files manually

You can manually create the reassignment JSON file if you want to move specific partitions.
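
For example, to move only partition 0 of topic-a to brokers 1, 2 and 3, while leaving the other partitions of that topic where they are, the file would contain:

{
  "version": 1,
  "partitions": [
    {
      "topic": "topic-a",
      "partition": 0,
      "replicas": [1,2,3]
    }
  ]
}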

6.3. Reassignment throttles

Reassigning partitions can be a slow process because it can require moving large amounts of data between brokers. To avoid a detrimental impact on clients, you can throttle the reassignment. Using a throttle might cause the reassignment to take longer. If the throttle is too low, the newly assigned brokers will not be able to keep up with the records being published and the reassignment will never complete. If the throttle is too high, clients will be impacted. For example, for producers, this could manifest as higher than normal latency waiting for acknowledgement. For consumers, this could manifest as a drop in throughput caused by higher latency between polls.
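
A throttle is applied by passing the --throttle option, with an inter-broker rate in bytes per second, to the --execute command described in the following procedures. For example, to limit replication traffic for the reassignment to roughly 5 MB/s (the ZooKeeper address and file name are placeholders):

kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 5000000 --execute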

6.4. Scaling up a Kafka cluster

This procedure describes how to increase the number of brokers in a Kafka cluster.

Prerequisites

  • An existing Kafka cluster.
  • A new machine with AMQ Streams installed.
  • A reassignment JSON file of how partitions should be reassigned to brokers in the enlarged cluster.

Procedure

  1. Create a configuration file for the new broker, using the same settings as for the other brokers in your cluster, except for broker.id, which should be a number that is not already used by any of the other brokers.
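
    For example, a minimal configuration for the new broker might look like the following sketch; the broker ID, log directory, and ZooKeeper connection string are illustrative and must match your environment:

    broker.id=3
    log.dirs=/var/lib/kafka
    zookeeper.connect=zookeeper1:2181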
  2. Start the new Kafka broker passing the configuration file you created in the previous step as the argument to the kafka-server-start.sh script:

    su - kafka
    /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
  3. Verify that the Kafka broker is running.

    jcmd | grep Kafka
  4. Repeat the above steps for each new broker.
  5. Execute the partition reassignment using the kafka-reassign-partitions.sh command line tool.

    kafka-reassign-partitions.sh --zookeeper <ZooKeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --execute

    If you are going to throttle replication you can also pass the --throttle option with an inter-broker throttled rate in bytes per second. For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 5000000 --execute

    This command will print out two reassignment JSON objects. The first records the current assignment for the partitions being moved. You should save this to a file in case you need to revert the reassignment later on. The second JSON object is the target reassignment you have passed in your reassignment JSON file.

  6. If you need to change the throttle during reassignment you can use the same command line with a different throttled rate. For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 10000000 --execute
  7. Periodically verify whether the reassignment has completed using the kafka-reassign-partitions.sh command line tool. This is the same command as the previous step but with the --verify option instead of the --execute option.

    kafka-reassign-partitions.sh --zookeeper <ZooKeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --verify

    For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --verify
  8. The reassignment has finished when the --verify command reports that each of the partitions being moved has completed successfully. This final --verify also has the effect of removing any reassignment throttles. You can now delete the revert file if you saved the JSON for reverting the assignments to their original brokers.

6.5. Scaling down a Kafka cluster

This procedure describes how to decrease the number of brokers in a Kafka cluster.

Prerequisites

  • An existing Kafka cluster.
  • A reassignment JSON file of how partitions should be reassigned to brokers in the cluster once the broker(s) have been removed.

Procedure

  1. Execute the partition reassignment using the kafka-reassign-partitions.sh command line tool.

    kafka-reassign-partitions.sh --zookeeper <ZooKeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --execute

    If you are going to throttle replication you can also pass the --throttle option with an inter-broker throttled rate in bytes per second. For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 5000000 --execute

    This command will print out two reassignment JSON objects. The first records the current assignment for the partitions being moved. You should save this to a file in case you need to revert the reassignment later on. The second JSON object is the target reassignment you have passed in your reassignment JSON file.

  2. If you need to change the throttle during reassignment you can use the same command line with a different throttled rate. For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 10000000 --execute
  3. Periodically verify whether the reassignment has completed using the kafka-reassign-partitions.sh command line tool. This is the same command as the previous step but with the --verify option instead of the --execute option.

    kafka-reassign-partitions.sh --zookeeper <ZooKeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --verify

    For example:

    kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --verify
  4. The reassignment has finished when the --verify command reports that each of the partitions being moved has completed successfully. This final --verify also has the effect of removing any reassignment throttles. You can now delete the revert file if you saved the JSON for reverting the assignments to their original brokers.
  5. Once all the partition reassignments have finished, the broker being removed should not have responsibility for any of the partitions in the cluster. You can verify this by checking each of the directories given in the broker's log.dirs configuration parameter. If any of the log directories on the broker contains a directory that does not match the extended regular expression \.[a-z0-9]+-delete$, then the broker still has live partitions and it should not be stopped.

    You can check this by executing the command:

    ls -l <LogDir> | grep -E '^d' | grep -vE '[a-zA-Z0-9.-]+\.[a-z0-9]+-delete$'

    If the above command prints any output then the broker still has live partitions. In this case, either the reassignment has not finished, or the reassignment JSON file was incorrect.

  6. Once you have confirmed that the broker has no live partitions you can stop it.

    su - kafka
    /opt/kafka/bin/kafka-server-stop.sh
  7. Confirm that the Kafka broker is stopped.

    jcmd | grep kafka

6.6. Scaling up a ZooKeeper cluster

This procedure describes how to add servers (nodes) to a ZooKeeper cluster. The dynamic reconfiguration feature of ZooKeeper maintains a stable ZooKeeper cluster during the scale up process.

Prerequisites

  • Dynamic reconfiguration is enabled in the ZooKeeper configuration file (reconfigEnabled=true).
  • ZooKeeper authentication is enabled and you can access the new server using the authentication mechanism.

Procedure

Perform the following steps for each ZooKeeper server that you are adding, one at a time:

  1. Add a server to the ZooKeeper cluster as described in Section 3.3, “Running multi-node ZooKeeper cluster” and then start ZooKeeper.
  2. Note the IP address and configured access ports of the new server.
  3. Start a zookeeper-shell session for the server. Run the following command from a machine that has access to the cluster (this might be one of the ZooKeeper nodes or your local machine, if it has access).

    su - kafka
    /opt/kafka/bin/zookeeper-shell.sh <ip-address>:<zk-port>
  4. In the shell session, with the ZooKeeper node running, enter the following line to add the new server to the quorum as a voting member:

    reconfig -add server.<positive-id> = <address1>:<port1>:<port2>[:role];[<client-port-address>:]<client-port>

    For example:

    reconfig -add server.4=172.17.0.4:2888:3888:participant;172.17.0.4:2181

    Where <positive-id> is the ID of the new server, in this example 4.

    For the two ports, <port1> 2888 is for communication between ZooKeeper servers, and <port2> 3888 is for leader election.

    The new configuration propagates to the other servers in the ZooKeeper cluster; the new server is now a full member of the quorum.
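
    You can optionally confirm the new membership from the same shell session; the config command prints the quorum configuration currently in effect:

    config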

  5. Repeat steps 1-4 for the other servers that you want to add.

6.7. Scaling down a ZooKeeper cluster

This procedure describes how to remove servers (nodes) from a ZooKeeper cluster. The dynamic reconfiguration feature of ZooKeeper maintains a stable ZooKeeper cluster during the scale down process.

Prerequisites

  • Dynamic reconfiguration is enabled in the ZooKeeper configuration file (reconfigEnabled=true).
  • ZooKeeper authentication is enabled and you can access the servers using the authentication mechanism.

Procedure

Perform the following steps for each ZooKeeper server that you are removing, one at a time:

  1. Log in to the zookeeper-shell on one of the servers that will be retained after the scale down (for example, server 1).

    Note

    Access the server using the authentication mechanism configured for the ZooKeeper cluster.
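
    For example, reusing the zookeeper-shell invocation from the scale-up procedure, substituting the address and client port of a retained server:

    su - kafka
    /opt/kafka/bin/zookeeper-shell.sh <ip-address>:<zk-port>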

  2. Remove a server, for example server 5.

    reconfig -remove 5
  3. Deactivate the server that you removed.
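
    For example, assuming the removed server runs ZooKeeper from the same AMQ Streams installation paths used elsewhere in this chapter, you might stop it with:

    su - kafka
    /opt/kafka/bin/zookeeper-server-stop.sh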
  4. Repeat steps 1-3 to reduce the cluster size.
