Using AMQ Streams on Red Hat Enterprise Linux (RHEL)
Abstract
Chapter 1. Overview of AMQ Streams
Red Hat AMQ Streams is a massively-scalable, distributed, and high-performance data streaming platform based on the Apache Zookeeper and Apache Kafka projects. It consists of the following main components:
- Zookeeper
- Service for highly reliable distributed coordination.
- Kafka Broker
- Messaging broker responsible for delivering records from producing clients to consuming clients.
- Kafka Connect
- A toolkit for streaming data between Kafka brokers and other systems using Connector plugins.
- Kafka Consumer and Producer APIs
- Java-based APIs for producing messages to and consuming messages from Kafka brokers.
- Kafka Streams API
- API for writing stream processor applications.
Kafka Broker is the central point connecting all these components. The broker uses Apache Zookeeper for storing configuration data and for cluster coordination. Before running Apache Kafka, an Apache Zookeeper cluster has to be ready.
Figure 1.1. Example Architecture diagram of AMQ Streams

1.1. Key features
Scalability and performance
- Designed for horizontal scalability
Message ordering guarantee
- At partition level
Message rewind/replay
- "Long term" storage
- Allows you to reconstruct application state by replaying the messages
- Combined with compacted topics, allows you to use Kafka as a key-value store
1.2. Supported Configurations
To run in a supported configuration, AMQ Streams must be running on one of the following JVM versions and on one of the supported operating systems.
Table 1.1. List of supported Java Virtual Machines
| Java Virtual Machine | Version |
|---|---|
| OpenJDK | 1.8 |
| OracleJDK | 1.8 |
| IBM JDK | 1.8 |
Table 1.2. List of supported Operating Systems
| Operating System | Architecture | Version |
|---|---|---|
| Red Hat Enterprise Linux | x86_64 | 7.x |
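To verify that a host is running one of the supported JVM versions before installing AMQ Streams, you can check the Java version from the command line, for example:
java -version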
1.3. Document conventions
Replaceables
In this document, replaceable text is styled in monospace and surrounded by angle brackets.
For example, in the following code, you will want to replace <bootstrap-address> and <topic-name> with your own address and topic name:
bin/kafka-console-consumer.sh --bootstrap-server <bootstrap-address> --topic <topic-name> --from-beginning
Chapter 2. Getting started
2.1. AMQ Streams distribution
AMQ Streams is distributed as a single ZIP file. This ZIP file contains all AMQ Streams components:
- Apache Zookeeper
- Apache Kafka
- Apache Kafka Connect
- Apache Kafka Mirror Maker
2.2. Downloading an AMQ Streams Archive
An archived distribution of AMQ Streams is available for download from the Red Hat website. You can download a copy of the distribution by following the steps below.
Procedure
- Download the AMQ Streams archive from the Customer Portal.
2.3. Installing AMQ Streams
Prerequisites
- Download the installation archive.
- Review Section 1.2, “Supported Configurations”
Procedure
Add new kafka user and group.
sudo groupadd kafka
sudo useradd -g kafka kafka
sudo passwd kafka
Create directory /opt/kafka.
sudo mkdir /opt/kafka
Create a temporary directory and extract the contents of the AMQ Streams ZIP file.
mkdir /tmp/kafka
unzip kafka_y.y-x.x.x.zip -d /tmp/kafka
Move the extracted contents into the /opt/kafka directory and delete the temporary directory.
sudo mv /tmp/kafka/kafka_y.y-x.x.x/* /opt/kafka/
rm -r /tmp/kafka
Change the ownership of the /opt/kafka directory to the kafka user.
sudo chown -R kafka:kafka /opt/kafka
Create directory /var/lib/zookeeper for storing Zookeeper data and set its ownership to the kafka user.
sudo mkdir /var/lib/zookeeper
sudo chown -R kafka:kafka /var/lib/zookeeper
Create directory /var/lib/kafka for storing Kafka data and set its ownership to the kafka user.
sudo mkdir /var/lib/kafka
sudo chown -R kafka:kafka /var/lib/kafka
2.4. Running single node AMQ Streams cluster
This procedure shows how to run a basic AMQ Streams cluster consisting of a single Zookeeper node and a single Apache Kafka node, both running on the same host. It uses the default configuration files for both Zookeeper and Kafka.
A single node AMQ Streams cluster does not provide reliability and high availability and is suitable only for development purposes.
Prerequisites
- AMQ Streams is installed on the host
Running the cluster
Edit the Zookeeper configuration file /opt/kafka/config/zookeeper.properties. Set the dataDir option to /var/lib/zookeeper/.
dataDir=/var/lib/zookeeper/
Start Zookeeper.
su - kafka
/opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
Make sure that Apache Zookeeper is running.
jcmd | grep zookeeper
Edit the Kafka configuration file /opt/kafka/config/server.properties. Set the log.dirs option to /var/lib/kafka/.
log.dirs=/var/lib/kafka/
Start Kafka.
su - kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
Make sure that Kafka is running.
jcmd | grep kafka
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
2.5. Using the cluster
Prerequisites
- AMQ Streams is installed on the host
- Zookeeper and Kafka are up and running
Procedure
Start the Kafka console producer.
bin/kafka-console-producer.sh --broker-list <bootstrap-address> --topic <topic-name>
For example:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic
- Type your message into the console where the producer is running.
- Press Enter to send.
- Press Ctrl+C to exit the Kafka console producer.
Start the message receiver.
bin/kafka-console-consumer.sh --bootstrap-server <bootstrap-address> --topic <topic-name> --from-beginning
For example:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning
- Confirm that you see the incoming messages in the consumer console.
- Press Ctrl+C to exit the Kafka console consumer.
2.6. Stopping the AMQ Streams services
You can stop the Kafka and Zookeeper services by running a script. All connections to the Kafka and Zookeeper services will be terminated.
Prerequisites
- AMQ Streams is installed on the host
- Zookeeper and Kafka are up and running
Procedure
Stop the Kafka broker.
su - kafka
/opt/kafka/bin/kafka-server-stop.sh
Confirm that the Kafka broker is stopped.
jcmd | grep kafka
Stop Zookeeper.
su - kafka
/opt/kafka/bin/zookeeper-server-stop.sh
2.7. Configuring AMQ Streams
Prerequisites
- AMQ Streams is downloaded and installed on the host
Procedure
Open the Zookeeper and Kafka broker configuration files in a text editor. The configuration files are located at:
- Zookeeper: /opt/kafka/config/zookeeper.properties
- Kafka: /opt/kafka/config/server.properties
Edit the configuration options. The configuration files are in the Java properties format. Every configuration option should be on a separate line in the following format:
<option> = <value>
Lines starting with # or ! will be treated as comments and will be ignored by AMQ Streams components.
# This is a comment
Values can be split into multiple lines by using \ directly before the newline / carriage return.
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
    username="bob" \
    password="bobs-password";
- Save the changes
- Restart the Zookeeper or Kafka broker
- Repeat this procedure on all the nodes of the cluster.
Chapter 3. Configuring Zookeeper
Kafka uses Zookeeper to store configuration data and for cluster coordination. It is strongly recommended to run a cluster of replicated Zookeeper instances.
3.1. Basic configuration
The most important Zookeeper configuration options are:
- tickTime: Zookeeper’s basic time unit in milliseconds. It is used for heartbeats and session timeouts. For example, the minimum session timeout will be two ticks.
- dataDir: The directory where Zookeeper stores its transaction log and snapshots of its in-memory database. This should be set to the /var/lib/zookeeper/ directory created during installation.
- clientPort: Port number where clients can connect. Defaults to 2181.
An example Zookeeper configuration file, config/zookeeper.properties, is located in the AMQ Streams installation directory. It is recommended to place the dataDir directory on a separate disk device to minimize latency in Zookeeper.
The Zookeeper configuration file should be located at /opt/kafka/config/zookeeper.properties. A basic example of the configuration file can be found below. The configuration file has to be readable by the kafka user.
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
3.2. Zookeeper cluster configuration
For a reliable Zookeeper service, you should deploy Zookeeper in a cluster. Hence, for production use cases, you must run a cluster of replicated Zookeeper instances. Zookeeper clusters are also referred to as ensembles.
Zookeeper clusters usually consist of an odd number of nodes. Zookeeper requires that a majority of the nodes in the cluster are up and running. For example, in a cluster with three nodes, at least two of them must be up and running; the cluster can therefore tolerate one node being down. In a cluster of five nodes, at least three must be available, so it can tolerate two nodes being down. In a cluster of seven nodes, at least four must be available, so it can tolerate three nodes being down. Having more nodes in the Zookeeper cluster delivers better resiliency and reliability of the whole cluster.
Zookeeper can run in clusters with an even number of nodes. The additional node, however, does not increase the resiliency of the cluster. A cluster with four nodes requires at least three nodes to be available and can tolerate only one node being down. Therefore it has exactly the same resiliency as a cluster with only three nodes.
Ideally, the different Zookeeper nodes should be placed into different data centers or network segments. Increasing the number of Zookeeper nodes increases the workload spent on cluster synchronization. For most Kafka use cases, a Zookeeper cluster with 3, 5, or 7 nodes should be fully sufficient.
A Zookeeper cluster with 3 nodes can tolerate only 1 unavailable node. This means that if a cluster node crashes while you are doing maintenance on another node, your Zookeeper cluster will become unavailable.
Replicated Zookeeper configuration supports all configuration options supported by the standalone configuration. Additional options are added for the clustering configuration:
- initLimit: Amount of time to allow followers to connect and sync to the cluster leader. The time is specified as a number of ticks (see the tickTime option for more details).
- syncLimit: Amount of time for which followers can be behind the leader. The time is specified as a number of ticks (see the tickTime option for more details).
In addition to the options above, every configuration file should contain a list of servers which should be members of the Zookeeper cluster. The server records should be specified in the format server.id=hostname:port1:port2, where:
- id: The ID of the Zookeeper cluster node.
- hostname: The hostname or IP address where the node listens for connections.
- port1: The port number used for intra-cluster communication.
- port2: The port number used for leader election.
The following is an example configuration file of a Zookeeper cluster with three nodes:
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=172.17.0.1:2888:3888
server.2=172.17.0.2:2888:3888
server.3=172.17.0.3:2888:3888
Each node in the Zookeeper cluster has to be assigned a unique ID. Each node’s ID has to be configured in a myid file stored in the dataDir directory, for example /var/lib/zookeeper/. The myid file should contain only a single line with the ID written as text. The ID can be any integer from 1 to 255. You must manually create this file on each cluster node. Using this file, each Zookeeper instance will use the configuration from the corresponding server.<id> line in the configuration file to configure its listeners. It will also use all the other server.<id> lines to identify the other cluster members.
In the above example, there are three nodes, so each one will have a different myid with values 1, 2, and 3 respectively.
3.3. Running multi-node Zookeeper cluster
This procedure will show you how to configure and run Zookeeper as a multi-node cluster.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Zookeeper cluster nodes.
Running the cluster
Create the myid file in /var/lib/zookeeper/. Enter ID 1 for the first Zookeeper node, 2 for the second Zookeeper node, and so on.
su - kafka
echo "<NodeID>" > /var/lib/zookeeper/myid
For example:
su - kafka
echo "1" > /var/lib/zookeeper/myid
Edit the Zookeeper configuration file /opt/kafka/config/zookeeper.properties for the following:
- Set the dataDir option to /var/lib/zookeeper/.
- Configure the initLimit and syncLimit options.
- Add a list of all Zookeeper nodes. The list should also include the current node.
Example configuration for a node of a Zookeeper cluster with five members
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=172.17.0.1:2888:3888
server.2=172.17.0.2:2888:3888
server.3=172.17.0.3:2888:3888
server.4=172.17.0.4:2888:3888
server.5=172.17.0.5:2888:3888
Start Zookeeper with the default configuration file.
su - kafka
/opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
Verify that Zookeeper is running.
jcmd | grep zookeeper
- Repeat this procedure on all the nodes of the cluster.
Once all nodes of the cluster are up and running, verify that all nodes are members of the cluster by sending a stat command to each of the nodes using the ncat utility.
Use ncat stat to check the node status
echo stat | ncat localhost 2181
In the output you should see information that the node is either a leader or a follower.
Example output from the ncat command
Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT
Clients:
 /0:0:0:0:0:0:0:1:59726[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: follower
Node count: 4
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
3.4. Authentication
By default, Zookeeper does not use any form of authentication and allows anonymous connections. However, it supports Java Authentication and Authorization Service (JAAS) which can be used to set up authentication using Simple Authentication and Security Layer (SASL). Zookeeper supports authentication using the DIGEST-MD5 SASL mechanism with locally stored credentials.
3.4.1. Authentication with SASL
JAAS is configured using a separate configuration file. It is recommended to place the JAAS configuration file in the same directory as the Zookeeper configuration (/opt/kafka/config/). The recommended file name is zookeeper-jaas.conf. When using a Zookeeper cluster with multiple nodes, the JAAS configuration file has to be created on all cluster nodes.
JAAS is configured using contexts. Separate parts such as the server and client are always configured with a separate context. The context is a configuration option and has the following format:
ContextName {
param1
param2;
};
SASL Authentication is configured separately for server-to-server communication (communication between Zookeeper instances) and client-to-server communication (communication between Kafka and Zookeeper). Server-to-server authentication is relevant only for Zookeeper clusters with multiple nodes.
Server-to-Server authentication
For server-to-server authentication, the JAAS configuration file contains two parts:
- The server configuration
- The client configuration
When using the DIGEST-MD5 SASL mechanism, the QuorumServer context is used to configure the authentication server. It must contain all the usernames that are allowed to connect, together with their passwords, in an unencrypted form. The second context, QuorumLearner, has to be configured for the client which is built into Zookeeper. It also contains the password in an unencrypted form. An example of the JAAS configuration file for the DIGEST-MD5 mechanism can be found below:
QuorumServer {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_zookeeper="123456";
};
QuorumLearner {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="zookeeper"
password="123456";
};
In addition to the JAAS configuration file, you must enable the server-to-server authentication in the regular Zookeeper configuration file by specifying the following options:
quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.loginContext=QuorumLearner
quorum.auth.server.loginContext=QuorumServer
quorum.cnxn.threads.size=20
Use the EXTRA_ARGS environment variable to pass the JAAS configuration file to the Zookeeper server as a Java property:
su - kafka
export EXTRA_ARGS="-Djava.security.auth.login.config=/opt/kafka/config/zookeeper-jaas.conf"; /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
For more information about server-to-server authentication, see Zookeeper wiki.
Client-to-Server authentication
Client-to-server authentication is configured in the same JAAS file as the server-to-server authentication. However, unlike the server-to-server authentication, it contains only the server configuration. The client part of the configuration has to be done in the client. For information on how to configure a Kafka broker to connect to Zookeeper using authentication, see the Kafka installation section.
Add the Server context to the JAAS configuration file to configure client-to-server authentication. For DIGEST-MD5 mechanism it configures all usernames and passwords:
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_super="123456"
user_kafka="123456"
user_someoneelse="123456";
};After configuring the JAAS context, enable the client-to-server authentication in the Zookeeper configuration file by adding the following line:
requireClientAuthScheme=sasl
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.2=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.3=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
You must add the authProvider.<ID> property for every server that is part of the Zookeeper cluster.
Use the EXTRA_ARGS environment variable to pass the JAAS configuration file to the Zookeeper server as a Java property:
su - kafka
export EXTRA_ARGS="-Djava.security.auth.login.config=/opt/kafka/config/zookeeper-jaas.conf"; /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
For more information about configuring Zookeeper authentication in Kafka brokers, see Section 4.6, “Zookeeper authentication”.
3.4.2. Enabling Server-to-server authentication using DIGEST-MD5
This procedure describes how to enable authentication using the SASL DIGEST-MD5 mechanism between the nodes of the Zookeeper cluster.
Prerequisites
- AMQ Streams is installed on the host
- Zookeeper cluster is configured with multiple nodes.
Enabling SASL DIGEST-MD5 authentication
On all Zookeeper nodes, create or edit the
/opt/kafka/config/zookeeper-jaas.conf JAAS configuration file and add the following contexts:
QuorumServer {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_<Username>="<Password>";
};

QuorumLearner {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="<Username>"
    password="<Password>";
};
The username and password must be the same in both JAAS contexts. For example:
QuorumServer {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_zookeeper="123456";
};

QuorumLearner {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="zookeeper"
    password="123456";
};
On all Zookeeper nodes, edit the
/opt/kafka/config/zookeeper.properties Zookeeper configuration file and set the following options:
quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.loginContext=QuorumLearner
quorum.auth.server.loginContext=QuorumServer
quorum.cnxn.threads.size=20
Restart all Zookeeper nodes one by one. To pass the JAAS configuration to Zookeeper, use the
EXTRA_ARGS environment variable.
su - kafka
export EXTRA_ARGS="-Djava.security.auth.login.config=/opt/kafka/config/zookeeper-jaas.conf"; /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Zookeeper cluster, see Section 3.3, “Running multi-node Zookeeper cluster”.
3.4.3. Enabling Client-to-server authentication using DIGEST-MD5
This procedure describes how to enable authentication using the SASL DIGEST-MD5 mechanism between Zookeeper clients and Zookeeper.
Prerequisites
- AMQ Streams is installed on the host
- Zookeeper cluster is configured and running.
Enabling SASL DIGEST-MD5 authentication
On all Zookeeper nodes, create or edit the
/opt/kafka/config/zookeeper-jaas.conf JAAS configuration file and add the following context:
Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_super="<SuperUserPassword>"
    user_<Username1>="<Password1>"
    user_<Username2>="<Password2>";
};
The super user automatically has administrator privileges. The file can contain multiple users, but only one additional user is required by the Kafka brokers. The recommended name for the Kafka user is kafka.
The following example shows the
Server context for client-to-server authentication:
Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_super="123456"
    user_kafka="123456";
};
On all Zookeeper nodes, edit the
/opt/kafka/config/zookeeper.properties Zookeeper configuration file and set the following options:
requireClientAuthScheme=sasl
authProvider.<IdOfBroker1>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.<IdOfBroker2>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.<IdOfBroker3>=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
The authProvider.<ID> property has to be added for every node which is part of the Zookeeper cluster. An example three-node Zookeeper cluster configuration would look like the following:
requireClientAuthScheme=sasl
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.2=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.3=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
Restart all Zookeeper nodes one by one. To pass the JAAS configuration to Zookeeper, use the
EXTRA_ARGS environment variable.
su - kafka
export EXTRA_ARGS="-Djava.security.auth.login.config=/opt/kafka/config/zookeeper-jaas.conf"; /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Zookeeper cluster, see Section 3.3, “Running multi-node Zookeeper cluster”.
3.5. Authorization
Zookeeper supports access control lists (ACLs) to protect data stored inside it. Kafka brokers can automatically configure the ACL rights for all Zookeeper records they create so no other Zookeeper user can modify them.
For more information about enabling Zookeeper ACLs in Kafka brokers, see Section 4.7, “Zookeeper authorization”.
3.6. TLS
The version of Zookeeper which is part of AMQ Streams currently does not support TLS for encryption or authentication.
3.7. Additional configuration options
You can set the following options based on your use case:
- maxClientCnxns: The maximum number of concurrent client connections to a single member of the Zookeeper cluster.
- autopurge.snapRetainCount: Number of snapshots of Zookeeper’s in-memory database which will be retained. The default value is 3.
- autopurge.purgeInterval: The time interval in hours for purging snapshots. The default value is 0, which means the option is disabled.
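For example, a zookeeper.properties fragment combining these options might look like the following; the values shown are illustrative only, not recommendations:
# Limit each client machine to 60 concurrent connections (example value)
maxClientCnxns=60
# Keep the 3 most recent snapshots and purge older ones every 24 hours (example values)
autopurge.snapRetainCount=3
autopurge.purgeInterval=24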
All available configuration options can be found in Zookeeper documentation.
3.8. Logging
Zookeeper uses log4j as its logging infrastructure. Logging configuration is by default read from the log4j.properties configuration file, which should be placed either in the /opt/kafka/config/ directory or on the classpath. The location and name of the configuration file can be changed using the Java property log4j.configuration, which can be passed to Zookeeper using the KAFKA_LOG4J_OPTS environment variable:
su - kafka
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/my/path/to/log4j.properties"; /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties
For more information about Log4j configurations, see Log4j documentation.
Chapter 4. Configuring Kafka
Kafka uses a properties file to store static configuration. The recommended location for the configuration file is /opt/kafka/config/server.properties. The configuration file has to be readable by the kafka user.
AMQ Streams ships an example configuration file that highlights various basic and advanced features of the product. It can be found under config/server.properties in the AMQ Streams installation directory.
This chapter explains the most important configuration options. For a complete list of supported Kafka broker configuration options, see Appendix A, Broker configuration parameters.
4.1. Zookeeper
Kafka brokers need Zookeeper to store some parts of their configuration as well as to coordinate the cluster (for example, to decide which node is the leader for which partition). Connection details for the Zookeeper cluster are stored in the configuration file. The field zookeeper.connect contains a comma-separated list of hostnames and ports of the members of the Zookeeper cluster.
For example:
zookeeper.connect=zoo1.my-domain.com:2181,zoo2.my-domain.com:2181,zoo3.my-domain.com:2181
Kafka will use these addresses to connect to the Zookeeper cluster. With this configuration, all Kafka znodes will be created directly in the root of the Zookeeper database. Therefore, such a Zookeeper cluster can be used only for a single Kafka cluster. To configure multiple Kafka clusters to use a single Zookeeper cluster, specify a base (prefix) path at the end of the Zookeeper connection string in the Kafka configuration file:
zookeeper.connect=zoo1.my-domain.com:2181,zoo2.my-domain.com:2181,zoo3.my-domain.com:2181/my-cluster-1
4.2. Listeners
Kafka brokers can be configured to use multiple listeners. Each listener can be used to listen on a different port or network interface and can have a different configuration. Listeners are configured in the listeners property in the configuration file. The listeners property contains a list of listeners, with each listener configured as <listenerName>://<hostname>:<port>. When the hostname value is empty, Kafka will use java.net.InetAddress.getCanonicalHostName() as the hostname. The following example shows how multiple listeners might be configured:
listeners=INT1://:9092,INT2://:9093,REPLICATION://:9094
When a Kafka client wants to connect to a Kafka cluster, it first connects to a bootstrap server. The bootstrap server is one of the cluster nodes. It will provide the client with a list of all other brokers which are part of the cluster and the client will connect to them individually. By default the bootstrap server will provide the client with a list of nodes based on the listeners field.
Advertised listeners
It is possible to give the client a different set of addresses than those given in the listeners property. This is useful in situations where additional network infrastructure, such as a proxy, is between the client and the broker, or when an external DNS name should be used instead of an IP address. Here, the broker allows defining the advertised addresses of the listeners in the advertised.listeners configuration property. This property has the same format as the listeners property. The following example shows how to configure advertised listeners:
listeners=INT1://:9092,INT2://:9093
advertised.listeners=INT1://my-broker-1.my-domain.com:1234,INT2://my-broker-1.my-domain.com:9093
The names of the listeners have to match the names of the listeners from the listeners property.
Inter-broker listeners
When the cluster has replicated topics, the brokers responsible for such topics need to communicate with each other in order to replicate the messages in those topics. When multiple listeners are configured, the configuration field inter.broker.listener.name can be used to specify the name of the listener which should be used for replication between brokers. For example:
inter.broker.listener.name=REPLICATION
4.3. Commit logs
Apache Kafka stores all records it receives from producers in commit logs. The commit logs contain the actual data, in the form of records, that Kafka needs to deliver. These are not the application log files which record what the broker is doing.
Log directories
You can configure log directories using the log.dirs property field to store commit logs in one or multiple log directories. It should be set to the /var/lib/kafka directory created during installation:
log.dirs=/var/lib/kafka
For performance reasons, you can configure log.dirs with multiple directories and place each of them on a different physical device to improve disk I/O performance. For example:
log.dirs=/var/lib/kafka1,/var/lib/kafka2,/var/lib/kafka3
4.4. Broker ID
Broker ID is a unique identifier for each broker in the cluster. You can assign an integer greater than or equal to 0 as broker ID. The broker ID is used to identify the brokers after restarts or crashes and it is therefore important that the id is stable and does not change over time. The broker ID is configured in the broker properties file:
broker.id=1
4.5. Running a multi-node Kafka cluster
This procedure will show you how to configure and run Kafka as a multi-node cluster.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
- Zookeeper cluster is configured and running.
Running the cluster
Edit the
/opt/kafka/config/server.properties Kafka configuration file for the following:
- Set the broker.id field to 0 for the first broker, 1 for the second broker, and so on.
- Configure the details for connecting to Zookeeper in the zookeeper.connect option.
- Configure the Kafka listeners.
- Set the directories where the commit logs should be stored in the log.dirs option.
The following example shows the configuration for a Kafka broker:
broker.id=0
zookeeper.connect=zoo1.my-domain.com:2181,zoo2.my-domain.com:2181,zoo3.my-domain.com:2181
listeners=REPLICATION://:9091,PLAINTEXT://:9092
inter.broker.listener.name=REPLICATION
log.dirs=/var/lib/kafka
In a typical installation where each Kafka broker is running on identical hardware, only the broker.id configuration property will differ between each broker configuration.
Start the Kafka broker with the default configuration file.
su - kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
Verify that the Kafka broker is running.
jcmd | grep Kafka
- Repeat this procedure on all the nodes of the Kafka cluster.
Once all nodes of the cluster are up and running, verify that all nodes are members of the Kafka cluster by sending a dump command to one of the Zookeeper nodes using the ncat utility. The command will print all Kafka brokers registered in Zookeeper.
Use ncat dump to check the brokers registered in Zookeeper
echo dump | ncat zoo1.my-domain.com 2181
The output should contain all Kafka brokers you just configured and started.
Example output from the ncat command for a Kafka cluster with 3 nodes
SessionTracker dump:
org.apache.zookeeper.server.quorum.LearnerSessionTracker@28848ab9
ephemeral nodes dump:
Sessions with Ephemerals (3):
0x20000015dd00000:
        /brokers/ids/1
0x10000015dc70000:
        /controller
        /brokers/ids/0
0x10000015dc70001:
        /brokers/ids/2
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Zookeeper cluster, see Section 3.3, “Running multi-node Zookeeper cluster”.
- For a complete list of supported Kafka broker configuration options, see Appendix A, Broker configuration parameters.
4.6. Zookeeper authentication
By default, connections between Zookeeper and Kafka are not authenticated. However, Kafka and Zookeeper support Java Authentication and Authorization Service (JAAS) which can be used to set up authentication using Simple Authentication and Security Layer (SASL). Zookeeper supports authentication using the DIGEST-MD5 SASL mechanism with locally stored credentials.
4.6.1. JAAS Configuration
SASL authentication for Zookeeper connections has to be configured in the JAAS configuration file. By default, Kafka will use the JAAS context named Client for connecting to Zookeeper. The Client context should be configured in the /opt/kafka/config/jaas.conf file. The context has to enable the PLAIN SASL authentication, as in the following example:
Client {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="kafka"
password="123456";
};
4.6.2. Enabling Zookeeper authentication
This procedure describes how to enable authentication using the SASL DIGEST-MD5 mechanism when connecting to Zookeeper.
Prerequisites
- Client-to-server authentication is enabled in Zookeeper
Enabling SASL DIGEST-MD5 authentication
On all Kafka broker nodes, create or edit the
/opt/kafka/config/jaas.conf JAAS configuration file and add the following context:
Client {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="<Username>"
    password="<Password>";
};
The username and password should be the same as configured in Zookeeper.
The following example shows the Client context:
Client {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="kafka"
    password="123456";
};
Restart all Kafka broker nodes one by one. To pass the JAAS configuration to Kafka brokers, use the KAFKA_OPTS environment variable.
su - kafka
export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/config/jaas.conf"; /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
Additional resources
- For more information about configuring client-to-server authentication in Zookeeper, see Section 3.4, “Authentication”.
4.7. Zookeeper authorization
When authentication is enabled between Kafka and Zookeeper, Kafka can be configured to automatically protect all its records with Access Control List (ACL) rules which will allow only the Kafka user to change the data. All other users will have read-only access.
4.7.1. ACL Configuration
Enforcement of ACL rules is controlled by the zookeeper.set.acl property in the config/server.properties Kafka configuration file and is disabled by default. To enable ACL protection, set zookeeper.set.acl to true:
zookeeper.set.acl=true
Kafka will set the ACL rules only for newly created Zookeeper znodes. When the ACLs are enabled only after the first start of the cluster, the zookeeper-security-migration.sh tool can be used to set ACLs on all existing znodes.
The data stored in Zookeeper includes information such as topic names and their configuration. The Zookeeper database also contains the salted and hashed user credentials when SASL SCRAM authentication is used. But it does not include any records sent and received using Kafka. Kafka, in general, considers the data stored in Zookeeper as non-confidential. If this data is considered confidential (for example because topic names contain customer identifiers), the only way to protect it is to isolate Zookeeper at the network level and allow access only to Kafka brokers.
4.7.2. Enabling Zookeeper ACLs for a new Kafka cluster
This procedure describes how to enable Zookeeper ACLs in the Kafka configuration for a new Kafka cluster. Use this procedure only before the first start of the Kafka cluster. For enabling Zookeeper ACLs in an already running cluster, see Section 4.7.3, “Enabling Zookeeper ACLs in an existing Kafka cluster”.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
- Zookeeper cluster is configured and running.
- Client-to-server authentication is enabled in Zookeeper.
- Zookeeper authentication is enabled in the Kafka brokers.
- Kafka brokers have not yet been started.
Procedure
Edit the
/opt/kafka/config/server.properties Kafka configuration file to set the zookeeper.set.acl field to true on all cluster nodes.
zookeeper.set.acl=true
- Start the Kafka brokers.
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Zookeeper cluster, see Section 3.3, “Running multi-node Zookeeper cluster”.
- For more information about running a Kafka cluster, see Section 4.5, “Running a multi-node Kafka cluster”.
4.7.3. Enabling Zookeeper ACLs in an existing Kafka cluster
The zookeeper-security-migration.sh tool can be used to set Zookeeper ACLs on all existing znodes. The zookeeper-security-migration.sh tool is available as part of AMQ Streams and can be found in the bin directory.
Prerequisites
- Kafka cluster is configured and running.
Enabling the Zookeeper ACLs
Edit the
/opt/kafka/config/server.properties Kafka configuration file to set the zookeeper.set.acl field to true on all cluster nodes.
zookeeper.set.acl=true
- Restart all Kafka brokers one by one
Set the ACLs on all existing Zookeeper
znodes using the zookeeper-security-migration.sh tool.
su - kafka
cd /opt/kafka
KAFKA_OPTS="-Djava.security.auth.login.config=./config/jaas.conf"; ./bin/zookeeper-security-migration.sh --zookeeper.acl=secure --zookeeper.connect=<ZookeeperURL>
exit
For example:
su - kafka
cd /opt/kafka
KAFKA_OPTS="-Djava.security.auth.login.config=./config/jaas.conf"; ./bin/zookeeper-security-migration.sh --zookeeper.acl=secure --zookeeper.connect=zoo1.my-domain.com:2181
exit
4.8. Encryption and authentication
Kafka supports TLS for encrypting the communication with Kafka clients. Additionally, it supports two types of authentication:
- TLS client authentication based on X.509 certificates
- SASL Authentication based on a username and password
4.8.1. Listener configuration
Encryption and authentication in Kafka brokers is configured per listener. For more information about Kafka listener configuration, see Section 4.2, “Listeners”.
Each listener in the Kafka broker is configured with its own security protocol. The configuration property listener.security.protocol.map defines which listener uses which security protocol. It maps each listener name to its security protocol. Supported security protocols are:
- PLAINTEXT: Listener without any encryption or authentication.
- SSL: Listener using TLS encryption and, optionally, authentication using TLS client certificates.
- SASL_PLAINTEXT: Listener without encryption but with SASL-based authentication.
- SASL_SSL: Listener with TLS-based encryption and SASL-based authentication.
Given the following listeners configuration:
listeners=INT1://:9092,INT2://:9093,REPLICATION://:9094
the listener.security.protocol.map might look like this:
listener.security.protocol.map=INT1:SASL_PLAINTEXT,INT2:SASL_SSL,REPLICATION:SSL
This would configure the listener INT1 to use unencrypted connections with SASL authentication, the listener INT2 to use encrypted connections with SASL authentication and the REPLICATION interface to use TLS encryption (possibly with TLS client authentication). The same security protocol can be used multiple times. The following example is also a valid configuration:
listener.security.protocol.map=INT1:SSL,INT2:SSL,REPLICATION:SSL
Such a configuration would use TLS encryption and TLS authentication for all interfaces. The following chapters will explain in more detail how to configure TLS and SASL.
4.8.2. TLS Encryption
In order to use TLS encryption and server authentication, a keystore containing private and public keys has to be provided. This is usually done using a file in the Java Keystore (JKS) format. A path to this file is set in the ssl.keystore.location property. The ssl.keystore.password property should be used to set the password protecting the keystore. For example:
ssl.keystore.location=/path/to/keystore/server-1.jks
ssl.keystore.password=123456
In some cases, an additional password is used to protect the private key. Any such password can be set using the ssl.key.password property.
Kafka is able to use keys signed by certification authorities as well as self-signed keys. Using keys signed by certification authorities should always be the preferred method. In order to allow clients to verify the identity of the Kafka broker they are connecting to, the certificate should always contain the advertised hostname(s) as its Common Name (CN) or in the Subject Alternative Names (SAN).
It is possible to use different SSL configurations for different listeners. All options starting with ssl. can be prefixed with listener.name.<NameOfTheListener>., where the name of the listener has to be always in lower case. This will override the default SSL configuration for that specific listener. The following example shows how to use different SSL configurations for different listeners:
listeners=INT1://:9092,INT2://:9093,REPLICATION://:9094
listener.security.protocol.map=INT1:SSL,INT2:SSL,REPLICATION:SSL
# Default configuration - will be used for listeners INT1 and INT2
ssl.keystore.location=/path/to/keystore/server-1.jks
ssl.keystore.password=123456
# Different configuration for listener REPLICATION
listener.name.replication.ssl.keystore.location=/path/to/keystore/server-1.jks
listener.name.replication.ssl.keystore.password=123456
Additional TLS configuration options
In addition to the main TLS configuration options described above, Kafka supports many options for fine-tuning the TLS configuration. For example, to enable or disable TLS / SSL protocols or cipher suites:
- ssl.cipher.suites: List of enabled cipher suites. Each cipher suite is a combination of authentication, encryption, MAC and key exchange algorithms used for the TLS connection. By default, all available cipher suites are enabled.
- ssl.enabled.protocols: List of enabled TLS / SSL protocols. Defaults to TLSv1.2,TLSv1.1,TLSv1.
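For example, the following server.properties fragment restricts a broker to TLSv1.2 and two specific cipher suites. The values are illustrative only; choose protocols and cipher suites that match your own security policy:
# Allow only TLSv1.2 (example value)
ssl.enabled.protocols=TLSv1.2
# Restrict the enabled cipher suites (example values)
ssl.cipher.suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384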
For a complete list of supported Kafka broker configuration options, see Appendix A, Broker configuration parameters.
4.8.3. Enabling TLS encryption
This procedure describes how to enable encryption in Kafka brokers.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
Procedure
- Generate TLS certificates for all Kafka brokers in your cluster. The certificates should have their advertised and bootstrap addresses in their Common Name or Subject Alternative Name.
Edit the /opt/kafka/config/server.properties Kafka configuration file on all cluster nodes for the following:
- Change the listener.security.protocol.map field to specify the SSL protocol for the listener where you want to use TLS encryption.
- Set the ssl.keystore.location option to the path to the JKS keystore with the broker certificate.
- Set the ssl.keystore.password option to the password you used to protect the keystore.
For example:
listeners=UNENCRYPTED://:9092,ENCRYPTED://:9093,REPLICATION://:9094
listener.security.protocol.map=UNENCRYPTED:PLAINTEXT,ENCRYPTED:SSL,REPLICATION:PLAINTEXT
ssl.keystore.location=/path/to/keystore/server-1.jks
ssl.keystore.password=123456
- (Re)start the Kafka brokers
Additional resources
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Kafka cluster, see Section 4.5, “Running a multi-node Kafka cluster”.
- For more information about configuring TLS encryption in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.8.4. Authentication
Kafka supports two methods of authentication. On all connections, authentication using one of the supported SASL (Simple Authentication and Security Layer) mechanisms can be used. On encrypted connections, TLS client authentication based on X.509 certificates can be used.
4.8.4.1. TLS client authentication
TLS client authentication can be used only on connections which are already using TLS encryption. To use TLS client authentication, a truststore with public keys has to be provided to the broker. These keys are used to authenticate clients connecting to the broker. The truststore should be provided in Java Keystore (JKS) format and should contain the public keys of the certification authorities. All clients with public and private keys signed by one of the certification authorities included in the truststore will be authenticated. The location of the truststore is set using the ssl.truststore.location field. In case the truststore is password protected, the password should be set in the ssl.truststore.password property. For example:
ssl.truststore.location=/path/to/keystore/server-1.jks
ssl.truststore.password=123456
Once the truststore is configured, TLS client authentication has to be enabled using the ssl.client.auth property. This property can be set to one of three different values:
- none: TLS client authentication is switched off. (Default value)
- requested: TLS client authentication is optional. Clients will be asked to authenticate using a TLS client certificate, but they can choose not to.
- required: Clients are required to authenticate using a TLS client certificate.
When a client authenticates using TLS client authentication, the authenticated principal name is the distinguished name from the authenticated client certificate. For example, a user with a certificate which has a distinguished name CN=someuser will be authenticated with the following principal CN=someuser,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown. When TLS client authentication is not used and SASL is disabled, the principal name will be ANONYMOUS.
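As an illustration, the truststore and client authentication options described above could be combined in server.properties as follows; the path and password are placeholder values:
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=123456
ssl.client.auth=required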
4.8.4.2. SASL authentication
SASL authentication is configured using Java Authentication and Authorization Service (JAAS). JAAS is also used for authentication of connections between Kafka and Zookeeper. JAAS uses its own configuration file. The recommended location for this file is /opt/kafka/config/jaas.conf. The file has to be readable by the kafka user. When running Kafka, the location of this file is specified using Java system property java.security.auth.login.config. This property has to be passed to Kafka when starting the broker nodes:
KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/my/jaas.config"; bin/kafka-server-start.sh
SASL authentication is supported both through plain unencrypted connections as well as through TLS connections. SASL can be enabled individually for each listener. To enable it, the security protocol in listener.security.protocol.map has to be either SASL_PLAINTEXT or SASL_SSL.
SASL authentication in Kafka supports several different mechanisms:
- PLAIN: Implements authentication based on usernames and passwords. Usernames and passwords are stored locally in the Kafka configuration.
- SCRAM-SHA-256 and SCRAM-SHA-512: Implement authentication using Salted Challenge Response Authentication Mechanism (SCRAM). SCRAM credentials are stored centrally in Zookeeper. SCRAM can be used in situations where Zookeeper cluster nodes are running isolated in a private network.
- GSSAPI: Implements authentication against a Kerberos server.
The PLAIN mechanism sends the username and password over the network in an unencrypted format. It should therefore be used only in combination with TLS encryption.
The SASL mechanisms are configured via the JAAS configuration file. Kafka uses the JAAS context named KafkaServer. After they are configured in JAAS, the SASL mechanisms have to be enabled in the Kafka configuration. This is done using the sasl.enabled.mechanisms property. This property contains a comma-separated list of enabled mechanisms:
sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
In case the listener used for inter-broker communication is using SASL, the property sasl.mechanism.inter.broker.protocol has to be used to specify the SASL mechanism which it should use. For example:
sasl.mechanism.inter.broker.protocol=PLAIN
The username and password which will be used for the inter-broker communication have to be specified in the KafkaServer JAAS context using the fields username and password.
SASL PLAIN
To use the PLAIN mechanism, the usernames and password which are allowed to connect are specified directly in the JAAS context. The following example shows the context configured for SASL PLAIN authentication. The example configures three different users:
- admin
- user1
- user2
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
user_admin="123456"
user_user1="123456"
user_user2="123456";
};
The JAAS configuration file with the user database should be kept in sync on all Kafka brokers.
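How the file is kept in sync is up to you. As one possible approach (the hostname here is only a placeholder), the file could be copied from one broker to the others, for example with scp:
scp /opt/kafka/config/jaas.conf kafka@my-broker-2.my-domain.com:/opt/kafka/config/jaas.conf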
When SASL PLAIN is also used for inter-broker authentication, the username and password properties should be included in the JAAS context:
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="admin"
password="123456"
user_admin="123456"
user_user1="123456"
user_user2="123456";
};
SASL SCRAM
SCRAM authentication in Kafka consists of two mechanisms: SCRAM-SHA-256 and SCRAM-SHA-512. These mechanisms differ only in the hashing algorithm used - SHA-256 versus stronger SHA-512. To enable SCRAM authentication, the JAAS configuration file has to include the following configuration:
KafkaServer {
org.apache.kafka.common.security.scram.ScramLoginModule required;
};
When enabling SASL authentication in the Kafka configuration file, both SCRAM mechanisms can be listed. However, only one of them can be chosen for the inter-broker communication. For example:
sasl.enabled.mechanisms=SCRAM-SHA-256,SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
User credentials for the SCRAM mechanism are stored in Zookeeper. The kafka-configs.sh tool can be used to manage them. For example, run the following command to add user user1 with password 123456:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --alter --add-config 'SCRAM-SHA-256=[password=123456],SCRAM-SHA-512=[password=123456]' --entity-type users --entity-name user1
To delete a user credential use:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --alter --delete-config 'SCRAM-SHA-512' --entity-type users --entity-name user1
SASL GSSAPI
The SASL mechanism used for authentication using Kerberos is called GSSAPI. To configure Kerberos SASL authentication, the following configuration should be added to the JAAS configuration file:
KafkaServer {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/etc/security/keytabs/kafka_server.keytab"
principal="kafka/kafka1.hostname.com@EXAMPLE.COM";
};
The domain name in the Kerberos principal always has to be in upper case.
In addition to the JAAS configuration, the Kerberos service name needs to be specified in the sasl.kerberos.service.name property in the Kafka configuration:
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.kerberos.service.name=kafka
Multiple SASL mechanisms
Kafka can use multiple SASL mechanisms at the same time. The different JAAS configurations can be all added to the same context:
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
user_admin="123456"
user_user1="123456"
user_user2="123456";
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/etc/security/keytabs/kafka_server.keytab"
principal="kafka/kafka1.hostname.com@EXAMPLE.COM";
org.apache.kafka.common.security.scram.ScramLoginModule required;
};
When multiple mechanisms are enabled, clients will be able to choose the mechanism which they want to use.
4.8.5. Enabling TLS client authentication
This procedure describes how to enable TLS client authentication in Kafka brokers.
Prerequisites
Procedure
- Prepare a JKS truststore containing the public key of the certification authority used to sign the user certificates.
Edit the /opt/kafka/config/server.properties Kafka configuration file on all cluster nodes for the following:
- Set the ssl.truststore.location option to the path to the JKS truststore with the certification authority of the user certificates.
- Set the ssl.truststore.password option to the password you used to protect the truststore.
- Set the ssl.client.auth option to required.
For example:
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=123456
ssl.client.auth=required
- (Re)start the Kafka brokers
Additional resources
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Kafka cluster, see Section 4.5, “Running a multi-node Kafka cluster”.
- For more information about configuring TLS client authentication in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.8.6. Enabling SASL PLAIN authentication
This procedure describes how to enable SASL PLAIN authentication in Kafka brokers.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
Procedure
Edit or create the /opt/kafka/config/jaas.conf JAAS configuration file. This file should contain all your users and their passwords. Make sure this file is the same on all Kafka brokers.
For example:
KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    user_admin="123456"
    user_user1="123456"
    user_user2="123456";
};
Edit the /opt/kafka/config/server.properties Kafka configuration file on all cluster nodes for the following:
- Change the listener.security.protocol.map field to specify the SASL_PLAINTEXT or SASL_SSL protocol for the listener where you want to use SASL PLAIN authentication.
- Set the sasl.enabled.mechanisms option to PLAIN.
For example:
listeners=INSECURE://:9092,AUTHENTICATED://:9093,REPLICATION://:9094
listener.security.protocol.map=INSECURE:PLAINTEXT,AUTHENTICATED:SASL_PLAINTEXT,REPLICATION:PLAINTEXT
sasl.enabled.mechanisms=PLAIN
(Re)start the Kafka brokers using the KAFKA_OPTS environment variable to pass the JAAS configuration to Kafka brokers.
su - kafka
export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/config/jaas.conf"; /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
Additional resources
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Kafka cluster, see Section 4.5, “Running a multi-node Kafka cluster”.
- For more information about configuring SASL PLAIN authentication in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.8.7. Enabling SASL SCRAM authentication
This procedure describes how to enable SASL SCRAM authentication in Kafka brokers.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
Procedure
Edit or create the
/opt/kafka/config/jaas.confJAAS configuration file. Enable theScramLoginModulefor theKafkaServercontext. Make sure this file is the same on all Kafka brokers.For example:
KafkaServer { org.apache.kafka.common.security.scram.ScramLoginModule required; };Edit the
/opt/kafka/config/server.propertiesKafka configuration file on all cluster nodes for the following:-
Change the
listener.security.protocol.mapfield to specify theSASL_PLAINTEXTorSASL_SSLprotocol for the listener where you want to use SASL SCRAM authentication. Set the
sasl.enabled.mechanismsoption toSCRAM-SHA-256orSCRAM-SHA-512.For example:
listeners=INSECURE://:9092,AUTHENTICATED://:9093,REPLICATION://:9094 listener.security.protocol.map=INSECURE:PLAINTEXT,AUTHENTICATED:SASL_PLAINTEXT,REPLICATION:PLAINTEXT sasl.enabled.mechanisms=SCRAM-SHA-512
-
Change the
(Re)start the Kafka brokers using the KAFKA_OPTS environment variable to pass the JAAS configuration to Kafka brokers.
su - kafka
export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/config/jaas.conf"; /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
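For reference, a client authenticating with SASL SCRAM could use consumer or producer properties along these lines; the user must first be created as described in Section 4.8.8, “Adding SASL SCRAM users”, and the credentials shown are placeholders:
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="user1" \
    password="123456";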
Additional resources
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information about running a Kafka cluster, see Section 4.5, “Running a multi-node Kafka cluster”.
- For more information about adding SASL SCRAM users, see Section 4.8.8, “Adding SASL SCRAM users”.
- For more information about deleting SASL SCRAM users, see Section 4.8.9, “Deleting SASL SCRAM users”.
- For more information about configuring SASL SCRAM authentication in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.8.8. Adding SASL SCRAM users
This procedure describes how to add new users for authentication using SASL SCRAM.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
- SASL SCRAM authentication is enabled in the Kafka brokers (see Section 4.8.7, “Enabling SASL SCRAM authentication”).
Procedure
Use the kafka-configs.sh tool to add new SASL SCRAM users.
bin/kafka-configs.sh --zookeeper <ZookeeperAddress> --alter --add-config 'SCRAM-SHA-512=[password=<Password>]' --entity-type users --entity-name <Username>
For example:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --alter --add-config 'SCRAM-SHA-512=[password=123456]' --entity-type users --entity-name user1
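To check that the credentials were stored, you can, for example, describe the user entity with the same tool; the Zookeeper address and username below match the example above:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --describe --entity-type users --entity-name user1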
Additional resources
- For more information about configuring SASL SCRAM authentication in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.8.9. Deleting SASL SCRAM users
This procedure describes how to remove users when using SASL SCRAM authentication.
Prerequisites
- AMQ Streams is installed on all hosts which will be used as Kafka brokers.
- SASL SCRAM authentication is enabled in the Kafka brokers (see Section 4.8.7, “Enabling SASL SCRAM authentication”).
Procedure
Use the kafka-configs.sh tool to delete SASL SCRAM users.
bin/kafka-configs.sh --zookeeper <ZookeeperAddress> --alter --delete-config 'SCRAM-SHA-512' --entity-type users --entity-name <Username>
For example:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --alter --delete-config 'SCRAM-SHA-512' --entity-type users --entity-name user1
Additional resources
- For more information about configuring SASL SCRAM authentication in clients, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
4.9. Logging
Kafka brokers use Log4j as their logging infrastructure. Logging configuration is by default read from the log4j.properties configuration file, which should be placed either in the /opt/kafka/config/ directory or on the classpath. The location and name of the configuration file can be changed using the Java property log4j.configuration, which can be passed to Kafka using the KAFKA_LOG4J_OPTS environment variable:
su - kafka
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/my/path/to/log4j.config"; /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
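As an illustration of what such a file might contain (the logger names and levels below are only an example, not the configuration shipped with AMQ Streams), a minimal log4j.properties could look like this:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.kafka=INFO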
For more information about Log4j configurations, see Log4j manual.
Chapter 5. Topics
Messages in Kafka are always sent to or received from a topic. This chapter describes how to configure and manage Kafka topics.
5.1. Partitions and replicas
Messages in Kafka are always sent to or received from a topic. A topic is always split into one or more partitions. Partitions act as shards. That means that every message sent by a producer is always written only into a single partition. Thanks to the sharding of messages into different partitions, topics are easy to scale horizontally.
Each partition can have one or more replicas, which will be stored on different brokers in the cluster. When creating a topic, you can configure the number of replicas using the replication factor. The replication factor defines the number of copies which will be held within the cluster. One of the replicas for a given partition will be elected as the leader. The leader replica will be used by the producers to send new messages and by the consumers to consume messages. The other replicas will be follower replicas. The followers replicate the leader.
If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so the load is well balanced within the cluster.
NOTE: The replication factor determines the number of replicas including the leader and the followers. For example, if you set the replication factor to 3, then there will be one leader and two follower replicas.
5.2. Message retention
The message retention policy defines how long the messages will be stored on the Kafka brokers. It can be defined based on time, partition size or both.
For example, you can define that the messages should be kept:
- For 7 days
- Until the partition has 1GB of messages. Once the limit is reached, the oldest messages will be removed.
- For 7 days or until the 1GB limit has been reached. Whichever limit is reached first will be used, as shown in the example configuration below.
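For example, a combination of the time and size limits above maps to the following topic-level options; the values shown are just one possible choice:
retention.ms=604800000
retention.bytes=1073741824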
Kafka brokers store messages in log segments. The messages which are past their retention policy will be deleted only when a new log segment is created. New log segments are created when the previous log segment exceeds the configured log segment size. Additionally, users can request new segments to be created periodically.
Additionally, Kafka brokers support a compacting policy.
For a topic with the compacted policy, the broker will always keep only the last message for each key. Older messages with the same key will be removed from the partition. Because compacting is a periodically executed action, it does not happen immediately when a new message with the same key is sent to the partition. Instead, it might take some time until the older messages are removed.
For more information about the message retention configuration options, see Section 5.5, “Topic configuration”.
5.3. Topic auto-creation
When a producer or consumer tries to send messages to or receive messages from a topic that does not exist, Kafka will, by default, automatically create that topic. This behavior is controlled by the auto.create.topics.enable configuration property, which is set to true by default.
To disable it, set auto.create.topics.enable to false in the Kafka broker configuration file:
auto.create.topics.enable=false
5.4. Topic deletion
Kafka offers the possibility to disable deletion of topics. This is configured through the delete.topic.enable property, which is set to true by default (that is, deleting topics is possible). When this property is set to false, it will not be possible to delete topics; all attempts to delete a topic will return success, but the topic will not be deleted.
delete.topic.enable=false
5.5. Topic configuration
Auto-created topics will use the default topic configuration which can be specified in the broker properties file. However, when creating topics manually, their configuration can be specified at creation time. It is also possible to change a topic’s configuration after it has been created. The main topic configuration options for manually created topics are:
cleanup.policy
- Configures the retention policy to delete or compact. The delete policy will delete old records. The compact policy will enable log compaction. The default value is delete. For more information about log compaction, see the Kafka website.
compression.type
- Specifies the compression which is used for stored messages. Valid values are gzip, snappy, lz4, uncompressed (no compression) and producer (retain the compression codec used by the producer). The default value is producer.
max.message.bytes
- The maximum size of a batch of messages allowed by the Kafka broker, in bytes. The default value is 1000012.
min.insync.replicas
- The minimum number of replicas which must be in sync for a write to be considered successful. The default value is 1.
retention.ms
- Maximum number of milliseconds for which log segments will be retained. Log segments older than this value will be deleted. The default value is 604800000 (7 days).
retention.bytes
- The maximum number of bytes a partition will retain. Once the partition size grows over this limit, the oldest log segments will be deleted. A value of -1 indicates no limit. The default value is -1.
segment.bytes
- The maximum file size of a single commit log segment file in bytes. When the segment reaches its size, a new segment will be started. The default value is 1073741824 bytes (1 gibibyte).
For a list of all supported topic configuration options, see Appendix B, Topic configuration parameters.
The defaults for auto-created topics can be specified in the Kafka broker configuration using similar options:
log.cleanup.policy
- See cleanup.policy above.
compression.type
- See compression.type above.
message.max.bytes
- See max.message.bytes above.
min.insync.replicas
- See min.insync.replicas above.
log.retention.ms
- See retention.ms above.
log.retention.bytes
- See retention.bytes above.
log.segment.bytes
- See segment.bytes above.
default.replication.factor
- Default replication factor for automatically created topics. Default value is 1.
num.partitions
- Default number of partitions for automatically created topics. Default value is 1.
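As an illustration only, a broker properties file could set defaults for auto-created topics like this; the values are examples, not recommendations:
log.cleanup.policy=delete
log.retention.ms=604800000
log.segment.bytes=1073741824
min.insync.replicas=2
default.replication.factor=3
num.partitions=20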
For a list of all supported Kafka broker configuration options, see Appendix A, Broker configuration parameters.
5.6. Internal topics
Internal topics are created and used internally by the Kafka brokers and clients. Kafka has several internal topics. These are used to store consumer offsets (__consumer_offsets) or transaction state (__transaction_state). These topics can be configured using dedicated Kafka broker configuration options starting with the prefixes offsets.topic. and transaction.state.log.. The most important configuration options are:
offsets.topic.replication.factor
- Number of replicas for the __consumer_offsets topic. The default value is 3.
offsets.topic.num.partitions
- Number of partitions for the __consumer_offsets topic. The default value is 50.
transaction.state.log.replication.factor
- Number of replicas for the __transaction_state topic. The default value is 3.
transaction.state.log.num.partitions
- Number of partitions for the __transaction_state topic. The default value is 50.
transaction.state.log.min.isr
- Minimum number of replicas that must acknowledge a write to the __transaction_state topic for it to be considered successful. If this minimum cannot be met, then the producer will fail with an exception. The default value is 2.
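For example, on a cluster with at least three brokers, a broker properties file might simply restate the defaults listed above; the values shown are one possible choice:
offsets.topic.replication.factor=3
offsets.topic.num.partitions=50
transaction.state.log.replication.factor=3
transaction.state.log.num.partitions=50
transaction.state.log.min.isr=2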
5.7. Creating a topic
The kafka-topics.sh tool can be used to manage topics. kafka-topics.sh is part of the AMQ Streams distribution and can be found in the bin directory.
Prerequisites
- AMQ Streams cluster is installed and running
Creating a topic
Create a topic using the kafka-topics.sh utility and specify the following:
- Zookeeper URL in the --zookeeper option.
- The new topic to be created in the --create option.
- Topic name in the --topic option.
- The number of partitions in the --partitions option.
- Replication factor in the --replication-factor option.
You can also override some of the default topic configuration options using the --config option. This option can be used multiple times to override different options.
bin/kafka-topics.sh --zookeeper <ZookeeperAddress> --create --topic <TopicName> --partitions <NumberOfPartitions> --replication-factor <ReplicationFactor> --config <Option1>=<Value1> --config <Option2>=<Value2>
Example of the command to create a topic named mytopic:
bin/kafka-topics.sh --zookeeper zoo1.my-domain.com:2181 --create --topic mytopic --partitions 50 --replication-factor 3 --config cleanup.policy=compact --config min.insync.replicas=2
Verify that the topic exists using kafka-topics.sh.
bin/kafka-topics.sh --zookeeper <ZookeeperAddress> --describe --topic <TopicName>
Example of the command to describe a topic named mytopic:
bin/kafka-topics.sh --zookeeper zoo1.my-domain.com:2181 --describe --topic mytopic
Additional resources
- For more information about topic configuration, see Section 5.5, “Topic configuration”.
- For a list of all supported topic configuration options, see Appendix B, Topic configuration parameters.
5.8. Listing and describing topics
The kafka-topics.sh tool can be used to list and describe topics. kafka-topics.sh is part of the AMQ Streams distribution and can be found in the bin directory.
Prerequisites
- AMQ Streams cluster is installed and running
- Topic mytopic exists
Describing a topic
Describe a topic using the kafka-topics.sh utility.
- Specify the Zookeeper URL in the --zookeeper option.
- Use the --describe option to specify that you want to describe a topic.
- Topic name has to be specified in the --topic option. When the --topic option is omitted, it will describe all available topics.
bin/kafka-topics.sh --zookeeper <ZookeeperAddress> --describe --topic <TopicName>
Example of the command to describe a topic named mytopic:
bin/kafka-topics.sh --zookeeper zoo1.my-domain.com:2181 --describe --topic mytopic
The describe command will list all partitions and replicas which belong to this topic. It will also list all topic configuration options.
Additional resources
- For more information about topic configuration, see Section 5.5, “Topic configuration”.
- For more information about creating topics, see Section 5.7, “Creating a topic”.
5.9. Modifying a topic configuration
The kafka-configs.sh tool can be used to modify topic configurations. kafka-configs.sh is part of the AMQ Streams distribution and can be found in the bin directory.
Prerequisites
- AMQ Streams cluster is installed and running
- Topic mytopic exists
Modify topic configuration
Use the kafka-configs.sh tool to get the current configuration.
- Specify the Zookeeper URL in the --zookeeper option.
- Set the --entity-type as topics and --entity-name to the name of your topic.
- Use the --describe option to get the current configuration.
bin/kafka-configs.sh --zookeeper <ZookeeperAddress> --entity-type topics --entity-name <TopicName> --describe
Example of the command to get configuration of a topic named mytopic:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --entity-type topics --entity-name mytopic --describe
Use the kafka-configs.sh tool to change the configuration.
- Specify the Zookeeper URL in the --zookeeper option.
- Set the --entity-type as topics and --entity-name to the name of your topic.
- Use the --alter option to modify the current configuration.
- Specify the options you want to add or change in the --add-config option.
bin/kafka-configs.sh --zookeeper <ZookeeperAddress> --entity-type topics --entity-name <TopicName> --alter --add-config <Option>=<Value>
Example of the command to change configuration of a topic named mytopic:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --entity-type topics --entity-name mytopic --alter --add-config min.insync.replicas=1
Use the kafka-configs.sh tool to delete an existing configuration option.
- Specify the Zookeeper URL in the --zookeeper option.
- Set the --entity-type as topics and --entity-name to the name of your topic.
- Use the --alter option together with --delete-config to remove an existing configuration option.
- Specify the options you want to remove in the --delete-config option.
bin/kafka-configs.sh --zookeeper <ZookeeperAddress> --entity-type topics --entity-name <TopicName> --alter --delete-config <Option>
Example of the command to change configuration of a topic named mytopic:
bin/kafka-configs.sh --zookeeper zoo1.my-domain.com:2181 --entity-type topics --entity-name mytopic --alter --delete-config min.insync.replicas
Additional resources
- For more information about topic configuration, see Section 5.5, “Topic configuration”.
- For more information about creating topics, see Section 5.7, “Creating a topic”.
- For a list of all supported topic configuration options, see Appendix B, Topic configuration parameters.
5.10. Deleting a topic
The kafka-topics.sh tool can be used to manage topics. kafka-topics.sh is part of the AMQ Streams distribution and can be found in the bin directory.
Prerequisites
- AMQ Streams cluster is installed and running
- Topic mytopic exists
Deleting a topic
Delete a topic using the kafka-topics.sh utility.
- Specify the Zookeeper URL in the --zookeeper option.
- Use the --delete option to specify that an existing topic should be deleted.
- Topic name has to be specified in the --topic option.
bin/kafka-topics.sh --zookeeper <ZookeeperAddress> --delete --topic <TopicName>
Example of the command to delete a topic named mytopic:
bin/kafka-topics.sh --zookeeper zoo1.my-domain.com:2181 --delete --topic mytopic
Verify that the topic was deleted using kafka-topics.sh.
bin/kafka-topics.sh --zookeeper <ZookeeperAddress> --list
Example of the command to list all topics:
bin/kafka-topics.sh --zookeeper zoo1.my-domain.com:2181 --list
Additional resources
- For more information about creating topics, see Section 5.7, “Creating a topic”.
Chapter 6. Scaling Clusters
6.1. Scaling Kafka clusters
6.1.1. Adding brokers to a cluster
The primary way of increasing throughput for a topic is to increase the number of partitions for that topic. That works because the partitions allow the load for that topic to be shared between the brokers in the cluster. When the brokers are all constrained by some resource (typically I/O), then using more partitions will not yield an increase in throughput. Instead, you must add brokers to the cluster.
When you add an extra broker to the cluster, AMQ Streams does not assign any partitions to it automatically. You have to decide which partitions to move from the existing brokers to the new broker.
Once the partitions have been redistributed between all brokers, each broker should have a lower resource utilization.
6.1.2. Removing brokers from the cluster
Before you remove a broker from a cluster, you must ensure that it is not assigned to any partitions. You should decide which remaining brokers will be responsible for each of the partitions on the broker being decommissioned. Once the broker has no assigned partitions, you can stop it.
6.2. Reassignment of partitions
The kafka-reassign-partitions.sh utility is used to reassign partitions to different brokers.
It has three different modes:
--generate - Takes a set of topics and brokers and generates a reassignment JSON file which will result in the partitions of those topics being assigned to those brokers. It is an easy way to generate a reassignment JSON file, but it operates on whole topics, so its use is not always appropriate.
--execute - Takes a reassignment JSON file and applies it to the partitions and brokers in the cluster. Brokers which are gaining partitions will become followers of the partition leader. For a given partition, once the new broker has caught up and joined the ISR the old broker will stop being a follower and will delete its replica.
--verify - Using the same reassignment JSON file as the --execute step, --verify checks whether all of the partitions in the file have been moved to their intended brokers. If the reassignment is complete it will also remove any throttles which are in effect. Unless removed, throttles will continue to affect the cluster even after the reassignment has finished.
It is only possible to have one reassignment running in the cluster at any given time, and it is not possible to cancel a running reassignment. If you need to cancel a reassignment, you have to wait for it to complete and then perform another reassignment to revert the effects of the first one. The kafka-reassign-partitions.sh tool will print the reassignment JSON for this reversion as part of its output. Very large reassignments should be broken down into a number of smaller reassignments in case there is a need to stop an in-progress reassignment.
6.2.1. Reassignment JSON file
The reassignment JSON file has a specific structure:
{
"version": 1,
"partitions": [
<PartitionObjects>
]
}
Where <PartitionObjects> is a comma-separated list of objects like:
{
"topic": <TopicName>,
"partition": <Partition>,
"replicas": [ <AssignedBrokerIds> ],
"log_dirs": [<LogDirs>]
}
The "log_dirs" property is optional and is used to move the partition to a specific log directory.
The following is an example reassignment JSON file that assigns topic topic-a, partition 4 to brokers 2, 4 and 7, and topic topic-b partition 2 to brokers 1, 5 and 7:
{
"version": 1,
"partitions": [
{
"topic": "topic-a",
"partition": 4,
"replicas": [2,4,7]
},
{
"topic": "topic-b",
"partition": 2,
"replicas": [1,5,7]
}
]
}
Partitions not included in the JSON are not changed.
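When the optional "log_dirs" property is used, it must contain one entry per replica; each entry is either an absolute path to a log directory on the corresponding broker or "any" to let the broker choose. As a sketch (the path shown is a placeholder), a partition object using it could look like:
{
  "topic": "topic-a",
  "partition": 4,
  "replicas": [2,4,7],
  "log_dirs": ["/var/lib/kafka/data-0", "any", "any"]
}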
6.2.2. Generating reassignment JSON files
The easiest way to assign all the partitions for a given set of topics to a given set of brokers is to generate a reassignment JSON file using the kafka-reassign-partitions.sh --generate command.
bin/kafka-reassign-partitions.sh --zookeeper <Zookeeper> --topics-to-move-json-file <TopicsFile> --broker-list <BrokerList> --generate
The <TopicsFile> is a JSON file which lists the topics to move. It has the following structure:
{
"version": 1,
"topics": [
<TopicObjects>
]
}
where <TopicObjects> is a comma-separated list of objects like:
{
"topic": <TopicName>
}
For example, to move all the partitions of topic-a and topic-b to brokers 4 and 7:
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-be-moved.json --broker-list 4,7 --generate
where topics-to-be-moved.json has contents:
{
"version": 1,
"topics": [
{ "topic": "topic-a"},
{ "topic": "topic-b"}
]
}
6.2.3. Creating reassignment JSON files manually
You can manually create the reassignment JSON file if you want to move specific partitions.
6.3. Reassignment throttles
Reassigning partitions can be a slow process because it can require moving lots of data between brokers. To avoid this having a detrimental impact on clients it is possible to throttle the reassignment. Using a throttle can mean the reassignment takes longer. If the throttle is too low then the newly assigned brokers will not be able to keep up with records being published and the reassignment will never complete. If the throttle is too high then clients will be impacted. For example, for producers, this could manifest as higher than normal latency waiting for acknowledgement. For consumers, this could manifest as a drop in throughput caused by higher latency between polls.
6.4. Scaling up a Kafka cluster
This procedure describes how to increase the number of brokers in a Kafka cluster.
Prerequisites
- An existing Kafka cluster.
- A new machine with AMQ Streams installed.
- A reassignment JSON file of how partitions should be reassigned to brokers in the enlarged cluster.
Procedure
- Create a configuration file for the new broker using the same settings as for the other brokers in your cluster, except for broker.id, which should be a number that is not already used by any of the other brokers.
Start the new Kafka broker passing the configuration file you created in the previous step as the argument to the kafka-server-start.sh script:
su - kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
Verify that the Kafka broker is running.
jcmd | grep Kafka
- Repeat the above steps for each new broker.
Execute the partition reassignment using the kafka-reassign-partitions.sh command line tool.
kafka-reassign-partitions.sh --zookeeper <ZookeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --execute
If you are going to throttle replication you can also pass the --throttle option with an inter-broker throttled rate in bytes per second. For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 5000000 --execute
This command will print out two reassignment JSON objects. The first records the current assignment for the partitions being moved. You should save this to a file in case you need to revert the reassignment later on. The second JSON object is the target reassignment you have passed in your reassignment JSON file.
If you need to change the throttle during reassignment you can use the same command line with a different throttled rate. For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 10000000 --execute
Periodically verify whether the reassignment has completed using the kafka-reassign-partitions.sh command line tool. This is the same command as the previous step but with the --verify option instead of the --execute option.
kafka-reassign-partitions.sh --zookeeper <ZookeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --verify
For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --verify
- The reassignment has finished when the --verify command reports each of the partitions being moved as completed successfully. This final --verify will also have the effect of removing any reassignment throttles. You can now delete the revert file if you saved the JSON for reverting the assignment to their original brokers.
6.5. Scaling down a Kafka cluster
This procedure describes how to decrease the number of brokers in a Kafka cluster.
Prerequisites
- An existing Kafka cluster.
- A reassignment JSON file of how partitions should be reassigned to brokers in the cluster once the broker(s) have been removed.
Procedure
Execute the partition reassignment using the kafka-reassign-partitions.sh command line tool.
kafka-reassign-partitions.sh --zookeeper <ZookeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --execute
If you are going to throttle replication you can also pass the --throttle option with an inter-broker throttled rate in bytes per second. For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 5000000 --execute
This command will print out two reassignment JSON objects. The first records the current assignment for the partitions being moved. You should save this to a file in case you need to revert the reassignment later on. The second JSON object is the target reassignment you have passed in your reassignment JSON file.
If you need to change the throttle during reassignment you can use the same command line with a different throttled rate. For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --throttle 10000000 --execute
Periodically verify whether the reassignment has completed using the kafka-reassign-partitions.sh command line tool. This is the same command as the previous step but with the --verify option instead of the --execute option.
kafka-reassign-partitions.sh --zookeeper <ZookeeperHostAndPort> --reassignment-json-file <ReassignmentJsonFile> --verify
For example:
kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 --reassignment-json-file reassignment.json --verify
- The reassignment has finished when the --verify command reports each of the partitions being moved as completed successfully. This final --verify will also have the effect of removing any reassignment throttles. You can now delete the revert file if you saved the JSON for reverting the assignment to their original brokers.
Once all the partition reassignments have finished, the broker being removed should not have responsibility for any of the partitions in the cluster. You can verify this by checking each of the directories given in the broker’s log.dirs configuration parameters. If any of the log directories on the broker contains a directory that does not match the extended regular expression \.[a-z0-9]+-delete$ then the broker still has live partitions and it should not be stopped.
You can check this by executing the command:
ls -l <LogDir> | grep -E '^d' | grep -vE '[a-zA-Z0-9.-]+\.[a-z0-9]+-delete$'
If the above command prints any output then the broker still has live partitions. In this case, either the reassignment has not finished, or the reassignment JSON file was incorrect.
Once you have confirmed that the broker has no live partitions you can stop it.
su - kafka
/opt/kafka/bin/kafka-server-stop.sh
Confirm that the Kafka broker is stopped.
jcmd | grep kafka
Chapter 7. Kafka Connect
Kafka Connect is a tool for streaming data between Apache Kafka and external systems. It provides a framework for moving large amounts of data while maintaining scalability and reliability. Kafka Connect is typically used to integrate Kafka with database, storage, and messaging systems that are external to your Kafka cluster.
Kafka Connect uses connector plug-ins that implement connectivity for different types of external systems. There are two types of connector plug-ins: sink and source. Sink connectors stream data from Kafka to external systems. Source connectors stream data from external systems into Kafka.
Kafka Connect can run in standalone or distributed modes.
- Standalone mode
- In standalone mode, Kafka Connect runs on a single node with user-defined configuration read from a properties file.
- Distributed mode
- In distributed mode, Kafka Connect runs across one or more worker nodes and the workloads are distributed among them. You manage connectors and their configuration using an HTTP REST interface.
7.1. Kafka Connect in standalone mode
In standalone mode, Kafka Connect runs as a single process, on a single node. You manage the configuration of standalone mode using properties files.
7.1.1. Configuring Kafka Connect in standalone mode
To configure Kafka Connect in standalone mode, edit the config/connect-standalone.properties configuration file. The following options are the most important.
bootstrap.servers
- A list of Kafka broker addresses used as bootstrap connections to Kafka. For example, kafka0.my-domain.com:9092,kafka1.my-domain.com:9092,kafka2.my-domain.com:9092.
key.converter
- The class used to convert message keys to and from Kafka format. For example, org.apache.kafka.connect.json.JsonConverter.
value.converter
- The class used to convert message payloads to and from Kafka format. For example, org.apache.kafka.connect.json.JsonConverter.
offset.storage.file.filename
- Specifies the file in which the offset data is stored.
An example configuration file is provided in the installation directory at config/connect-standalone.properties. For a complete list of all supported Kafka Connect configuration options, see Appendix F, Kafka Connect configuration parameters.
Connector plug-ins open client connections to the Kafka brokers using the bootstrap address. To configure these connections, use the standard Kafka producer and consumer configuration options prefixed by producer. or consumer..
For more information on configuring Kafka producers and consumers, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
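Putting the options above together, a minimal connect-standalone.properties sketch could look like the following; the broker addresses and the offsets file path are placeholders:
bootstrap.servers=kafka0.my-domain.com:9092,kafka1.my-domain.com:9092,kafka2.my-domain.com:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets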
7.1.2. Configuring connectors in Kafka Connect in standalone mode
You can configure connector plug-ins for Kafka Connect in standalone mode using properties files. Most configuration options are specific to each connector. The following options apply to all connectors:
name
- The name of the connector, which must be unique within the current Kafka Connect instance.
connector.class
- The class of the connector plug-in. For example, org.apache.kafka.connect.file.FileStreamSinkConnector.
tasks.max
- The maximum number of tasks that the specified connector can use. Tasks enable the connector to perform work in parallel. The connector might create fewer tasks than specified.
key.converter
- The class used to convert message keys to and from Kafka format. This overrides the default value set by the Kafka Connect configuration. For example, org.apache.kafka.connect.json.JsonConverter.
value.converter
- The class used to convert message payloads to and from Kafka format. This overrides the default value set by the Kafka Connect configuration. For example, org.apache.kafka.connect.json.JsonConverter.
Additionally, you must set at least one of the following options for sink connectors:
topics- A comma-separated list of topics used as input.
topics.regex- A Java regular expression of topics used as input.
For all other options, see the documentation for the available connectors.
AMQ Streams includes example connector configuration files – see config/connect-file-sink.properties and config/connect-file-source.properties in the AMQ Streams installation directory.
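For illustration, a sink connector properties file along the lines of the bundled config/connect-file-sink.properties might look like this; the topic and output file names are placeholders:
name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=/tmp/sink-output.txt
topics=my-topic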
7.1.3. Running Kafka Connect in standalone mode
This procedure describes how to configure and run Kafka Connect in standalone mode.
Prerequisites
- An installed and running AMQ Streams cluster.
Procedure
Edit the /opt/kafka/config/connect-standalone.properties Kafka Connect configuration file and set bootstrap.servers to point to your Kafka brokers. For example:
bootstrap.servers=kafka0.my-domain.com:9092,kafka1.my-domain.com:9092,kafka2.my-domain.com:9092
Start Kafka Connect with the configuration file and specify one or more connector configurations.
su - kafka
/opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties connector1.properties [connector2.properties ...]
Verify that Kafka Connect is running.
jcmd | grep ConnectStandalone
Additional resources
- For more information on installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information on configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For a complete list of supported Kafka Connect configuration options, see Appendix F, Kafka Connect configuration parameters.
7.2. Kafka Connect in distributed mode
In distributed mode, Kafka Connect runs across one or more worker nodes and the workloads are distributed among them. You manage connector plug-ins and their configuration using the HTTP REST interface.
7.2.1. Configuring Kafka Connect in distributed mode
To configure Kafka Connect in distributed mode, edit the config/connect-distributed.properties configuration file. The following options are the most important.
bootstrap.servers
- A list of Kafka broker addresses used as bootstrap connections to Kafka. For example, kafka0.my-domain.com:9092,kafka1.my-domain.com:9092,kafka2.my-domain.com:9092.
key.converter
- The class used to convert message keys to and from Kafka format. For example, org.apache.kafka.connect.json.JsonConverter.
value.converter
- The class used to convert message payloads to and from Kafka format. For example, org.apache.kafka.connect.json.JsonConverter.
group.id
- The name of the distributed Kafka Connect cluster. This must be unique and must not conflict with another consumer group ID. The default value is connect-cluster.
config.storage.topic
- The Kafka topic used to store connector configurations. The default value is connect-configs.
offset.storage.topic
- The Kafka topic used to store offsets. The default value is connect-offsets.
status.storage.topic
- The Kafka topic used for worker node statuses. The default value is connect-status.
AMQ Streams includes an example configuration file for Kafka Connect in distributed mode – see config/connect-distributed.properties in the AMQ Streams installation directory.
For a complete list of all supported Kafka Connect configuration options, see Appendix F, Kafka Connect configuration parameters.
Connector plug-ins open client connections to the Kafka brokers using the bootstrap address. To configure these connections, use the standard Kafka producer and consumer configuration options prefixed by producer. or consumer..
For more information on configuring Kafka producers and consumers, see Appendix D, Producer configuration parameters and Appendix C, Consumer configuration parameters.
7.2.2. Configuring connectors in distributed Kafka Connect
HTTP REST Interface
Connectors for distributed Kafka Connect are configured using the HTTP REST interface. The REST interface listens on port 8083 by default. It supports the following endpoints:
GET /connectors - Return a list of existing connectors.
POST /connectors - Create a connector. The request body has to be a JSON object with the connector configuration.
GET /connectors/<name> - Get information about a specific connector.
GET /connectors/<name>/config - Get configuration of a specific connector.
PUT /connectors/<name>/config - Update the configuration of a specific connector.
GET /connectors/<name>/status - Get the status of a specific connector.
PUT /connectors/<name>/pause - Pause the connector and all its tasks. The connector will stop processing any messages.
PUT /connectors/<name>/resume - Resume a paused connector.
POST /connectors/<name>/restart - Restart a connector in case it has failed.
DELETE /connectors/<name> - Delete a connector.
GET /connector-plugins - Get a list of all supported connector plugins.
Connector configuration
Most configuration options are connector specific and included in the documentation for the connectors. The following fields are common for all connectors.
name
- Name of the connector. Must be unique within a given Kafka Connect instance.
connector.class
- Class of the connector plugin. For example, org.apache.kafka.connect.file.FileStreamSinkConnector.
tasks.max
- The maximum number of tasks used by this connector. Tasks are used by the connector to parallelise its work. Connectors may create fewer tasks than specified.
key.converter
- Class used to convert message keys to and from Kafka format. This overrides the default value set by the Kafka Connect configuration. For example, org.apache.kafka.connect.json.JsonConverter.
value.converter
- Class used to convert message payloads to and from Kafka format. This overrides the default value set by the Kafka Connect configuration. For example, org.apache.kafka.connect.json.JsonConverter.
Additionally, one of the following options must be set for sink connectors:
topics- A comma-separated list of topics used as input.
topics.regex- A Java regular expression of topics used as input.
For all other options, see the documentation for the specific connector.
AMQ Streams includes example connector configuration files. They can be found in config/connect-file-sink.properties and config/connect-file-source.properties in the AMQ Streams installation directory.
7.2.3. Running distributed Kafka Connect
This procedure describes how to configure and run Kafka Connect in distributed mode.
Prerequisites
- An installed and running AMQ Streams cluster.
Running the cluster
Edit the /opt/kafka/config/connect-distributed.properties Kafka Connect configuration file on all Kafka Connect worker nodes.
- Set the bootstrap.servers option to point to your Kafka brokers.
- Set the group.id option.
- Set the config.storage.topic option.
- Set the offset.storage.topic option.
- Set the status.storage.topic option.
For example:
bootstrap.servers=kafka0.my-domain.com:9092,kafka1.my-domain.com:9092,kafka2.my-domain.com:9092
group.id=my-group-id
config.storage.topic=my-group-id-configs
offset.storage.topic=my-group-id-offsets
status.storage.topic=my-group-id-status
Start the Kafka Connect workers with the /opt/kafka/config/connect-distributed.properties configuration file on all Kafka Connect nodes.
su - kafka
/opt/kafka/bin/connect-distributed.sh /opt/kafka/config/connect-distributed.properties
Verify that Kafka Connect is running.
jcmd | grep ConnectDistributed
Additional resources
- For more information about installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information about configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For a complete list of supported Kafka Connect configuration options, see Appendix F, Kafka Connect configuration parameters.
7.2.4. Creating connectors
This procedure describes how to use the Kafka Connect REST API to create a connector plug-in for use with Kafka Connect in distributed mode.
Prerequisites
- A Kafka Connect installation running in distributed mode.
Procedure
Prepare a JSON payload with the connector configuration. For example:
{ "name": "my-connector", "config": { "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "tasks.max": "1", "topics": "my-topic-1,my-topic-2", "file": "/tmp/output-file.txt" } }Send a POST request to
<KafkaConnectAddress>:8083/connectorsto create the connector. The following example usescurl:curl -X POST -H "Content-Type: application/json" --data @sink-connector.json http://connect0.my-domain.com:8083/connectors
Verify that the connector was deployed by sending a GET request to <KafkaConnectAddress>:8083/connectors. The following example uses curl:
curl http://connect0.my-domain.com:8083/connectors
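You can also, for example, query the status endpoint described earlier to confirm that the connector and its tasks are running; the host and connector name match the previous examples:
curl http://connect0.my-domain.com:8083/connectors/my-connector/status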
7.2.5. Deleting connectors
This procedure describes how to use the Kafka Connect REST API to delete a connector plug-in from Kafka Connect in distributed mode.
Prerequisites
- A Kafka Connect installation running in distributed mode.
Deleting connectors
Verify that the connector exists by sending a GET request to <KafkaConnectAddress>:8083/connectors/<ConnectorName>. The following example uses curl:
curl http://connect0.my-domain.com:8083/connectors/my-connector
To delete the connector, send a DELETE request to <KafkaConnectAddress>:8083/connectors/<ConnectorName>. The following example uses curl:
curl -X DELETE http://connect0.my-domain.com:8083/connectors/my-connector
Verify that the connector was deleted by sending a GET request to <KafkaConnectAddress>:8083/connectors. The following example uses curl:
curl http://connect0.my-domain.com:8083/connectors
7.3. Connector plug-ins
The following connector plug-ins are included with AMQ Streams.
FileStreamSink - Reads data from Kafka topics and writes the data to a file.
FileStreamSource - Reads data from a file and sends the data to Kafka topics.
You can add more connector plug-ins if needed. Kafka Connect searches for and runs additional connector plug-ins at startup. To define the path that Kafka Connect searches for plug-ins, set the plugin.path configuration option:
plugin.path=/opt/kafka/connector-plugins,/opt/connectors
The plugin.path configuration option can contain a comma-separated list of paths.
When running Kafka Connect in distributed mode, plug-ins must be made available on all worker nodes.
7.4. Adding connector plugins
This procedure shows you how to add additional connector plug-ins.
Prerequisites
- An installed and running AMQ Streams cluster.
Procedure
Create the /opt/kafka/connector-plugins directory.
su - kafka
mkdir /opt/kafka/connector-plugins
Edit the /opt/kafka/config/connect-standalone.properties or /opt/kafka/config/connect-distributed.properties Kafka Connect configuration file, and set the plugin.path option to /opt/kafka/connector-plugins. For example:
plugin.path=/opt/kafka/connector-plugins
- Copy your connector plug-ins to /opt/kafka/connector-plugins.
- Start or restart the Kafka Connect workers.
Additional resources
- For more information on installing AMQ Streams, see Section 2.3, “Installing AMQ Streams”.
- For more information on configuring AMQ Streams, see Section 2.7, “Configuring AMQ Streams”.
- For more information on running Kafka Connect in standalone mode, see Section 7.1.3, “Running Kafka Connect in standalone mode”.
- For more information on running Kafka Connect in distributed mode, see Section 7.2.3, “Running distributed Kafka Connect”.
- For a complete list of supported Kafka Connect configuration options, see Appendix F, Kafka Connect configuration parameters.
Appendix A. Broker configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| Zookeeper host string | string | high | ||
|
|
DEPRECATED: only used when | string | null | high | |
|
|
Listeners to publish to ZooKeeper for clients to use, if different than the | string | null | high | |
|
|
DEPRECATED: only used when | int | null | high | |
|
| Enable auto creation of topic on the server | boolean | true | high | |
|
| Enables auto leader balancing. A background thread checks and triggers leader balance if required at regular intervals | boolean | true | high | |
|
| The number of threads to use for various background processing tasks | int | 10 | [1,…] | high |
|
| The broker id for this server. If unset, a unique broker id will be generated.To avoid conflicts between zookeeper generated broker id’s and user configured broker id’s, generated broker ids start from reserved.broker.max.id + 1. | int | -1 | high | |
|
| Specify the final compression type for a given topic. This configuration accepts the standard compression codecs ('gzip', 'snappy', 'lz4'). It additionally accepts 'uncompressed' which is equivalent to no compression; and 'producer' which means retain the original compression codec set by the producer. | string | producer | high | |
|
| Enables delete topic. Delete topic through the admin tool will have no effect if this config is turned off | boolean | true | high | |
|
|
DEPRECATED: only used when | string | "" | high | |
|
| The frequency with which the partition rebalance check is triggered by the controller | long | 300 | high | |
|
| The ratio of leader imbalance allowed per broker. The controller would trigger a leader balance if it goes above this value per broker. The value is specified in percentage. | int | 10 | high | |
|
| Listener List - Comma-separated list of URIs we will listen on and the listener names. If the listener name is not a security protocol, listener.security.protocol.map must also be set. Specify hostname as 0.0.0.0 to bind to all interfaces. Leave hostname empty to bind to default interface. Examples of legal listener lists: PLAINTEXT://myhost:9092,SSL://:9091 CLIENT://0.0.0.0:9092,REPLICATION://localhost:9093 | string | null | high | |
|
| The directory in which the log data is kept (supplemental for log.dirs property) | string | /tmp/kafka-logs | high | |
|
| The directories in which the log data is kept. If not set, the value in log.dir is used | string | null | high | |
|
| The number of messages accumulated on a log partition before messages are flushed to disk | long | 9223372036854775807 | [1,…] | high |
|
| The maximum time in ms that a message in any topic is kept in memory before flushed to disk. If not set, the value in log.flush.scheduler.interval.ms is used | long | null | high | |
|
| The frequency with which we update the persistent record of the last flush which acts as the log recovery point | int | 60000 | [0,…] | high |
|
| The frequency in ms that the log flusher checks whether any log needs to be flushed to disk | long | 9223372036854775807 | high | |
|
| The frequency with which we update the persistent record of log start offset | int | 60000 | [0,…] | high |
|
| The maximum size of the log before deleting it | long | -1 | high | |
|
| The number of hours to keep a log file before deleting it (in hours), tertiary to log.retention.ms property | int | 168 | high | |
|
| The number of minutes to keep a log file before deleting it (in minutes), secondary to log.retention.ms property. If not set, the value in log.retention.hours is used | int | null | high | |
|
| The number of milliseconds to keep a log file before deleting it (in milliseconds), If not set, the value in log.retention.minutes is used | long | null | high | |
|
| The maximum time before a new log segment is rolled out (in hours), secondary to log.roll.ms property | int | 168 | [1,…] | high |
|
| The maximum jitter to subtract from logRollTimeMillis (in hours), secondary to log.roll.jitter.ms property | int | 0 | [0,…] | high |
|
| The maximum jitter to subtract from logRollTimeMillis (in milliseconds). If not set, the value in log.roll.jitter.hours is used | long | null | high | |
|
| The maximum time before a new log segment is rolled out (in milliseconds). If not set, the value in log.roll.hours is used | long | null | high | |
|
| The maximum size of a single log file | int | 1073741824 | [14,…] | high |
|
| The amount of time to wait before deleting a file from the filesystem | long | 60000 | [0,…] | high |
|
| The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that the they can fetch record batches this large. In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case.
This can be set per topic with the topic level | int | 1000012 | [0,…] | high |
|
| When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write. | int | 1 | [1,…] | high |
|
| The number of threads that the server uses for processing requests, which may include disk I/O | int | 8 | [1,…] | high |
|
| The number of threads that the server uses for receiving requests from the network and sending responses to the network | int | 3 | [1,…] | high |
|
| The number of threads per data directory to be used for log recovery at startup and flushing at shutdown | int | 1 | [1,…] | high |
|
| The number of threads that can move replicas between log directories, which may include disk I/O | int | null | high | |
|
| Number of fetcher threads used to replicate messages from a source broker. Increasing this value can increase the degree of I/O parallelism in the follower broker. | int | 1 | high | |
|
| The maximum size for a metadata entry associated with an offset commit | int | 4096 | high | |
|
| The required acks before the commit can be accepted. In general, the default (-1) should not be overridden | short | -1 | high | |
|
| Offset commit will be delayed until all replicas for the offsets topic receive the commit or this timeout is reached. This is similar to the producer request timeout. | int | 5000 | [1,…] | high |
|
| Batch size for reading from the offsets segments when loading offsets into the cache. | int | 5242880 | [1,…] | high |
|
| Frequency at which to check for stale offsets | long | 600000 | [1,…] | high |
|
| Offsets older than this retention period will be discarded | int | 10080 | [1,…] | high |
|
| Compression codec for the offsets topic - compression may be used to achieve "atomic" commits | int | 0 | high | |
|
| The number of partitions for the offset commit topic (should not change after deployment) | int | 50 | [1,…] | high |
|
| The replication factor for the offsets topic (set higher to ensure availability). Internal topic creation will fail until the cluster size meets this replication factor requirement. | short | 3 | [1,…] | high |
|
| The offsets topic segment bytes should be kept relatively small in order to facilitate faster log compaction and cache loads | int | 104857600 | [1,…] | high |
|
|
DEPRECATED: only used when | int | 9092 | high | |
|
| The number of queued requests allowed before blocking the network threads | int | 500 | [1,…] | high |
|
| DEPRECATED: Used only when dynamic default quotas are not configured for <user, <client-id> or <user, client-id> in Zookeeper. Any consumer distinguished by clientId/consumer group will get throttled if it fetches more bytes than this value per-second | long | 9223372036854775807 | [1,…] | high |
|
| DEPRECATED: Used only when dynamic default quotas are not configured for <user>, <client-id> or <user, client-id> in Zookeeper. Any producer distinguished by clientId will get throttled if it produces more bytes than this value per-second | long | 9223372036854775807 | [1,…] | high |
|
| Minimum bytes expected for each fetch response. If not enough bytes, wait up to replicaMaxWaitTimeMs | int | 1 | high | |
|
| max wait time for each fetcher request issued by follower replicas. This value should always be less than the replica.lag.time.max.ms at all times to prevent frequent shrinking of ISR for low throughput topics | int | 500 | high | |
|
| The frequency with which the high watermark is saved out to disk | long | 5000 | high | |
|
| If a follower hasn’t sent any fetch requests or hasn’t consumed up to the leaders log end offset for at least this time, the leader will remove the follower from isr | long | 10000 | high | |
|
| The socket receive buffer for network requests | int | 65536 | high | |
|
| The socket timeout for network requests. Its value should be at least replica.fetch.wait.max.ms | int | 30000 | high | |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. | int | 30000 | high | |
|
| The SO_RCVBUF buffer of the socket sever sockets. If the value is -1, the OS default will be used. | int | 102400 | high | |
|
| The maximum number of bytes in a socket request | int | 104857600 | [1,…] | high |
|
| The SO_SNDBUF buffer of the socket sever sockets. If the value is -1, the OS default will be used. | int | 102400 | high | |
|
| The maximum allowed timeout for transactions. If a client’s requested transaction time exceed this, then the broker will return an error in InitProducerIdRequest. This prevents a client from too large of a timeout, which can stall consumers reading from topics included in the transaction. | int | 900000 | [1,…] | high |
|
| Batch size for reading from the transaction log segments when loading producer ids and transactions into the cache. | int | 5242880 | [1,…] | high |
|
| Overridden min.insync.replicas config for the transaction topic. | int | 2 | [1,…] | high |
|
| The number of partitions for the transaction topic (should not change after deployment). | int | 50 | [1,…] | high |
|
| The replication factor for the transaction topic (set higher to ensure availability). Internal topic creation will fail until the cluster size meets this replication factor requirement. | short | 3 | [1,…] | high |
|
| The transaction topic segment bytes should be kept relatively small in order to facilitate faster log compaction and cache loads | int | 104857600 | [1,…] | high |
|
| The maximum amount of time in ms that the transaction coordinator will wait before proactively expire a producer’s transactional id without receiving any transaction status updates from it. | int | 604800000 | [1,…] | high |
|
| Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss | boolean | false | high | |
|
| The max time that the client waits to establish a connection to zookeeper. If not set, the value in zookeeper.session.timeout.ms is used | int | null | high | |
|
| The maximum number of unacknowledged requests the client will send to Zookeeper before blocking. | int | 10 | [1,…] | high |
|
| Zookeeper session timeout | int | 6000 | high | |
|
| Set client to use secure ACLs | boolean | false | high | |
|
| Enable automatic broker id generation on the server. When enabled the value configured for reserved.broker.max.id should be reviewed. | boolean | true | medium | |
|
|
Rack of the broker. This will be used in rack aware replication assignment for fault tolerance. Examples: | string | null | medium | |
|
| Idle connections timeout: the server socket processor threads close the connections that idle more than this | long | 600000 | medium | |
|
| Enable controlled shutdown of the server | boolean | true | medium | |
|
| Controlled shutdown can fail for multiple reasons. This determines the number of retries when such failure happens | int | 3 | medium | |
|
| Before each retry, the system needs time to recover from the state that caused the previous failure (Controller fail over, replica lag etc). This config determines the amount of time to wait before retrying. | long | 5000 | medium | |
|
| The socket timeout for controller-to-broker channels | int | 30000 | medium | |
|
| default replication factors for automatically created topics | int | 1 | medium | |
|
| The token validity time in seconds before the token needs to be renewed. Default value 1 day. | long | 86400000 | [1,…] | medium |
|
| Master/secret key to generate and verify delegation tokens. Same key must be configured across all the brokers. If the key is not set or set to empty string, brokers will disable the delegation token support. | password | null | medium | |
|
| The token has a maximum lifetime beyond which it cannot be renewed anymore. Default value 7 days. | long | 604800000 | [1,…] | medium |
|
| The purge interval (in number of requests) of the delete records request purgatory | int | 1 | medium | |
|
| The purge interval (in number of requests) of the fetch request purgatory | int | 1000 | medium | |
|
| The amount of time the group coordinator will wait for more consumers to join a new group before performing the first rebalance. A longer delay means potentially fewer rebalances, but increases the time until processing begins. | int | 3000 | medium | |
|
| The maximum allowed session timeout for registered consumers. Longer timeouts give consumers more time to process messages in between heartbeats at the cost of a longer time to detect failures. | int | 300000 | medium | |
|
| The minimum allowed session timeout for registered consumers. Shorter timeouts result in quicker failure detection at the cost of more frequent consumer heartbeating, which can overwhelm broker resources. | int | 6000 | medium | |
|
| Name of listener used for communication between brokers. If this is unset, the listener name is defined by security.inter.broker.protocol. It is an error to set this and security.inter.broker.protocol properties at the same time. | string | null | medium | |
|
| Specify which version of the inter-broker protocol will be used. This is typically bumped after all brokers were upgraded to a new version. Example of some valid values are: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2, 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.9.0.1 Check ApiVersion for the full list. | string | 2.0-IV1 | medium | |
|
| The amount of time to sleep when there are no logs to clean | long | 15000 | [0,…] | medium |
|
| The total memory used for log deduplication across all cleaner threads | long | 134217728 | medium | |
|
| How long are delete records retained? | long | 86400000 | medium | |
|
| Enable the log cleaner process to run on the server. Should be enabled if using any topics with a cleanup.policy=compact, including the internal offsets topic. If disabled, those topics will not be compacted and will continually grow in size. | boolean | true | medium | |
|
| Log cleaner dedupe buffer load factor. The percentage full the dedupe buffer can become. A higher value will allow more log to be cleaned at once but will lead to more hash collisions | double | 0.9 | medium | |
|
| The total memory used for log cleaner I/O buffers across all cleaner threads | int | 524288 | [0,…] | medium |
|
| The log cleaner will be throttled so that the sum of its read and write i/o will be less than this value on average | double | 1.7976931348623157E308 | medium | |
|
| The minimum ratio of dirty log to total log for a log to be eligible for cleaning | double | 0.5 | medium | |
|
| The minimum time a message will remain uncompacted in the log. Only applicable for logs that are being compacted. | long | 0 | medium | |
|
| The number of background threads to use for log cleaning | int | 1 | [0,…] | medium |
|
| The default cleanup policy for segments beyond the retention window. A comma separated list of valid policies. Valid policies are: "delete" and "compact" | list | delete | [compact, delete] | medium |
|
| The interval with which we add an entry to the offset index | int | 4096 | [0,…] | medium |
|
| The maximum size in bytes of the offset index | int | 10485760 | [4,…] | medium |
|
| Specify the message format version the broker will use to append messages to the logs. The value should be a valid ApiVersion. Some examples are: 0.8.2, 0.9.0.0, 0.10.0, check ApiVersion for more details. By setting a particular message format version, the user is certifying that all the existing messages on disk are smaller than or equal to the specified version. Setting this value incorrectly will cause consumers with older versions to break as they will receive messages with a format that they don’t understand. | string | 2.0-IV1 | medium | |
|
| The maximum difference allowed between the timestamp when a broker receives a message and the timestamp specified in the message. If log.message.timestamp.type=CreateTime, a message will be rejected if the difference in timestamp exceeds this threshold. This configuration is ignored if log.message.timestamp.type=LogAppendTime. The maximum timestamp difference allowed should be no greater than log.retention.ms to avoid unnecessarily frequent log rolling. | long | 9223372036854775807 | medium | |
|
| Define whether the timestamp in the message is message create time or log append time. The value should be either CreateTime or LogAppendTime | string | CreateTime | [CreateTime, LogAppendTime] | medium |
|
| Whether to preallocate the file when creating a new segment. If you are using Kafka on Windows, you probably need to set it to true. | boolean | false | medium | |
|
| The frequency in milliseconds that the log cleaner checks whether any log is eligible for deletion | long | 300000 | [1,…] | medium |
|
| The maximum number of connections we allow from each IP address. This can be set to 0 if there are overrides configured using the max.connections.per.ip.overrides property | int | 2147483647 | [0,…] | medium |
|
| A comma-separated list of per-IP or hostname overrides to the default maximum number of connections. An example value is "hostName:100,127.0.0.1:200" | string | "" | medium | |
|
| The maximum number of incremental fetch sessions that we will maintain. | int | 1000 | [0,…] | medium |
|
| The default number of log partitions per topic | int | 1 | [1,…] | medium |
|
| The old secret that was used for encoding dynamically configured passwords. This is required only when the secret is updated. If specified, all dynamically encoded passwords are decoded using this old secret and re-encoded using password.encoder.secret when broker starts up. | password | null | medium | |
|
| The secret used for encoding dynamically configured passwords for this broker. | password | null | medium | |
|
| The fully qualified name of a class that implements the KafkaPrincipalBuilder interface, which is used to build the KafkaPrincipal object used during authorization. This config also supports the deprecated PrincipalBuilder interface which was previously used for client authentication over SSL. If no principal builder is defined, the default behavior depends on the security protocol in use. For SSL authentication, the principal name will be the distinguished name from the client certificate if one is provided; otherwise, if client authentication is not required, the principal name will be ANONYMOUS. For SASL authentication, the principal will be derived using the rules defined by sasl.kerberos.principal.to.local.rules if GSSAPI is in use, and the SASL authentication ID for other mechanisms. | class | null | medium | |
|
| The purge interval (in number of requests) of the producer request purgatory | int | 1000 | medium | |
|
| The number of queued bytes allowed before no more requests are read | long | -1 | medium | |
|
| The amount of time to sleep when fetch partition error occurs. | int | 1000 | [0,…] | medium |
|
| The number of bytes of messages to attempt to fetch for each partition. This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). | int | 1048576 | [0,…] | medium |
|
| Maximum bytes expected for the entire fetch response. Records are fetched in batches, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made. As such, this is not an absolute maximum. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). | int | 10485760 | [0,…] | medium |
|
| Max number that can be used for a broker.id | int | 1000 | [0,…] | medium |
|
| The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. | class | null | medium | |
|
| The list of SASL mechanisms enabled in the Kafka server. The list may contain any mechanism for which a security provider is available. Only GSSAPI is enabled by default. | list | GSSAPI | medium | |
|
| JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; | password | null | medium | |
|
| Kerberos kinit command path. | string | /usr/bin/kinit | medium | |
|
| Login thread sleep time between refresh attempts. | long | 60000 | medium | |
|
| A list of rules for mapping from principal names to short names (typically operating system usernames). The rules are evaluated in order and the first rule that matches a principal name is used to map it to a short name. Any later rules in the list are ignored. By default, principal names of the form {username}/{hostname}@{REALM} are mapped to {username}. For more details on the format, see the security authorization and ACLs documentation. Note that this configuration is ignored if an extension of KafkaPrincipalBuilder is provided by the principal.builder.class configuration. | list | DEFAULT | medium | |
|
| The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. | string | null | medium | |
|
| Percentage of random jitter added to the renewal time. | double | 0.05 | medium | |
|
| Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. | double | 0.8 | medium | |
|
| The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler | class | null | medium | |
|
| The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin | class | null | medium | |
|
| The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 300 | medium | |
|
| The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 60 | medium | |
|
| Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.8 | medium | |
|
| The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.05 | medium | |
|
| SASL mechanism used for inter-broker communication. Default is GSSAPI. | string | GSSAPI | medium | |
|
| The fully qualified name of a SASL server callback handler class that implements the AuthenticateCallbackHandler interface. Server callback handlers must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.plain.sasl.server.callback.handler.class=com.example.CustomPlainCallbackHandler. | class | null | medium | |
|
| Security protocol used to communicate between brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. It is an error to set this and inter.broker.listener.name properties at the same time. | string | PLAINTEXT | medium | |
|
| A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported. | list | "" | medium | |
|
| Configures the Kafka broker to request client authentication. The following settings are common: required (client authentication is required), requested (client authentication is requested, but a client that does not provide a certificate can still connect), and none (no client authentication). | string | none | [required, requested, none] | medium |
|
| The list of protocols enabled for SSL connections. | list | TLSv1.2,TLSv1.1,TLSv1 | medium | |
|
| The password of the private key in the key store file. This is optional for client. | password | null | medium | |
|
| The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. | string | SunX509 | medium | |
|
| The location of the key store file. This is optional for client and can be used for two-way authentication for client. | string | null | medium | |
|
| The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. | password | null | medium | |
|
| The file format of the key store file. This is optional for client. | string | JKS | medium | |
|
| The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. | string | TLS | medium | |
|
| The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. | string | null | medium | |
|
| The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. | string | PKIX | medium | |
|
| The location of the trust store file. | string | null | medium | |
|
| The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. | password | null | medium | |
|
| The file format of the trust store file. | string | JKS | medium | |
|
| The alter configs policy class that should be used for validation. The class should implement the org.apache.kafka.server.policy.AlterConfigPolicy interface. | class | null | low | |
|
| The number of samples to retain in memory for alter log dirs replication quotas | int | 11 | [1,…] | low |
|
| The time span of each sample for alter log dirs replication quotas | int | 1 | [1,…] | low |
|
| The authorizer class that should be used for authorization | string | "" | low | |
|
| The fully qualified name of a class that implements the ClientQuotaCallback interface, which is used to determine quota limits applied to client requests. By default, <user, client-id>, <user> or <client-id> quotas stored in ZooKeeper are applied. For any given request, the most specific quota that matches the user principal of the session and the client-id of the request is applied. | class | null | low | |
|
| The create topic policy class that should be used for validation. The class should implement the org.apache.kafka.server.policy.CreateTopicPolicy interface. | class | null | low | |
|
| Scan interval to remove expired delegation tokens. | long | 3600000 | [1,…] | low |
|
| Map between listener names and security protocols. This must be defined for the same security protocol to be usable in more than one port or IP. For example, internal and external traffic can be separated even if SSL is required for both. Concretely, the user could define listeners with names INTERNAL and EXTERNAL and this property as: INTERNAL:SSL,EXTERNAL:SSL. As shown, key and value are separated by a colon and map entries are separated by commas. Each listener name should only appear once in the map. | string | PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL | low | |
|
| This configuration controls whether down-conversion of message formats is enabled to satisfy consume requests. When set to false, the broker will not perform down-conversion for consumers expecting an older message format and will respond with an UNSUPPORTED_VERSION error for consume requests from such older clients. This configuration does not apply to any message format conversion that might be required for replication to followers. | boolean | true | low | |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | low | |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | low | |
|
| The window of time a metrics sample is computed over. | long | 30000 | [1,…] | low |
|
| The Cipher algorithm used for encoding dynamically configured passwords. | string | AES/CBC/PKCS5Padding | low | |
|
| The iteration count used for encoding dynamically configured passwords. | int | 4096 | [1024,…] | low |
|
| The key length used for encoding dynamically configured passwords. | int | 128 | [8,…] | low |
|
| The SecretKeyFactory algorithm used for encoding dynamically configured passwords. Default is PBKDF2WithHmacSHA512 if available and PBKDF2WithHmacSHA1 otherwise. | string | null | low | |
|
| The number of samples to retain in memory for client quotas | int | 11 | [1,…] | low |
|
| The time span of each sample for client quotas | int | 1 | [1,…] | low |
|
| The number of samples to retain in memory for replication quotas | int | 11 | [1,…] | low |
|
| The time span of each sample for replication quotas | int | 1 | [1,…] | low |
|
| The endpoint identification algorithm to validate server hostname using server certificate. | string | https | low | |
|
| The SecureRandom PRNG implementation to use for SSL cryptography operations. | string | null | low | |
|
| The interval at which to rollback transactions that have timed out | int | 60000 | [1,…] | low |
|
| The interval at which to remove transactions that have expired due to transactional.id.expiration.ms passing | int | 3600000 | [1,…] | low |
|
| How far a ZK follower can be behind a ZK leader | int | 2000 | low |
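Broker parameters such as those listed above are set in the properties file passed to the bin/kafka-server-start.sh script. The following fragment is an illustrative sketch only: the property names are standard Kafka broker options, but the values are example placeholders rather than recommendations, and should be adapted to your cluster.
# Example broker properties (illustrative values only)
broker.id=0
zookeeper.connect=<zookeeper-address>:2181
zookeeper.session.timeout.ms=6000
unclean.leader.election.enable=false
log.cleaner.enable=true
log.cleanup.policy=delete
default.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
Statically configured properties take effect after a broker restart; a subset of broker options can also be changed dynamically, for example with the bin/kafka-configs.sh tool.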
Appendix B. Topic configuration parameters
| Name | Description | Type | Default | Valid Values | Server Default Property | Importance |
|---|---|---|---|---|---|---|
|
| A string that is either "delete" or "compact". This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic. | list | delete | [compact, delete] | log.cleanup.policy | medium |
|
| Specify the final compression type for a given topic. This configuration accepts the standard compression codecs ('gzip', 'snappy', lz4). It additionally accepts 'uncompressed' which is equivalent to no compression; and 'producer' which means retain the original compression codec set by the producer. | string | producer | [uncompressed, snappy, lz4, gzip, producer] | compression.type | medium |
|
| The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan). | long | 86400000 | [0,…] | log.cleaner.delete.retention.ms | medium |
|
| The time to wait before deleting a file from the filesystem | long | 60000 | [0,…] | log.segment.delete.delay.ms | medium |
|
| This setting allows specifying an interval at which we will force an fsync of data written to the log. For example if this was set to 1 we would fsync after every message; if it were 5 we would fsync after every five messages. In general we recommend you not set this and use replication for durability and allow the operating system’s background flush capabilities as it is more efficient. This setting can be overridden on a per-topic basis (see the per-topic configuration section). | long | 9223372036854775807 | [0,…] | log.flush.interval.messages | medium |
|
| This setting allows specifying a time interval at which we will force an fsync of data written to the log. For example if this was set to 1000 we would fsync after 1000 ms had passed. In general we recommend you not set this and use replication for durability and allow the operating system’s background flush capabilities as it is more efficient. | long | 9223372036854775807 | [0,…] | log.flush.interval.ms | medium |
|
| A list of replicas for which log replication should be throttled on the follower side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:… or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | [partitionId],[brokerId]:[partitionId],[brokerId]:… | follower.replication.throttled.replicas | medium |
|
| This setting controls how frequently Kafka adds an index entry to its offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don’t need to change this. | int | 4096 | [0,…] | log.index.interval.bytes | medium |
|
| A list of replicas for which log replication should be throttled on the leader side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:… or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | [partitionId],[brokerId]:[partitionId],[brokerId]:… | leader.replication.throttled.replicas | medium |
|
| The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that they can fetch record batches this large. In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case. | int | 1000012 | [0,…] | message.max.bytes | medium |
|
| Specify the message format version the broker will use to append messages to the logs. The value should be a valid ApiVersion. Some examples are: 0.8.2, 0.9.0.0, 0.10.0, check ApiVersion for more details. By setting a particular message format version, the user is certifying that all the existing messages on disk are smaller than or equal to the specified version. Setting this value incorrectly will cause consumers with older versions to break as they will receive messages with a format that they don’t understand. | string | 2.0-IV1 | log.message.format.version | medium | |
|
| The maximum difference allowed between the timestamp when a broker receives a message and the timestamp specified in the message. If message.timestamp.type=CreateTime, a message will be rejected if the difference in timestamp exceeds this threshold. This configuration is ignored if message.timestamp.type=LogAppendTime. | long | 9223372036854775807 | [0,…] | log.message.timestamp.difference.max.ms | medium |
|
| Define whether the timestamp in the message is message create time or log append time. The value should be either CreateTime or LogAppendTime | string | CreateTime | [CreateTime, LogAppendTime] | log.message.timestamp.type | medium |
|
| This configuration controls how frequently the log compactor will attempt to clean the log (assuming log compaction is enabled). By default we will avoid cleaning a log where more than 50% of the log has been compacted. This ratio bounds the maximum space wasted in the log by duplicates (at 50%, at most 50% of the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will mean more wasted space in the log. | double | 0.5 | [0,…,1] | log.cleaner.min.cleanable.ratio | medium |
|
| The minimum time a message will remain uncompacted in the log. Only applicable for logs that are being compacted. | long | 0 | [0,…] | log.cleaner.min.compaction.lag.ms | medium |
|
| When a producer sets acks to "all" (or "-1"), this configuration specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write. | int | 1 | [1,…] | min.insync.replicas | medium |
|
| True if we should preallocate the file on disk when creating a new log segment. | boolean | false | log.preallocate | medium | |
|
| This configuration controls the maximum size a partition (which consists of log segments) can grow to before we will discard old log segments to free up space if we are using the "delete" retention policy. By default there is no size limit, only a time limit. Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes. | long | -1 | log.retention.bytes | medium | |
|
| This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data. If set to -1, no time limit is applied. | long | 604800000 | log.retention.ms | medium | |
|
| This configuration controls the segment file size for the log. Retention and cleaning are always done a file at a time so a larger segment size means fewer files but less granular control over retention. | int | 1073741824 | [14,…] | log.segment.bytes | medium |
|
| This configuration controls the size of the index that maps offsets to file positions. We preallocate this index file and shrink it only after log rolls. You generally should not need to change this setting. | int | 10485760 | [0,…] | log.index.size.max.bytes | medium |
|
| The maximum random jitter subtracted from the scheduled segment roll time to avoid thundering herds of segment rolling | long | 0 | [0,…] | log.roll.jitter.ms | medium |
|
| This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn’t full to ensure that retention can delete or compact old data. | long | 604800000 | [1,…] | log.roll.ms | medium |
|
| Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss. | boolean | false | unclean.leader.election.enable | medium | |
|
| This configuration controls whether down-conversion of message formats is enabled to satisfy consume requests. When set to false, the broker will not perform down-conversion for consumers expecting an older message format and will respond with an UNSUPPORTED_VERSION error for consume requests from such older clients. This configuration does not apply to any message format conversion that might be required for replication to followers. | boolean | true | log.message.downconversion.enable | low |
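Topic-level parameters such as those above override the corresponding broker defaults and are typically supplied with the --config option when a topic is created or altered. The following command is an example sketch only; the topic name, partition and replica counts, and config values are placeholders to adapt to your environment.
bin/kafka-topics.sh --zookeeper <zookeeper-address> --create --topic <topic-name> \
  --partitions 3 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.insync.replicas=2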
Appendix C. Consumer configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| Deserializer class for key that implements the org.apache.kafka.common.serialization.Deserializer interface. | class | high | |
|
| Deserializer class for value that implements the org.apache.kafka.common.serialization.Deserializer interface. | class | high | |
|
| A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,…. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). | list | "" | non-null value | high |
|
| The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency. | int | 1 | [0,…] | high |
|
| A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy. | string | "" | high | |
|
| The expected time between heartbeats to the consumer coordinator when using Kafka’s group management facilities. Heartbeats are used to ensure that the consumer’s session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances. | int | 3000 | high | |
|
| The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). See fetch.max.bytes for limiting the consumer request size. | int | 1048576 | [0,…] | high |
|
| The timeout used to detect consumer failures when using Kafka’s group management facility. The consumer sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this consumer from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms. | int | 10000 | high | |
|
| The password of the private key in the key store file. This is optional for client. | password | null | high | |
|
| The location of the key store file. This is optional for client and can be used for two-way authentication for client. | string | null | high | |
|
| The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. | password | null | high | |
|
| The location of the trust store file. | string | null | high | |
|
| The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. | password | null | high | |
|
| What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted): earliest (automatically reset the offset to the earliest offset), latest (automatically reset the offset to the latest offset), none (throw an exception to the consumer if no previous offset is found for the consumer’s group), or anything else (throw an exception to the consumer). | string | latest | [latest, earliest, none] | medium |
|
| Close idle connections after the number of milliseconds specified by this config. | long | 540000 | medium | |
|
| Specifies the timeout (in milliseconds) for consumer APIs that could block. This configuration is used as the default timeout for all consumer operations that do not explicitly accept a timeout parameter. | int | 60000 | [0,…] | medium |
|
| If true the consumer’s offset will be periodically committed in the background. | boolean | true | medium | |
|
| Whether records from internal topics (such as offsets) should be exposed to the consumer. If set to true, the only way to receive records from an internal topic is subscribing to it. | boolean | true | medium | |
|
| The maximum amount of data the server should return for a fetch request. Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress. As such, this is not an absolute maximum. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). Note that the consumer performs multiple fetches in parallel. | int | 52428800 | [0,…] | medium |
|
| Controls how to read messages written transactionally. If set to read_committed, consumer.poll() will only return transactional messages which have been committed. If set to read_uncommitted (the default), consumer.poll() will return all messages, even transactional messages which have been aborted. Non-transactional messages will be returned unconditionally in either mode. Messages will always be returned in offset order. Hence, in read_committed mode, consumer.poll() will only return messages up to the last stable offset (LSO), which is the one less than the offset of the first open transaction. In particular, any messages appearing after messages belonging to ongoing transactions will be withheld until the relevant transaction has been completed. As a result, read_committed consumers will not be able to read up to the high watermark when there are in-flight transactions. Further, when in read_committed mode, the seekToEnd method will return the LSO. | string | read_uncommitted | [read_committed, read_uncommitted] | medium |
|
| The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. | int | 300000 | [1,…] | medium |
|
| The maximum number of records returned in a single call to poll(). | int | 500 | [1,…] | medium |
|
| The class name of the partition assignment strategy that the client will use to distribute partition ownership amongst consumer instances when group management is used | list | org.apache.kafka.clients.consumer.RangeAssignor | non-null value | medium |
|
| The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. | int | 65536 | [-1,…] | medium |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. | int | 30000 | [0,…] | medium |
|
| The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. | class | null | medium | |
|
| JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; | password | null | medium | |
|
| The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. | string | null | medium | |
|
| The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler | class | null | medium | |
|
| The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin | class | null | medium | |
|
| SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. | string | GSSAPI | medium | |
|
| Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. | string | PLAINTEXT | medium | |
|
| The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. | int | 131072 | [-1,…] | medium |
|
| The list of protocols enabled for SSL connections. | list | TLSv1.2,TLSv1.1,TLSv1 | medium | |
|
| The file format of the key store file. This is optional for client. | string | JKS | medium | |
|
| The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. | string | TLS | medium | |
|
| The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. | string | null | medium | |
|
| The file format of the trust store file. | string | JKS | medium | |
|
| The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if enable.auto.commit is set to true. | int | 5000 | [0,…] | low |
|
| Automatically check the CRC32 of the records consumed. This ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds some overhead, so it may be disabled in cases seeking extreme performance. | boolean | true | low | |
|
| An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging. | string | "" | low | |
|
| The maximum amount of time the server will block before answering the fetch request if there isn’t sufficient data to immediately satisfy the requirement given by fetch.min.bytes. | int | 500 | [0,…] | low |
|
| A list of classes to use as interceptors. Implementing the org.apache.kafka.clients.consumer.ConsumerInterceptor interface allows you to intercept (and possibly mutate) records received by the consumer. By default, there are no interceptors. | list | "" | non-null value | low |
|
| The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. | long | 300000 | [0,…] | low |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | non-null value | low |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | [INFO, DEBUG] | low |
|
| The window of time a metrics sample is computed over. | long | 30000 | [0,…] | low |
|
| The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. | long | 1000 | [0,…] | low |
|
| The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. | long | 50 | [0,…] | low |
|
| The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios. | long | 100 | [0,…] | low |
|
| Kerberos kinit command path. | string | /usr/bin/kinit | low | |
|
| Login thread sleep time between refresh attempts. | long | 60000 | low | |
|
| Percentage of random jitter added to the renewal time. | double | 0.05 | low | |
|
| Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. | double | 0.8 | low | |
|
| The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 300 | [0,…,3600] | low |
|
| The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 60 | [0,…,900] | low |
|
| Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.8 | [0.5,…,1.0] | low |
|
| The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.05 | [0.0,…,0.25] | low |
|
| A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported. | list | null | low | |
|
| The endpoint identification algorithm to validate server hostname using server certificate. | string | https | low | |
|
| The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. | string | SunX509 | low | |
|
| The SecureRandom PRNG implementation to use for SSL cryptography operations. | string | null | low | |
|
| The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. | string | PKIX | low |
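Consumer parameters are typically collected in a properties file that is passed to the consuming client. The fragment below is an illustrative example only; the deserializer classes and values are placeholders that you would replace with the settings your application needs.
# Example consumer properties (illustrative values only)
bootstrap.servers=<bootstrap-address>
group.id=<group-name>
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
enable.auto.commit=false
auto.offset.reset=earliest
max.poll.records=500
For example, such a file can be passed to the console consumer with the --consumer.config option of bin/kafka-console-consumer.sh, or loaded into a KafkaConsumer programmatically.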
Appendix D. Producer configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface. | class | high | |
|
| Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface. | class | high | |
|
| The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are allowed: acks=0 (the producer will not wait for any acknowledgment from the server at all; the record is immediately considered sent, no guarantee can be made that the server has received it, and the retries configuration will not take effect), acks=1 (the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers; if the leader fails immediately after acknowledging the record but before the followers have replicated it, the record will be lost), and acks=all or acks=-1 (the leader will wait for the full set of in-sync replicas to acknowledge the record; this guarantees that the record will not be lost as long as at least one in-sync replica remains alive, and is the strongest available guarantee). | string | 1 | [all, -1, 0, 1] | high |
|
| A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,…. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). | list | "" | non-null value | high |
|
| The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for max.block.ms, after which it will throw an exception. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests. | long | 33554432 | [0,…] | high |
|
| The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid values are none, gzip, snappy, or lz4. Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression). | string | none | high | |
|
| Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first. | int | 0 | [0,…,2147483647] | high |
|
| The password of the private key in the key store file. This is optional for client. | password | null | high | |
|
| The location of the key store file. This is optional for client and can be used for two-way authentication for client. | string | null | high | |
|
| The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. | password | null | high | |
|
| The location of the trust store file. | string | null | high | |
|
| The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. | password | null | high | |
|
| The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size. Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records. | int | 16384 | [0,…] | medium |
|
| An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging. | string | "" | medium | |
|
| Close idle connections after the number of milliseconds specified by this config. | long | 540000 | medium | |
|
| The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle’s algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will wait for the specified time for more records to show up. This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load. | long | 0 | [0,…] | medium |
|
| The configuration controls how long KafkaProducer.send() and KafkaProducer.partitionsFor() will block. These methods can be blocked either because the buffer is full or metadata is unavailable. Blocking in the user-supplied serializers or partitioner will not be counted against this timeout. | long | 60000 | [0,…] | medium |
|
| The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size which may be different from this. | int | 1048576 | [0,…] | medium |
|
| Partitioner class that implements the org.apache.kafka.clients.producer.Partitioner interface. | class | org.apache.kafka.clients.producer.internals.DefaultPartitioner | medium | |
|
| The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. | int | 32768 | [-1,…] | medium |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. This should be larger than replica.lag.time.max.ms (a broker configuration) to reduce the possibility of message duplication due to unnecessary producer retries. | int | 30000 | [0,…] | medium |
|
| The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. | class | null | medium | |
|
| JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; | password | null | medium | |
|
| The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. | string | null | medium | |
|
| The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler | class | null | medium | |
|
| The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin | class | null | medium | |
|
| SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. | string | GSSAPI | medium | |
|
| Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. | string | PLAINTEXT | medium | |
|
| The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. | int | 131072 | [-1,…] | medium |
|
| The list of protocols enabled for SSL connections. | list | TLSv1.2,TLSv1.1,TLSv1 | medium | |
|
| The file format of the key store file. This is optional for client. | string | JKS | medium | |
|
| The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. | string | TLS | medium | |
|
| The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. | string | null | medium | |
|
| The file format of the trust store file. | string | JKS | medium | |
|
| When set to 'true', the producer will ensure that exactly one copy of each message is written in the stream. If 'false', producer retries due to broker failures, etc., may write duplicates of the retried message in the stream. Note that enabling idempotence requires max.in.flight.requests.per.connection to be less than or equal to 5, retries to be greater than 0, and acks to be 'all'. If these values are not explicitly set by the user, suitable values will be chosen. If incompatible values are set, a ConfigException will be thrown. | boolean | false | low | |
|
| A list of classes to use as interceptors. Implementing the org.apache.kafka.clients.producer.ProducerInterceptor interface allows you to intercept (and possibly mutate) the records received by the producer before they are published to the Kafka cluster. By default, there are no interceptors. | list | "" | non-null value | low |
|
| The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled). | int | 5 | [1,…] | low |
|
| The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. | long | 300000 | [0,…] | low |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | non-null value | low |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | [INFO, DEBUG] | low |
|
| The window of time a metrics sample is computed over. | long | 30000 | [0,…] | low |
|
| The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. | long | 1000 | [0,…] | low |
|
| The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. | long | 50 | [0,…] | low |
|
| The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios. | long | 100 | [0,…] | low |
|
| Kerberos kinit command path. | string | /usr/bin/kinit | low | |
|
| Login thread sleep time between refresh attempts. | long | 60000 | low | |
|
| Percentage of random jitter added to the renewal time. | double | 0.05 | low | |
|
| Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. | double | 0.8 | low | |
|
| The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 300 | [0,…,3600] | low |
|
| The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 60 | [0,…,900] | low |
|
| Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.8 | [0.5,…,1.0] | low |
|
| The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.05 | [0.0,…,0.25] | low |
|
| A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported. | list | null | low | |
|
| The endpoint identification algorithm to validate server hostname using server certificate. | string | https | low | |
|
| The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. | string | SunX509 | low | |
|
| The SecureRandom PRNG implementation to use for SSL cryptography operations. | string | null | low | |
|
| The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. | string | PKIX | low | |
|
| The maximum amount of time in ms that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction. If this value is larger than the transaction.max.timeout.ms setting in the broker, the request will fail with an InvalidTransactionTimeout error. | int | 60000 | low | |
|
| The TransactionalId to use for transactional delivery. This enables reliability semantics which span multiple producer sessions since it allows the client to guarantee that transactions using the same TransactionalId have been completed prior to starting any new transactions. If no TransactionalId is provided, then the producer is limited to idempotent delivery. Note that enable.idempotence must be enabled if a TransactionalId is configured. The default is null, which means transactions cannot be used. Note that, by default, transactions require a cluster of at least three brokers, which is the recommended setting for production; for development you can change this by adjusting the broker setting transaction.state.log.replication.factor. | string | null | non-empty string | low |
Appendix E. Admin Client configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,…. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). | list | high | |
|
| The password of the private key in the key store file. This is optional for client. | password | null | high | |
|
| The location of the key store file. This is optional for client and can be used for two-way authentication for client. | string | null | high | |
|
| The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. | password | null | high | |
|
| The location of the trust store file. | string | null | high | |
|
| The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. | password | null | high | |
|
| An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging. | string | "" | medium | |
|
| Close idle connections after the number of milliseconds specified by this config. | long | 300000 | medium | |
|
| The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. | int | 65536 | [-1,…] | medium |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. | int | 120000 | [0,…] | medium |
|
| The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. | class | null | medium | |
|
| JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; | password | null | medium | |
|
| The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. | string | null | medium | |
|
| The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler | class | null | medium | |
|
| The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin | class | null | medium | |
|
| SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. | string | GSSAPI | medium | |
|
| Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. | string | PLAINTEXT | medium | |
|
| The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. | int | 131072 | [-1,…] | medium |
|
| The list of protocols enabled for SSL connections. | list | TLSv1.2,TLSv1.1,TLSv1 | medium | |
|
| The file format of the key store file. This is optional for client. | string | JKS | medium | |
|
| The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. | string | TLS | medium | |
|
| The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. | string | null | medium | |
|
| The file format of the trust store file. | string | JKS | medium | |
|
| The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. | long | 300000 | [0,…] | low |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | low | |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | [INFO, DEBUG] | low |
|
| The window of time a metrics sample is computed over. | long | 30000 | [0,…] | low |
|
| The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. | long | 1000 | [0,…] | low |
|
| The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. | long | 50 | [0,…] | low |
|
| Setting a value greater than zero will cause the client to resend any request that fails with a potentially transient error. | int | 5 | [0,…] | low |
|
| The amount of time to wait before attempting to retry a failed request. This avoids repeatedly sending requests in a tight loop under some failure scenarios. | long | 100 | [0,…] | low |
|
| Kerberos kinit command path. | string | /usr/bin/kinit | low | |
|
| Login thread sleep time between refresh attempts. | long | 60000 | low | |
|
| Percentage of random jitter added to the renewal time. | double | 0.05 | low | |
|
| Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. | double | 0.8 | low | |
|
| The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 300 | [0,…,3600] | low |
|
| The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 60 | [0,…,900] | low |
|
| Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.8 | [0.5,…,1.0] | low |
|
| The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.05 | [0.0,…,0.25] | low |
|
| A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported. | list | null | low | |
|
| The endpoint identification algorithm to validate server hostname using server certificate. | string | https | low | |
|
| The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. | string | SunX509 | low | |
|
| The SecureRandom PRNG implementation to use for SSL cryptography operations. | string | null | low | |
|
| The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. | string | PKIX | low |
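As an illustration of how these parameters fit together, the sketch below shows a minimal Admin Client configuration file. The property keys are the standard Apache Kafka admin client parameter names for the settings listed in this appendix; the client ID, file path, and password are placeholder values.

```
# Illustrative admin client configuration (placeholder values)
bootstrap.servers=<bootstrap-address>
client.id=<admin-client-id>
request.timeout.ms=120000
security.protocol=SSL
ssl.truststore.location=/opt/kafka/config/truststore.jks
ssl.truststore.password=<truststore-password>
```

A file like this can be passed to command line tools that accept a --command-config option, such as bin/kafka-consumer-groups.sh, or loaded into java.util.Properties when creating an admin client programmatically.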
Appendix F. Kafka Connect configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| The name of the Kafka topic where connector configurations are stored | string | high | ||
|
| A unique string that identifies the Connect cluster group this worker belongs to. | string | high | ||
|
| Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro. | class | high | ||
|
| The name of the Kafka topic where connector offsets are stored | string | high | ||
|
| The name of the Kafka topic where connector and task status are stored | string | high | ||
|
| Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro. | class | high | ||
|
| A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). | list | localhost:9092 | high | |
|
| The expected time between heartbeats to the group coordinator when using Kafka’s group management facilities. Heartbeats are used to ensure that the worker’s session stays active and to facilitate rebalancing when new members join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances. | int | 3000 | high | |
|
| The maximum allowed time for each worker to join the group once a rebalance has begun. This is basically a limit on the amount of time needed for all tasks to flush any pending data and commit offsets. If the timeout is exceeded, then the worker will be removed from the group, which will cause offset commit failures. | int | 60000 | high | |
|
| The timeout used to detect worker failures. The worker sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove the worker from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms. | int | 10000 | high | |
|
| The password of the private key in the key store file. This is optional for client. | password | null | high | |
|
| The location of the key store file. This is optional for client and can be used for two-way authentication for client. | string | null | high | |
|
| The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. | password | null | high | |
|
| The location of the trust store file. | string | null | high | |
|
| The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. | password | null | high | |
|
| Close idle connections after the number of milliseconds specified by this config. | long | 540000 | medium | |
|
| The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. | int | 32768 | [0,…] | medium |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. | int | 40000 | [0,…] | medium |
|
| The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. | class | null | medium | |
|
| JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; | password | null | medium | |
|
| The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. | string | null | medium | |
|
| The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler | class | null | medium | |
|
| The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin | class | null | medium | |
|
| SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. | string | GSSAPI | medium | |
|
| Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. | string | PLAINTEXT | medium | |
|
| The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. | int | 131072 | [0,…] | medium |
|
| The list of protocols enabled for SSL connections. | list | TLSv1.2,TLSv1.1,TLSv1 | medium | |
|
| The file format of the key store file. This is optional for client. | string | JKS | medium | |
|
| The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. | string | TLS | medium | |
|
| The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. | string | null | medium | |
|
| The file format of the trust store file. | string | JKS | medium | |
|
| When the worker is out of sync with other workers and needs to resynchronize configurations, wait up to this amount of time before giving up, leaving the group, and waiting a backoff period before rejoining. | int | 3000 | medium | |
|
| When the worker is out of sync with other workers and fails to catch up within worker.sync.timeout.ms, leave the Connect cluster for this long before rejoining. | int | 300000 | medium | |
|
| Sets the methods supported for cross origin requests by setting the Access-Control-Allow-Methods header. The default value of the Access-Control-Allow-Methods header allows cross origin requests for GET, POST and HEAD. | string | "" | low | |
|
| Value to set the Access-Control-Allow-Origin header to for REST API requests. To enable cross origin access, set this to the domain of the application that should be permitted to access the API, or '*' to allow access from any domain. The default value only allows access from the domain of the REST API. | string | "" | low | |
|
| An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging. | string | "" | low | |
|
| Comma-separated names of ConfigProvider classes, loaded and used in the order specified. Implementing the ConfigProvider interface allows you to replace variable references in connector configurations, such as for externalized secrets. | list | "" | low | |
|
| Replication factor used when creating the configuration storage topic | short | 3 | [1,…] | low |
|
| HeaderConverter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the header values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro. By default, the SimpleHeaderConverter is used to serialize header values to strings and deserialize them by inferring the schemas. | class | org.apache.kafka.connect.storage.SimpleHeaderConverter | low | |
|
| Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro. This setting controls the format used for internal bookkeeping data used by the framework, such as configs and offsets, so users can typically use any functioning Converter implementation. Deprecated; will be removed in an upcoming version. | class | org.apache.kafka.connect.json.JsonConverter | low | |
|
| Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro. This setting controls the format used for internal bookkeeping data used by the framework, such as configs and offsets, so users can typically use any functioning Converter implementation. Deprecated; will be removed in an upcoming version. | class | org.apache.kafka.connect.json.JsonConverter | low | |
|
| List of comma-separated URIs the REST API will listen on. The supported protocols are HTTP and HTTPS. Specify hostname as 0.0.0.0 to bind to all interfaces. Leave hostname empty to bind to default interface. Examples of legal listener lists: HTTP://myhost:8083,HTTPS://myhost:8084 | list | null | low | |
|
| The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. | long | 300000 | [0,…] | low |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | low | |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | [INFO, DEBUG] | low |
|
| The window of time a metrics sample is computed over. | long | 30000 | [0,…] | low |
|
| Interval at which to try committing offsets for tasks. | long | 60000 | low | |
|
| Maximum number of milliseconds to wait for records to flush and partition offset data to be committed to offset storage before cancelling the process and restoring the offset data to be committed in a future attempt. | long | 5000 | low | |
|
| The number of partitions used when creating the offset storage topic | int | 25 | [1,…] | low |
|
| Replication factor used when creating the offset storage topic | short | 3 | [1,…] | low |
|
| List of paths separated by commas (,) that contain plugins (connectors, converters, transformations). The list should consist of top level directories that include any combination of: a) directories immediately containing jars with plugins and their dependencies; b) uber-jars with plugins and their dependencies; c) directories immediately containing the package directory structure of classes of plugins and their dependencies. Note: symlinks will be followed to discover dependencies or plugins. Examples: plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors | list | null | low | |
|
| The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. | long | 1000 | [0,…] | low |
|
| The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. | long | 50 | [0,…] | low |
|
| If this is set, this is the hostname that will be given out to other workers to connect to. | string | null | low | |
|
| Sets the advertised listener (HTTP or HTTPS) which will be given to other workers to use. | string | null | low | |
|
| If this is set, this is the port that will be given out to other workers to connect to. | int | null | low | |
|
| Comma-separated names of ConnectRestExtension classes, loaded and called in the order specified. Implementing the ConnectRestExtension interface allows you to inject user defined resources, such as filters, into Connect’s REST API. Typically used to add custom capabilities like logging, security, and so on. | list | "" | low | |
|
| Hostname for the REST API. If this is set, it will only bind to this interface. | string | null | low | |
|
| Port for the REST API to listen on. | int | 8083 | low | |
|
| The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios. | long | 100 | [0,…] | low |
|
| Kerberos kinit command path. | string | /usr/bin/kinit | low | |
|
| Login thread sleep time between refresh attempts. | long | 60000 | low | |
|
| Percentage of random jitter added to the renewal time. | double | 0.05 | low | |
|
| Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. | double | 0.8 | low | |
|
| The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 300 | [0,…,3600] | low |
|
| The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. | short | 60 | [0,…,900] | low |
|
| Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.8 | [0.5,…,1.0] | low |
|
| The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. | double | 0.05 | [0.0,…,0.25] | low |
|
| A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported. | list | null | low | |
|
| Configures the Kafka broker to request client authentication. The following settings are common: ssl.client.auth=required (client authentication is required), ssl.client.auth=requested (client authentication is optional; unlike required, the client can choose not to provide authentication information about itself), and ssl.client.auth=none (client authentication is not needed). | string | none | low | |
|
| The endpoint identification algorithm to validate server hostname using server certificate. | string | https | low | |
|
| The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. | string | SunX509 | low | |
|
| The SecureRandom PRNG implementation to use for SSL cryptography operations. | string | null | low | |
|
| The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. | string | PKIX | low | |
|
| The number of partitions used when creating the status storage topic | int | 5 | [1,…] | low |
|
| Replication factor used when creating the status storage topic | short | 3 | [1,…] | low |
|
| Amount of time to wait for tasks to shut down gracefully. This is the total amount of time, not per task. All tasks have shutdown triggered, then they are waited on sequentially. | long | 5000 | low |
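To show how the worker parameters above are typically combined, the following sketch outlines a distributed Connect worker configuration. The property keys are the standard Apache Kafka Connect worker parameter names documented in this appendix; the topic names, replication factors, and plugin path are placeholder values to adapt to your environment.

```
# Illustrative distributed Connect worker configuration (placeholder values)
bootstrap.servers=<bootstrap-address>
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
rest.port=8083
plugin.path=/opt/kafka/plugins
```

A worker configured this way is started with bin/connect-distributed.sh followed by the path to the configuration file.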
Appendix G. Kafka Streams configuration parameters
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
|
| An identifier for the stream processing application. Must be unique within the Kafka cluster. It is used as 1) the default client-id prefix, 2) the group-id for membership management, 3) the changelog topic prefix. | string | high | ||
|
| A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). | list | high | | |
|
| The replication factor for change log topics and repartition topics created by the stream processing application. | int | 1 | high | |
|
| Directory location for state store. | string | /tmp/kafka-streams | high | |
|
| Maximum number of memory bytes to be used for buffering across all threads | long | 10485760 | [0,…] | medium |
|
| An ID prefix string used for the client IDs of internal consumer, producer and restore-consumer, with pattern '<client.id>-StreamThread-<threadSequenceNumber>-<consumer|producer|restore-consumer>'. | string | "" | medium | |
|
| Exception handling class that implements the org.apache.kafka.streams.errors.DeserializationExceptionHandler interface. | class | org.apache.kafka.streams.errors.LogAndFailExceptionHandler | medium | |
|
| Default serializer / deserializer class for key that implements the org.apache.kafka.common.serialization.Serde interface. | class | org.apache.kafka.common.serialization.Serdes$ByteArraySerde | medium | |
|
| Exception handling class that implements the org.apache.kafka.streams.errors.ProductionExceptionHandler interface. | class | org.apache.kafka.streams.errors.DefaultProductionExceptionHandler | medium | |
|
| Default timestamp extractor class that implements the org.apache.kafka.streams.processor.TimestampExtractor interface. | class | org.apache.kafka.streams.processor.FailOnInvalidTimestamp | medium | |
|
| Default serializer / deserializer class for value that implements the org.apache.kafka.common.serialization.Serde interface. | class | org.apache.kafka.common.serialization.Serdes$ByteArraySerde | medium | |
|
| The number of standby replicas for each task. | int | 0 | medium | |
|
| The number of threads to execute stream processing. | int | 1 | medium | |
|
| The processing guarantee that should be used. Possible values are at_least_once (default) and exactly_once. | string | at_least_once | [at_least_once, exactly_once] | medium |
|
| Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. | string | PLAINTEXT | medium | |
|
| A configuration telling Kafka Streams if it should optimize the topology, disabled by default | string | none | [none, all] | medium |
|
| A host:port pair pointing to an embedded user defined endpoint that can be used for discovering the locations of state stores within a single KafkaStreams application | string | "" | low | |
|
| The maximum number of records to buffer per partition. | int | 1000 | low | |
|
| The frequency with which to save the position of the processor. (Note: if 'processing.guarantee' is set to 'exactly_once', the default value is 100; otherwise the default value is 30000.) | long | 30000 | low | |
|
| Close idle connections after the number of milliseconds specified by this config. | long | 540000 | low | |
|
| The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. | long | 300000 | [0,…] | low |
|
| A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. | list | "" | low | |
|
| The number of samples maintained to compute metrics. | int | 2 | [1,…] | low |
|
| The highest recording level for metrics. | string | INFO | [INFO, DEBUG] | low |
|
| The window of time a metrics sample is computed over. | long | 30000 | [0,…] | low |
|
| Partition grouper class that implements the org.apache.kafka.streams.processor.PartitionGrouper interface. | class | org.apache.kafka.streams.processor.DefaultPartitionGrouper | low | |
|
| The amount of time in milliseconds to block waiting for input. | long | 100 | low | |
|
| The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. | int | 32768 | [0,…] | low |
|
| The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. | long | 1000 | [0,…] | low |
|
| The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. | long | 50 | [0,…] | low |
|
| The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. | int | 40000 | [0,…] | low |
|
| Setting a value greater than zero will cause the client to resend any request that fails with a potentially transient error. | int | 0 | [0,…,2147483647] | low |
|
| The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios. | long | 100 | [0,…] | low |
|
| A RocksDB config setter class or class name that implements the org.apache.kafka.streams.state.RocksDBConfigSetter interface. | class | null | low | |
|
| The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. | int | 131072 | [0,…] | low |
|
| The amount of time in milliseconds to wait before deleting state when a partition has migrated. Only state directories that have not been modified for at least state.cleanup.delay.ms will be removed | long | 600000 | low | |
|
| Allows upgrading from versions 0.10.0/0.10.1/0.10.2/0.11.0/1.0/1.1 to version 1.2 (or newer) in a backward compatible way. When upgrading from 1.2 to a newer version it is not required to specify this config. Default is null. Accepted values are "0.10.0", "0.10.1", "0.10.2", "0.11.0", "1.0", "1.1" (for upgrading from the corresponding old version). | string | null | [null, 0.10.0, 0.10.1, 0.10.2, 0.11.0, 1.0, 1.1] | low |
|
| Added to a window’s maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift. Default is 1 day. | long | 86400000 | low |
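To illustrate how the parameters above are used, the sketch below shows a minimal Kafka Streams configuration. The property keys are the standard Apache Kafka Streams parameter names documented in this appendix; the application ID, state directory, thread count, and serde classes are placeholder choices, not recommended values.

```
# Illustrative Kafka Streams configuration (placeholder values)
application.id=<application-id>
bootstrap.servers=<bootstrap-address>
replication.factor=3
state.dir=/var/lib/kafka-streams
num.stream.threads=2
processing.guarantee=at_least_once
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
```

In practice these settings are supplied as java.util.Properties when constructing a KafkaStreams instance; they are shown here in properties-file form for readability.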
