Making open source more inclusive
Red Hat is committed to replacing problematic language in our code and documentation. We are beginning with these four terms: master, slave, blacklist, and whitelist. Due to the enormity of this endeavor, these changes will be gradually implemented over upcoming releases. For more details on making our language more inclusive, see our CTO Chris Wright’s message.
What is OpenShift Streams for Apache Kafka?
Red Hat OpenShift Streams for Apache Kafka is a cloud service that simplifies the process of running Apache Kafka. Apache Kafka is an open source, distributed, publish-subscribe messaging system for creating fault-tolerant, real-time data feeds.
Your Red Hat account gives you access to OpenShift Streams for Apache Kafka. Within minutes, you’ll have a Kafka instance up and running, and be ready to start defining the Kafka configuration you need.
Understanding Kafka instances
A Kafka instance comprises a Kafka cluster with multiple brokers. Brokers contain topics that receive and store data. Kafka splits topics into partitions and distributes and replicates the partitions across brokers for fault tolerance and increased throughput.
To understand how a Kafka instance operates as a message broker, it’s important to understand the key concepts described here.
- Kafka cluster
- A Kafka cluster is a group of Kafka brokers, ZooKeeper instances, and management components.
- ZooKeeper provides a cluster coordination service, storing and tracking the status of brokers and consumers.
- A broker, sometimes referred to as a server or node, contains topics and orchestrates the storage and passing of messages.
- A topic provides a destination for the storage of data. Kafka splits each topic into one or more partitions.
- A partition is a subset of a topic. Kafka uses partitions for data sharding and replication. The number of topic partitions is defined by a topic partition count.
- Partition leader
- A partition leader handles all producer requests for a given partition.
- Partition follower
- A partition follower replicates the partition data of a partition leader, optionally handling consumer requests.
- Replication factor
- Topics use a replication factor to configure the number of replicas of each partition within the cluster. A topic comprises at least one partition. In OpenShift Streams for Apache Kafka, the replication factor is 3.
- In-sync replicas
- Kafka elects a partition leader to handle all producer requests for a partition. Partition followers on other brokers replicate the partition data of the partition leader. Each in-sync replica has the same number of messages as the leader. If the leader partition fails, Kafka chooses an in-sync replica as the new leader.
In your configuration, you can define how many replicas must be in sync to be able to produce messages. Kafka commits a message only after the message has been successfully copied to the partition replica. In this way, if the leader fails, the message is not lost.
In OpenShift Streams for Apache Kafka, the minimum number of in-sync replicas is 2. As the diagram shows, in a replicated topic each numbered partition has a leader and two followers.
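As an illustration of how a producer requests this guarantee, the following sketch shows settings in the style of a client library such as kafka-python. All values are placeholders; `acks="all"` asks the broker to confirm a write only after all in-sync replicas have the message.

```python
# Sketch of producer settings that pair with a topic whose
# min.insync.replicas is 2. All values here are placeholders.
producer_config = {
    "bootstrap_servers": "<bootstrap-server-url>",
    "acks": "all",   # wait for all in-sync replicas before confirming a write
    "retries": 3,    # retry transient failures instead of dropping messages
}
# For example, with kafka-python: KafkaProducer(**producer_config)
```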
Kafka producers and consumers
Producers and consumers send and receive (publish and subscribe) messages through Kafka brokers. Kafka clients act as producers, consumers, or both.
- A producer sends messages to a broker topic. The broker then writes the messages to a partition, either on a round-robin basis or to a specific partition based on a message key. A round-robin approach distributes messages equally across partitions.
- A consumer subscribes to a topic and reads messages according to partition and offset.
- Consumer group
- You can use consumer groups to share a large data stream generated by multiple producers. Kafka groups any consumers that have the same group ID. It then distributes messages across the group members. Consumers within a group don’t read data from the same partition, but can receive data from one or more partitions.
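The key-based and round-robin routing described above can be sketched as follows. This is illustrative only: Kafka's default partitioner uses murmur2 hashing, not the simple byte sum used here.

```python
# Sketch of producer-side partition selection (illustrative only;
# Kafka's default partitioner uses murmur2 hashing).
import itertools
from typing import Optional

class PartitionChooser:
    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: distribute messages evenly across partitions.
            return next(self._round_robin)
        # With a key: the same key always maps to the same partition,
        # so related messages stay ordered within one partition.
        return sum(key) % self.num_partitions

chooser = PartitionChooser(num_partitions=3)
keyed = {chooser.choose(b"order-42") for _ in range(5)}  # always one partition
unkeyed = [chooser.choose(None) for _ in range(6)]       # cycles 0,1,2,0,1,2
```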
Offsets allow consumers to track their position in each partition. Each message in a given partition has a unique offset, so a consumer's offset identifies its position within the partition and indicates how many records it has consumed.
The producer offset at the head of the partition shows the write position. The consumer offset position shows the read position.
When a record is consumed, the consumer's position in the partition is updated by committing an offset. The Kafka __consumer_offsets topic stores information about the latest committed offset for each partition, according to the consumer group.
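The role the __consumer_offsets topic plays can be modeled as a lookup keyed by group, topic, and partition. A toy sketch:

```python
# Toy model of committed-offset tracking, keyed the way the
# __consumer_offsets topic keys its records: (group, topic, partition).
committed = {}

def commit(group: str, topic: str, partition: int, offset: int) -> None:
    # Store the latest committed offset for this group and partition.
    committed[(group, topic, partition)] = offset

def position(group: str, topic: str, partition: int) -> int:
    # A consumer (re)joining the group resumes from the last commit,
    # or from offset 0 if the group has never committed.
    return committed.get((group, topic, partition), 0)

commit("analytics", "orders", 0, 5)
commit("analytics", "orders", 0, 12)  # a later commit supersedes the earlier one
resume_at = position("analytics", "orders", 0)  # 12
fresh = position("billing", "orders", 0)        # 0
```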
Kafka’s underlying data stream-processing capabilities and component architecture deliver the following key capabilities:
Extremely high throughput and low-latency data sharing between microservices and other applications
Message ordering guarantees
Message rewind/replay from data storage to reconstruct an application state
Message compaction using keys to identify messages and retain only the most recent
Horizontal scalability in a cluster configuration
Replication of data to control fault tolerance
Retention of high volumes of data for immediate access
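Of the capabilities above, compaction is easy to sketch: for each key, only the most recent record survives, and a surviving record keeps its later position in the log. A toy version:

```python
# Toy log compaction: keep only the latest record per key,
# with each surviving record kept at its most recent log position.
def compact(log):
    latest = {}
    for key, value in log:
        latest.pop(key, None)   # discard any older record for this key
        latest[key] = value     # keep the record at its latest position
    return list(latest.items())

log = [("user-1", "a"), ("user-2", "b"), ("user-1", "c")]
compacted = compact(log)  # [("user-2", "b"), ("user-1", "c")]
```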
Kafka use cases
Kafka’s capabilities make it suitable for the following use cases:
Event sourcing to capture changes to the state of an application as a log of events
Website activity tracking
Operational monitoring through metrics
Log collection and aggregation
Commit logs for distributed systems
Stream processing so that applications can respond to data in real time
How OpenShift Streams for Apache Kafka supports Apache Kafka
Red Hat OpenShift Streams for Apache Kafka puts you in control of setting up Apache Kafka. As well as managing the OpenShift Streams for Apache Kafka service through the web console, you can also download and use a dedicated command-line interface (CLI), or use the publicly available REST APIs for provisioning and administration tasks.
Use the OpenShift Streams for Apache Kafka components to quickly and easily create new Kafka instances and manage those instances.
The components simplify the following actions:
Running Kafka instances
Creating and managing topics
Configuring access to Kafka
Securing access to Kafka
After you’ve configured the Kafka instance, you can generate connection details. You use the connection details to connect the Kafka client applications and Kafka utilities that will produce and consume messages from the Kafka instance.
Use the web console to perform the following actions:
Create and manage Kafka instances and topics.
Create and manage a service account to connect to Kafka.
View the status and configuration of your Kafka deployment before you make updates.
You can view the status of a Kafka instance before navigating to the page where you manage your topics or view information about the consumer groups connected to the Kafka instance.
You add the credentials generated for a service account to client applications so that they can connect to the Kafka instance.
Download and use the rhoas command-line interface (CLI) tool to manage your OpenShift Streams for Apache Kafka service from the command line.
You can use the rhoas CLI tool to perform the same operations that are available in the console. To manage your Kafka resources, you can use create, update, and delete commands. To view information about resources, you can use describe and list commands.
The syntax for the CLI tool is easy to understand, which makes it intuitive to use. For example, you might use the rhoas kafka topic commands to create a Kafka topic, and then update the topic after viewing information about its configuration.
rhoas kafka topic create --name topic-1
rhoas kafka topic describe --name topic-1
This update command changes the message retention period and the number of partitions for the topic:
rhoas kafka topic update --name topic-3 --retention-ms 36000 --partitions 3
OpenShift Streams for Apache Kafka provides REST APIs that enable HTTP-based interactions with the service. The available REST APIs are described as follows:
- Management API
OpenShift Streams for Apache Kafka Management API to perform management tasks for OpenShift Streams for Apache Kafka. For example, you can perform the following tasks:
Retrieve instance information
Create, delete, or update instance deployments
- Instance API
OpenShift Streams for Apache Kafka Instance API to manage and interact with resources in your Kafka instances. For example, you can perform the following tasks:
View, create, update, and delete topics
View, update, and delete consumer groups
View, create, and delete Access Control Lists (ACLs)
Produce and consume messages
For a detailed overview of how to interact with the OpenShift Streams for Apache Kafka REST APIs, see Interacting with Red Hat OpenShift Application Services using APIs.
Guided help with quick starts
The web console has tours and quick starts that help to guide you through Kafka-related tasks. Click the light-bulb icon in the console to take a tour. Navigate to Learning Resources to find quick starts.
Take a tour to learn how to use the service, and how to navigate the console to find information.
Step through a quick start to perform a specific task.
Each quick start has a clearly defined set of goals, and each goal contains the steps you need to accomplish that goal. For example, you might want to begin with the Getting Started with Red Hat OpenShift Streams for Apache Kafka quick start to learn how to create and inspect a Kafka instance, and then create a topic for that instance.
When you have a Kafka instance running, and topics to store your messages, the Configuring and connecting Kafka scripts with Red Hat OpenShift Streams for Apache Kafka quick start shows you how to use example Kafka producer and consumer clients to start producing and consuming messages in minutes.
How to use OpenShift Streams for Apache Kafka
Use Red Hat OpenShift Streams for Apache Kafka to create, manage, and check the status of Kafka instances, topics, and consumer groups. You can perform these activities from the web console, the rhoas CLI, or the REST APIs.
Create a Kafka instance and a service account that provides the details for connecting to that Kafka instance. With the Kafka instance running, create and manage topics. Use the service account connection details to handle user requests from Kafka client applications to produce or consume messages from those topics.
Create Kafka instances
When you first use the OpenShift Streams for Apache Kafka service, you’ll begin by creating a Kafka instance. After the instance is created, you’re ready to create topics.
To send messages to topics or consume messages from them, you can look up the bootstrap server and generate a service account. Applications can then use this service account to connect to the Kafka instance.
Create service accounts
A service account is like a user account, but for client applications. You create a service account to allow client applications to connect to your Kafka instance. When you first create a service account, the credentials required for connection — a client ID and client secret — are generated for the account.
Manage account access
ACLs enable you to manage access to the Kafka resources that you create. You can use the web console or the rhoas CLI to set ACL-based access permissions for service accounts and user accounts. You set the permissions to enable accounts to access resources, such as consumer groups, topics, or messages that use the same transactional ID. Kafka uses transactional IDs to provide delivery guarantees for related messages.
You authorize access and specify the operations allowed on the resources. For example, you might specify an ACL permission that allows consumers to read messages from all topics in the Kafka instance. The permissions apply to all client applications associated with the account.
Create and configure topics
When you create a topic, you configure the following items:
The number of topic partitions
Size-based and time-based message retention policies
You can also view configuration for the following items:
Message size and compression type
Log indexing, and cleanup and flushing of old data
From the web console, you can view all the topics created for a Kafka instance. You can then select and delete any topics.
Set up client access to Kafka instances
You can set up clients and utilities to produce and consume messages from your Kafka instances. You must configure the client with the connection details needed to make a secure connection.
OpenShift Streams for Apache Kafka provides both SASL/OAUTHBEARER and SASL/PLAIN for client authentication.
Both mechanisms allow clients to establish authenticated sessions with the Kafka instance. SASL/OAUTHBEARER authentication is the recommended authentication method.
SASL/OAUTHBEARER authentication uses token-based credentials exchange, which means the client never shares its credentials with the Kafka instance. Many standard Kafka clients have built-in support for SASL/OAUTHBEARER authentication.
If a client doesn’t support SASL/OAUTHBEARER authentication, you can use SASL/PLAIN authentication. For SASL/PLAIN authentication, the connection details include the client ID and client secret created for the service account and the bootstrap server URL.
Monitor and configure consumer groups
When you configure an application client to access Kafka, you can assign a group ID to associate the consumer with a consumer group. All consumers with the same group ID belong to the consumer group with that ID.
Use OpenShift Streams for Apache Kafka components to check the status of consumer groups that access a particular Kafka instance.
Check consumer groups in the web console
View information about the consumer groups in the web console. For each consumer group, you can check the following information:
The total number of active members
The total number of partitions with consumer lag
Consumer lag indicates a delay between the last message added to a partition and the message currently being picked up by the consumer subscribed to that partition.
You can view details of the consumers receiving messages from each partition in the topic. For each consumer group, you can check the total number of unconsumed partitions. If a partition is not being consumed, it can indicate that a consumer subscribed to the topic is down or the partition is empty.
Track offset positions for consumers
You can also track a consumer’s position in a partition through the following offset information:
Current offset is the current offset number for the consumer in the partition log.
Log end offset is the current offset number for the producer in the partition log.
Offset lag is the difference between the consumer and producer offset positions in the log.
Consumer lag reflects the position of the consumer offset in relation to the end of the partition log. This difference is sometimes referred to as the delta between the producer offset and the consumer offset, that is, the write and read positions in the Kafka broker topic partitions. It is a particularly important metric. For applications that rely on the processing of (near) real-time data, it's critical to monitor consumer lag and ensure that it doesn't grow too large. The greater the lag becomes, the further the process moves from the real-time processing objective. Lag is often reduced by adding new consumers to a group.
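The offset arithmetic above reduces to a subtraction per partition. A monitoring script might compute it like this (topic and offset values are illustrative):

```python
# Compute consumer lag per partition: log-end offset (producer write
# position) minus committed consumer offset (read position).
def consumer_lag(log_end_offsets, committed_offsets):
    return {
        partition: log_end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in log_end_offsets
    }

log_end = {0: 120, 1: 95, 2: 100}       # producer positions per partition
committed = {0: 120, 1: 80}             # consumer positions per partition
lag = consumer_lag(log_end, committed)  # {0: 0, 1: 15, 2: 100}
total_lag = sum(lag.values())           # 115
```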
You can use OpenShift Streams for Apache Kafka components to manage consumer groups. After you stop the consumers in a group, you can delete the group, or reset consumer offsets. A reset changes the position from which consumers read the message log of a topic partition. For example, you can reset offsets so that new consumers fetch messages from the start of a message log rather than fetching the latest message.
View metrics on resource utilization
When you select a Kafka instance in the web console, use the Dashboard page to view metrics for Kafka instances and topics. The metrics provide information for monitoring of resource utilization.
You can check the disk space used by Kafka brokers to store message data over a specified period. You can also check data traffic by viewing the total amount of data sent to and consumed from topics. The metrics provide insights that can help you tune the performance of your Kafka instances.
You can also use the OpenShift Streams for Apache Kafka Management API to get metrics from the Kafka instance.
Bind OpenShift Dedicated-based applications to the service
Red Hat OpenShift Dedicated is an enterprise Kubernetes platform managed and supported by Red Hat. OpenShift Dedicated removes the operational complexity of running and maintaining OpenShift on a cloud provider.
If you’re running an OpenShift Dedicated-based client application, you can use the Red Hat Service Binding Operator to bind the application from a given namespace to the OpenShift Streams for Apache Kafka service.
You can use the rhoas cluster connect command to complete the following actions:
Create a service account and mount it as a secret into your cluster.
Create a KafkaRequest object to create a ServiceBinding object using the Service Binding Operator.
Get OpenShift Streams for Apache Kafka
You can purchase Red Hat OpenShift Streams for Apache Kafka either as a pay-as-you-go subscription, or as a pre-paid subscription. For more information and for pricing details, go to https://console.redhat.com/application-services/streams/overview.
To try OpenShift Streams for Apache Kafka, you can create a no-cost trial instance. To learn more, go to console.redhat.com.