Introduction to Red Hat OpenShift Streams for Apache Kafka

Guide
  • Red Hat OpenShift Streams for Apache Kafka 1
  • Updated 22 November 2022
  • Published 14 April 2021

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code and documentation. We are beginning with these four terms: master, slave, blacklist, and whitelist. Due to the enormity of this endeavor, these changes will be gradually implemented over upcoming releases. For more details on making our language more inclusive, see our CTO Chris Wright’s message.

What is OpenShift Streams for Apache Kafka?

Red Hat OpenShift Streams for Apache Kafka is a cloud service that simplifies the process of running Apache Kafka. Apache Kafka is an open source, distributed, publish-subscribe messaging system for creating fault-tolerant, real-time data feeds.

Your Red Hat account gives you access to OpenShift Streams for Apache Kafka. Within minutes, you’ll have a Kafka instance up and running, and be ready to start defining the Kafka configuration you need.

Understanding Kafka instances

A Kafka instance comprises a Kafka cluster with multiple brokers. Brokers contain topics that receive and store data. Kafka splits topics into partitions and distributes and replicates the partitions across brokers for fault tolerance and increased throughput.

To understand how a Kafka instance operates as a message broker, it’s important to understand the key concepts described here.

Kafka cluster
A Kafka cluster is a group of Kafka brokers, ZooKeeper instances, and management components.
ZooKeeper
ZooKeeper provides a cluster coordination service, storing and tracking the status of brokers and consumers.
Broker
A broker, sometimes referred to as a server or node, contains topics and orchestrates the storage and passing of messages.
Topic
A topic provides a destination for the storage of data. Kafka splits each topic into one or more partitions.
Partition
A partition is a subset of a topic. Kafka uses partitions for data sharding and replication. The number of topic partitions is defined by a topic partition count.
Partition leader
A partition leader handles all producer requests for a given partition.
Partition follower
A partition follower replicates the partition data of a partition leader, optionally handling consumer requests.
Replication factor
Topics use a replication factor to configure the number of replicas of each partition within the cluster. A topic comprises at least one partition. In OpenShift Streams for Apache Kafka, the replication factor is 3.
In-sync replicas
Kafka elects a partition leader to handle all producer requests for a partition. Partition followers on other brokers replicate the partition data of the partition leader. Each in-sync replica has the same number of messages as the leader. If the leader partition fails, Kafka chooses an in-sync replica as the new leader.

In your configuration, you can define how many replicas must be in sync before producers can write messages. Kafka commits a message only after the message has been successfully copied to the in-sync replicas. In this way, if the leader fails, the message is not lost.

In OpenShift Streams for Apache Kafka, the minimum number of in-sync replicas is 2. The following diagram shows that each numbered partition in a replicated topic has a leader and two followers.

Image of Kafka cluster with replicated partitions
Figure 1. Kafka topics with replicated partitions
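
The in-sync replica rule described above can be illustrated with a minimal sketch. This is a toy model, not the broker implementation: the Partition class and broker names are hypothetical, and only the values 3 and 2 come from this guide. In Apache Kafka, the minimum in-sync replica check applies to producers that request acknowledgement from all in-sync replicas (acks=all).

```python
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3    # value stated in this guide
MIN_IN_SYNC_REPLICAS = 2  # value stated in this guide


@dataclass
class Partition:
    """Toy model of one partition and its replicas (hypothetical type)."""
    replicas: list                                # brokers that hold a copy
    in_sync: set = field(default_factory=set)     # replicas that are caught up
    log: list = field(default_factory=list)       # committed messages

    def produce(self, message):
        """Commit the message only if enough replicas are in sync."""
        if len(self.in_sync) < MIN_IN_SYNC_REPLICAS:
            return False  # write rejected: too few in-sync replicas
        self.log.append(message)
        return True


brokers = ["broker-0", "broker-1", "broker-2"]
partition = Partition(replicas=brokers, in_sync=set(brokers))
partition.produce("m1")                 # accepted: 3 replicas in sync
partition.in_sync.discard("broker-2")
partition.produce("m2")                 # accepted: 2 replicas still in sync
partition.in_sync.discard("broker-1")
partition.produce("m3")                 # rejected: only 1 replica in sync
```

With one replica out of sync, writes continue. With two out of sync, the partition stops accepting writes rather than risk losing data.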

Kafka producers and consumers

Producers and consumers send and receive (publish and subscribe) messages through Kafka brokers. Kafka clients act as producers, consumers, or both.

Producer
A producer sends messages to a broker topic. The broker then writes the messages to a partition, either on a round-robin basis or to a specific partition based on a message key. A round-robin approach distributes messages equally across partitions.
Consumer
A consumer subscribes to a topic and reads messages according to partition and offset.
Consumer group
You can use consumer groups to share a large data stream generated by multiple producers. Kafka groups any consumers that have the same group ID. It then distributes messages across the group members. Consumers within a group don’t read data from the same partition, but can receive data from one or more partitions.
Image of Kafka producers and consumers interacting with Kafka
Figure 2. Producers and consumers interacting with the Kafka broker
Offsets
Offsets allow consumers to track their position in each partition. Each message in a given partition has a unique offset. Offsets help identify the position of a consumer within the partition to track the number of records that have been consumed.

The producer offset at the head of the partition shows the write position. The consumer offset position shows the read position.

Image of Kafka producers and consumers writing and reading using offsets
Figure 3. Producing and consuming data from a partition

When a record is consumed, the consumer’s position in the partition is updated by committing an offset. The Kafka __consumer_offsets topic stores information about the latest committed offset for each partition, according to the consumer group.
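
As a rough illustration of how messages reach partitions, the following sketch chooses a partition either by round robin (no key) or by hashing the message key. The make_partitioner helper is hypothetical, and zlib.crc32 is only a stand-in hash: the Apache Kafka Java client hashes keys with murmur2, so real partition numbers will differ.

```python
import itertools
import zlib


def make_partitioner(partition_count):
    """Return a function that picks a partition for each message."""
    round_robin = itertools.cycle(range(partition_count))

    def choose(key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(round_robin)
        # Keyed message: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % partition_count

    return choose


choose = make_partitioner(3)
keyless = [choose() for _ in range(4)]      # cycles 0, 1, 2, 0
first = choose("customer-17")
second = choose("customer-17")              # same partition as first
```

Because every message with a given key lands in one partition, consumers read the messages for that key in the order they were written.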

Kafka capabilities

Kafka’s underlying data stream-processing capabilities and component architecture deliver the following key capabilities:

  • Extremely high throughput and low latency data sharing between microservices and other applications

  • Message ordering guarantees

  • Message rewind/replay from data storage to reconstruct an application state

  • Message compaction using keys to identify messages and retain only the most recent

  • Horizontal scalability in a cluster configuration

  • Replication of data to control fault tolerance

  • Retention of high volumes of data for immediate access
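
The message compaction capability in the list above can be sketched in a few lines. This is a simplified model that assumes a log of (key, value) pairs; the compact function is hypothetical, and real Kafka compacts log segments in the background rather than in a single pass.

```python
def compact(log):
    """Keep only the most recent record for each key, in offset order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


log = [("user-1", "a@example.com"), ("user-2", "b@example.com"), ("user-1", "c@example.com")]
compacted = compact(log)
```

After compaction, the earlier record for user-1 is gone, and the surviving records keep their original offsets.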

Kafka use cases

Kafka’s capabilities make it suitable for the following use cases:

  • Event-driven architectures

  • Event sourcing to capture changes to the state of an application as a log of events

  • Message brokering

  • Website activity tracking

  • Operational monitoring through metrics

  • Log collection and aggregation

  • Commit logs for distributed systems

  • Stream processing so that applications can respond to data in real time

How OpenShift Streams for Apache Kafka supports Apache Kafka

Red Hat OpenShift Streams for Apache Kafka puts you in control of setting up Apache Kafka. As well as managing the OpenShift Streams for Apache Kafka service through the web console, you can also download and use a dedicated command-line interface (CLI), or use the publicly available REST APIs for provisioning and administration tasks.

Use the OpenShift Streams for Apache Kafka components to quickly and easily create new Kafka instances and manage those instances.

The components simplify the following actions:

  • Running Kafka instances

  • Managing brokers

  • Creating and managing topics

  • Configuring access to Kafka

  • Securing access to Kafka

After you’ve configured the Kafka instance, you can generate connection details. You use the connection details to connect the Kafka client applications and Kafka utilities that will produce and consume messages from the Kafka instance.

Image of OpenShift Streams for Apache Kafka components interacting with Kafka
Figure 4. OpenShift Streams for Apache Kafka components for managing Kafka

Web console

Use the web console to perform the following actions:

  • Create and manage Kafka instances and topics.

  • Create and manage a service account to connect to Kafka.

  • View the status and configuration of your Kafka deployment before you make updates.

You can view the status of a Kafka instance before navigating to the page where you manage your topics or view information about the consumer groups connected to the Kafka instance.

You add the credentials generated for a service account to client applications so that they can connect to the Kafka instance.

CLI tool

Download and use the rhoas command-line interface (CLI) tool to manage your OpenShift Streams for Apache Kafka service from the command line.

Use the rhoas CLI tool to perform the same operations that are available in the console. To manage your Kafka resources, you can use create, update, and delete commands. To view information about resources, you can use status, list, and describe commands.

The rhoas commands follow a consistent pattern of resource followed by action. For example, you might use the topic commands to create a Kafka topic, and update the topic after viewing information about its configuration.

Command to create a Kafka topic
rhoas kafka topic create --name topic-1
Command to view the details of a specific Kafka topic
rhoas kafka topic describe --name topic-1

This update command changes the number of partitions and the message retention period for the topic:

Command to update the configuration of a Kafka topic
rhoas kafka topic update --name topic-1 --retention-ms 36000 --partitions 3

REST APIs

OpenShift Streams for Apache Kafka provides REST APIs that enable HTTP-based interactions with the service. The available REST APIs are described as follows:

Management API
Use the OpenShift Streams for Apache Kafka Management API to perform management tasks for OpenShift Streams for Apache Kafka. For example, you can perform the following tasks:
  • Retrieve instance information

  • Create, delete, or update instance deployments

  • Manage security

  • View metrics

Instance API
Use the OpenShift Streams for Apache Kafka Instance API to manage and interact with resources in your Kafka instances. For example, you can perform the following tasks:
  • View, create, update, and delete topics

  • View, update, and delete consumer groups

  • View, create, and delete Access Control Lists (ACLs)

  • Produce and consume messages

For a detailed overview of how to interact with the OpenShift Streams for Apache Kafka REST APIs, see Interacting with Red Hat OpenShift Application Services using APIs.
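
As a hedged sketch of an HTTP-based interaction, the following code builds, but does not send, a request that lists Kafka instances through the Management API. The base URL, the path, and the use of a bearer token are assumptions that are not taken from this guide; check the API reference for the actual values before relying on them.

```python
import urllib.request

# Assumed values, not confirmed by this guide.
API_BASE = "https://api.openshift.com"
KAFKAS_PATH = "/api/kafkas_mgmt/v1/kafkas"


def build_list_instances_request(access_token):
    """Build a GET request for the list of Kafka instances."""
    return urllib.request.Request(
        url=API_BASE + KAFKAS_PATH,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_list_instances_request("example-token")
```

To send the request, pass it to urllib.request.urlopen and parse the JSON response body.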

Guided help with quick starts

The web console has tours and quick starts that help to guide you through Kafka-related tasks. Click the light-bulb icon in the console to take a tour. Navigate to Learning Resources to find quick starts.

  • Take a tour to learn how to use the service, and how to navigate the console to find information.

  • Step through a quick start to perform a specific task.

Each quick start has a clearly defined set of goals, and each goal contains the steps you need to accomplish that goal. For example, you might want to begin with the Getting Started with Red Hat OpenShift Streams for Apache Kafka quick start to learn how to create and inspect a Kafka instance, and then create a topic for that instance.

Image of OpenShift Streams for Apache Kafka console showing quick starts
Figure 5. OpenShift Streams for Apache Kafka web console

When you have a Kafka instance running, and topics to store your messages, the Configuring and connecting Kafka scripts with Red Hat OpenShift Streams for Apache Kafka quick start shows you how to use example Kafka producer and consumer clients to start producing and consuming messages in minutes.

How to use OpenShift Streams for Apache Kafka

Use Red Hat OpenShift Streams for Apache Kafka to create, manage, and check the status of Kafka instances, topics, and consumer groups. You can perform these activities from the console, the rhoas CLI, or REST APIs.

Create a Kafka instance and a service account that provides the details for connecting to that Kafka instance. With the Kafka instance running, create and manage topics. Use the service account connection details to handle user requests from Kafka client applications to produce or consume messages from those topics.

Create Kafka instances

When you first use the OpenShift Streams for Apache Kafka service, you’ll begin by creating a Kafka instance. After the instance is created, you’re ready to create topics.

To send messages to topics or consume messages from them, you can look up the bootstrap server and generate a service account. Applications can then use this service account to connect to the Kafka instance.

Create service accounts

A service account is like a user account, but for client applications. You create a service account to allow client applications to connect to your Kafka instance. When you first create a service account, the credentials required for connection — a client ID and client secret — are generated for the account.

You can reset the credentials or delete the service account at any time. However, you can only access the client ID and secret when you first create a service account. Make sure to record these details during the creation process for later use when connecting producer or consumer applications to the Kafka instance.

Manage account access

ACLs enable you to manage access to the Kafka resources that you create. You can use the web console or the rhoas CLI to set ACL permissions for user accounts and service accounts. The permissions enable accounts to access resources, such as consumer groups, topics, or messages that use the same transactional ID. Kafka uses transactional IDs to provide delivery guarantees for related messages.

You authorize access and specify the operations allowed on the resources. For example, you might specify an ACL permission that allows consumers to read messages from all topics in the Kafka instance. The permissions apply to all client applications associated with the account.
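
A simplified model can make the ACL description more concrete. The AclRule type, the field names, and the account name below are hypothetical and do not reflect how the service stores permissions. The sketch follows the Apache Kafka convention that access is denied unless a rule allows it, and that a matching deny rule overrides an allow rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    """One permission entry (hypothetical structure)."""
    principal: str      # account ID, or "*" for all accounts
    resource_type: str  # "topic", "group", or "transactional_id"
    resource_name: str  # exact name, or "*" for all resources of that type
    operation: str      # for example "read" or "write"
    permission: str     # "allow" or "deny"


def is_allowed(rules, principal, resource_type, resource_name, operation):
    """Return True if some rule allows the operation and no rule denies it."""
    matches = [
        rule for rule in rules
        if rule.principal in ("*", principal)
        and rule.resource_type == resource_type
        and rule.resource_name in ("*", resource_name)
        and rule.operation == operation
    ]
    if any(rule.permission == "deny" for rule in matches):
        return False
    return any(rule.permission == "allow" for rule in matches)


# Allow one service account to read messages from all topics.
rules = [AclRule("srvc-acct-1", "topic", "*", "read", "allow")]
can_read = is_allowed(rules, "srvc-acct-1", "topic", "orders", "read")
can_write = is_allowed(rules, "srvc-acct-1", "topic", "orders", "write")
```

Here the account can read from any topic but cannot write, because no rule grants the write operation.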

Create and configure topics

When you create a topic, you configure the following items:

  • The number of topic partitions

  • Size-based and time-based message retention policies

You can also view configuration for the following items:

  • Message size and compression type

  • Log indexing, and cleanup and flushing of old data
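
The time-based and size-based retention policies can be sketched as follows. This is a simplification that assumes retention is applied per record; Kafka actually removes whole log segments. The apply_retention function is hypothetical, and its parameter names only mirror the Kafka topic properties retention.ms and retention.bytes.

```python
def apply_retention(records, now_ms, retention_ms=None, retention_bytes=None):
    """Return the records that survive retention.

    records is a list of (timestamp_ms, size_bytes) tuples, oldest first.
    """
    kept = list(records)
    if retention_ms is not None:
        # Time-based policy: drop records older than the retention period.
        kept = [r for r in kept if now_ms - r[0] <= retention_ms]
    if retention_bytes is not None:
        # Size-based policy: drop the oldest records until the log fits.
        while kept and sum(size for _, size in kept) > retention_bytes:
            kept.pop(0)
    return kept


records = [(0, 100), (500, 100), (900, 100)]
by_time = apply_retention(records, now_ms=1000, retention_ms=600)
by_size = apply_retention(records, now_ms=1000, retention_bytes=100)
```

When both policies are set, a record is removed as soon as either limit is exceeded.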

From the web console, you can view all the topics created for a Kafka instance. You can then select and delete any topics.

For topic replication in Kafka, partition leader elections can be clean or unclean. OpenShift Streams for Apache Kafka allows only clean leader election, which means that out-of-sync replicas cannot become leaders. If no in-sync replicas are available, Kafka waits until the original leader is back online before it picks up messages again.
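
Clean leader election can be expressed as a small rule: only a live, in-sync replica can take over. The elect_leader function and broker names below are hypothetical, and the sketch ignores details such as the controller and preferred replicas.

```python
def elect_leader(replicas, in_sync, failed):
    """Pick a new leader from the live in-sync replicas.

    Returns None when no in-sync replica is available, because a clean
    election never promotes an out-of-sync replica.
    """
    for broker in replicas:
        if broker in in_sync and broker not in failed:
            return broker
    return None


replicas = ["broker-0", "broker-1", "broker-2"]
# The leader broker-0 fails while broker-1 is still in sync.
new_leader = elect_leader(replicas, in_sync={"broker-0", "broker-1"}, failed={"broker-0"})
# The leader fails and it was the only in-sync replica.
no_leader = elect_leader(replicas, in_sync={"broker-0"}, failed={"broker-0"})
```

In the second case the partition stays unavailable until the original leader returns, which trades availability for the guarantee that committed messages are not lost.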

Set up client access to Kafka instances

You can set up clients and utilities to produce and consume messages from your Kafka instances. You must configure the client with the connection details needed to make a secure connection.

OpenShift Streams for Apache Kafka provides both SASL/OAUTHBEARER and SASL/PLAIN for client authentication.

Both mechanisms allow clients to establish authenticated sessions with the Kafka instance. SASL/OAUTHBEARER authentication is the recommended authentication method.

SASL/OAUTHBEARER authentication uses token-based credentials exchange, which means the client never shares its credentials with the Kafka instance. Many standard Kafka clients have in-built support for SASL/OAUTHBEARER authentication.

If a client doesn’t support SASL/OAUTHBEARER authentication, you can use SASL/PLAIN authentication. For SASL/PLAIN authentication, the connection details include the client ID and client secret created for the service account and the bootstrap server URL.
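
The SASL/PLAIN connection details map onto client configuration roughly as shown in this sketch. The property names follow the librdkafka convention used by clients such as confluent-kafka-python, which is an assumption about your client; Java clients pass the credentials through sasl.jaas.config instead. The SASL_SSL protocol setting, the sasl_plain_config helper, and the example values are also assumptions.

```python
def sasl_plain_config(bootstrap_server, client_id, client_secret):
    """Build a client configuration for SASL/PLAIN over TLS."""
    return {
        "bootstrap.servers": bootstrap_server,
        "security.protocol": "SASL_SSL",   # assumed: SASL over an encrypted connection
        "sasl.mechanism": "PLAIN",
        "sasl.username": client_id,        # service account client ID
        "sasl.password": client_secret,    # service account client secret
    }


config = sasl_plain_config(
    "my-kafka.example.com:443",            # placeholder bootstrap server URL
    "srvc-acct-example-id",                # placeholder client ID
    "example-secret",                      # placeholder client secret
)
```

Load the client ID and secret from a secret store or environment variables rather than committing them to source code, because the secret is shown only once when the service account is created.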

Monitor and configure consumer groups

When you configure an application client to access Kafka, you can assign a group ID to associate the consumer with a consumer group. All consumers with the same group ID belong to the consumer group with that ID.

Use OpenShift Streams for Apache Kafka components to check the status of consumer groups that access a particular Kafka instance.
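
To illustrate how a group ID ties consumers together, the following sketch groups consumers by ID and spreads the partitions of one topic across each group. The assign_partitions function is hypothetical, and the round-robin distribution is only one of several assignment strategies that Kafka clients support.

```python
from collections import defaultdict


def assign_partitions(consumers, partition_count):
    """Assign each partition to exactly one consumer per group.

    consumers is a list of (consumer_name, group_id) tuples.
    Returns {group_id: {consumer_name: [partition, ...]}}.
    """
    groups = defaultdict(list)
    for name, group_id in consumers:
        groups[group_id].append(name)

    assignment = {}
    for group_id, members in groups.items():
        members = sorted(members)
        owned = {member: [] for member in members}
        for partition in range(partition_count):
            owned[members[partition % len(members)]].append(partition)
        assignment[group_id] = owned
    return assignment


consumers = [("c1", "billing"), ("c2", "billing"), ("c3", "audit")]
assignment = assign_partitions(consumers, partition_count=3)
```

Within the billing group, no partition is read by more than one consumer, while the single consumer in the audit group receives every partition.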

Check consumer groups in the web console

View information about the consumer groups in the web console. For each consumer group, you can check the following information:

  • The total number of active members

  • The total number of partitions with consumer lag

Consumer lag indicates a delay between the last message added to a partition and the message currently being picked up by the consumer subscribed to that partition.

You can view details of the consumers receiving messages from each partition in the topic. For each consumer group, you can check the total number of unconsumed partitions. If a partition is not being consumed, it can indicate that a consumer subscribed to the topic is down or the partition is empty.

Track offset positions for consumers

You can also track a consumer’s position in a partition through the following offset information:

  • Current offset is the current offset number for the consumer in the partition log.

  • Log end offset is the current offset number for the producer in the partition log.

  • Offset lag is the difference between the consumer and producer offset positions in the log.

Consumer lag reflects the position of the consumer offset in relation to the end of the partition log. This difference is sometimes referred to as the delta between the producer offset and consumer offset, which are the write and read positions in the Kafka broker topic partitions. Consumer lag is a particularly important metric. For applications that rely on the processing of (near) real-time data, it’s critical to monitor consumer lag to check that it doesn’t become too big. The greater the lag becomes, the further the process moves from the real-time processing objective. Lag is often reduced by adding new consumers to a group.

Image of consumer lag shown from partition offset positions
Figure 6. Consumer lag between the producer and consumer offset
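
The offset lag calculation is simple arithmetic, sketched here under the assumption that you already have the current offset and log end offset for each partition. The function names are hypothetical.

```python
def offset_lag(current_offset, log_end_offset):
    """Offset lag: distance between the consumer and the end of the log."""
    return max(log_end_offset - current_offset, 0)


def partitions_with_lag(offsets):
    """Return {partition: lag} for partitions where the consumer is behind.

    offsets maps partition -> (current_offset, log_end_offset).
    """
    return {
        partition: offset_lag(current, end)
        for partition, (current, end) in offsets.items()
        if end > current
    }


offsets = {0: (10, 10), 1: (5, 12), 2: (40, 43)}
lagging = partitions_with_lag(offsets)
```

Partition 0 is fully consumed, so only partitions 1 and 2 count toward the number of partitions with consumer lag.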

You can use OpenShift Streams for Apache Kafka components to manage consumer groups. After you stop the consumers in a group, you can delete the group, or reset consumer offsets. A reset changes the position from which consumers read the message log of a topic partition. For example, you can reset offsets so that new consumers fetch messages from the start of a message log rather than fetching the latest message.
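
An offset reset can be modeled as choosing a new read position for a stopped group. The reset_offset function and the "earliest" and "latest" position names are illustrative assumptions, not the options exposed by the web console or CLI.

```python
def reset_offset(active_members, log_start_offset, log_end_offset, position):
    """Return the new consumer offset for a partition after a reset."""
    if active_members > 0:
        # Mirrors the rule above: stop the consumers before resetting.
        raise RuntimeError("stop all consumers in the group before resetting offsets")
    if position == "earliest":
        return log_start_offset  # re-read from the start of the message log
    if position == "latest":
        return log_end_offset    # skip to the newest messages
    raise ValueError(f"unknown position: {position!r}")


from_start = reset_offset(0, log_start_offset=0, log_end_offset=120, position="earliest")
from_end = reset_offset(0, log_start_offset=0, log_end_offset=120, position="latest")
```

Resetting to the start makes consumers reprocess every retained message, so the consuming application should tolerate receiving the same message more than once.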

View metrics on resource utilization

When you select a Kafka instance in the web console, use the Dashboard page to view metrics for Kafka instances and topics. The metrics help you monitor resource utilization.

You can check the disk space used by Kafka brokers to store message data over a specified period. You can also check data traffic by viewing the total amount of data sent to and consumed from topics. The metrics provide insights that can help you tune the performance of your Kafka instances.

You can also use the OpenShift Streams for Apache Kafka Management API to get metrics from the Kafka instance.

Bind OpenShift Dedicated-based applications to the service

Red Hat OpenShift Dedicated is an enterprise Kubernetes platform managed and supported by Red Hat. OpenShift Dedicated removes the operational complexity of running and maintaining OpenShift on a cloud provider.

If you’re running an OpenShift Dedicated-based client application, you can use the Red Hat Service Binding Operator to bind the application from a given namespace to the OpenShift Streams for Apache Kafka service.

Use the rhoas cluster connect command to complete the following actions:

  • Create a service account and mount it as a secret into your cluster.

  • Create a Kafka Request object to create a ServiceBinding object using the Service Binding Operator.

Get OpenShift Streams for Apache Kafka

You can purchase Red Hat OpenShift Streams for Apache Kafka either as a pay-as-you-go subscription, or as a pre-paid subscription. For more information and for pricing details, go to https://console.redhat.com/application-services/streams/overview.

To try OpenShift Streams for Apache Kafka, you can create a no-cost trial instance. To learn more, go to console.redhat.com.