Introduction to Red Hat OpenShift Streams for Apache Kafka

Guide
  • Red Hat OpenShift Streams for Apache Kafka 1
  • Updated 03 September 2021
  • Published 14 April 2021

Red Hat OpenShift Streams for Apache Kafka is currently available for Development Preview. Development Preview releases provide early access to a limited set of features that might not be fully tested and that might change in the final GA version. Users are discouraged from using Development Preview software in production or for business-critical workloads. Limited documentation is available for Development Preview releases and is typically focused on fundamental user goals.

Discover the features and functions available in the Red Hat OpenShift Streams for Apache Kafka cloud service.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code and documentation. We are beginning with these four terms: master, slave, blacklist, and whitelist. Due to the enormity of this endeavor, these changes will be gradually implemented over upcoming releases. For more details on making our language more inclusive, see our CTO Chris Wright’s message.

What is OpenShift Streams for Apache Kafka?

Red Hat OpenShift Streams for Apache Kafka is a cloud service that simplifies the process of running Apache Kafka. Apache Kafka is an open-source, distributed, publish-subscribe messaging system for creating fault-tolerant, real-time data feeds.

Your Red Hat account gives you access to OpenShift Streams for Apache Kafka. Within minutes, you’ll have a Kafka instance up and running, and be ready to start defining the Kafka configuration you need.

Understanding Kafka instances

A Kafka instance comprises a Kafka cluster with multiple brokers. Brokers contain topics that receive and store data. Topics are divided into partitions, which are distributed and replicated across brokers for fault tolerance and increased throughput.

To understand how a Kafka instance operates as a message broker, it’s important to be familiar with the key concepts described here.

Kafka cluster
A Kafka cluster is a group of Kafka brokers, ZooKeeper instances, and management components.
Broker
A broker, sometimes referred to as a server or node, contains topics and orchestrates the storage and passing of messages.
Topic
A topic provides a destination for the storage of data. Each topic is split into one or more partitions.
Partition
A partition is a subset of a topic. Partitions are used to shard and replicate data. The number of partitions is set by the topic’s partition count when the topic is created.
Partition leader
A partition leader handles all producer requests for a given partition.
Partition follower
A partition follower replicates the partition data of a partition leader, optionally handling consumer requests.
Replication factor
Topics use a replication factor to configure the number of replicas of each partition within the cluster. A topic comprises at least one partition. In OpenShift Streams for Apache Kafka, the replication factor is 3.
In-sync replicas
Kafka elects a partition leader to handle all producer requests for a partition. Partition followers on other brokers replicate the partition data of the partition leader. Each in-sync replica has the same number of messages as the leader. If the leader partition fails, Kafka chooses an in-sync replica as the new leader.

In your configuration, you can define how many replicas must be in sync before messages can be produced. A message is committed only after it has been successfully copied to the required in-sync replicas. In this way, if the leader fails, the message is not lost.

In OpenShift Streams for Apache Kafka, the minimum number of in-sync replicas is 2. The following diagram shows that, in replicated topics, each numbered partition has a leader and two followers.

Image of Kafka cluster with replicated partitions
Figure 1. Kafka topics with replicated partitions
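
In practice, the producer decides how many acknowledgements it waits for. The following is a minimal sketch using the console producer shipped with Apache Kafka; the topic name and bootstrap server URL are placeholders. With acks=all, the broker acknowledges a write only after all in-sync replicas have stored it, so with a minimum of 2 in-sync replicas, at least two copies exist before a message is committed.

Command to produce messages that wait for all in-sync replicas
bin/kafka-console-producer.sh --bootstrap-server <bootstrap-server-url> --topic my-topic --producer-property acks=all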

Kafka producers and consumers

Producers and consumers send and receive (publish and subscribe) messages through Kafka brokers. Apache Kafka clients act as producers, consumers, or both.

Producer
A producer sends messages to a broker topic to be written to a partition. Messages are written to partitions either on a round-robin basis or to a specific partition based on a message key. A round-robin approach distributes messages equally across partitions.
Consumer
A consumer subscribes to a topic and reads messages according to partition and offset.
Consumer group
Consumer groups are typically used to share a large data stream generated by multiple producers from a given topic. Consumers are grouped using a group ID, allowing messages to be spread across the members. Consumers within a group don’t read data from the same partition, but can receive data from one or more partitions.
Image of Kafka producers and consumers interacting with Kafka
Figure 2. Producers and consumers interacting with the Kafka broker
Offsets
Offsets allow consumers to track their position in each partition. Each message in a given partition has a unique offset. Offsets help identify the position of a consumer within the partition to track the number of records that have been consumed.

The producer offset at the head of the partition shows the write position. The consumer offset position shows the read position.

Image of Kafka producers and consumers writing and reading using offsets
Figure 3. Producing and consuming data from a partition

When a record is consumed, the consumer’s position in the partition is updated by committing an offset. Kafka’s __consumer_offsets topic stores information on the latest committed offset for each partition according to the consumer group.
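
You can see this publish-subscribe flow with the example clients in the Apache Kafka distribution. The following is a minimal sketch with placeholder topic, group, and server names; starting a second consumer with the same --group value splits the topic’s partitions between the two group members.

Command to produce messages to a topic
bin/kafka-console-producer.sh --bootstrap-server <bootstrap-server-url> --topic my-topic
Command to consume messages as part of the consumer group my-group
bin/kafka-console-consumer.sh --bootstrap-server <bootstrap-server-url> --topic my-topic --group my-group --from-beginning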

Kafka capabilities

Kafka’s underlying data stream-processing capabilities and component architecture deliver the following key capabilities:

  • Extremely high throughput and low latency data sharing between microservices and other applications

  • Message ordering guarantees

  • Message rewind/replay from data storage to reconstruct an application state

  • Message compaction using keys to identify messages and retain only the most recent

  • Horizontal scalability in a cluster configuration

  • Replication of data to control fault tolerance

  • Retention of high volumes of data for immediate access

Kafka use cases

Kafka’s capabilities make it suitable for the following use cases:

  • Event-driven architectures

  • Event sourcing to capture changes to the state of an application as a log of events

  • Message brokering

  • Website activity tracking

  • Operational monitoring through metrics

  • Log collection and aggregation

  • Commit logs for distributed systems

  • Stream processing so that applications can respond to data in real time

How OpenShift Streams for Apache Kafka supports Kafka

OpenShift Streams for Apache Kafka puts you in control of setting up Kafka. In addition to managing the OpenShift Streams for Apache Kafka service through the web console, you can download and use a dedicated command-line interface (CLI), or use the publicly available REST APIs, for provisioning and administration tasks.

Use the OpenShift Streams for Apache Kafka components to quickly and easily create new Kafka instances and manage those instances.

The components simplify the process of:

  • Running Kafka instances

  • Managing brokers

  • Creating and managing topics

  • Configuring access to Kafka

  • Securing access to Kafka

After the Kafka instance is configured, you can generate connection details. You use the connection details to connect the Kafka client applications and Kafka utilities that will produce and consume messages from the Kafka instance.

Image of OpenShift Streams for Apache Kafka components interacting with Kafka
Figure 4. OpenShift Streams for Apache Kafka components for managing Kafka

Web console

Use the web console to:

  • Create and manage Kafka instances and topics

  • Create and manage a service account to connect to Kafka

  • View the status and configuration of your Kafka deployment before you make updates

You can view the status of a Kafka instance before navigating to the page where you manage your topics or view information on the consumer groups connected to the Kafka instance.

You add the credentials generated for a service account to client applications so that they can connect to the Kafka instance.

Quick starts in the console guide you through common tasks.

Image of OpenShift Streams for Apache Kafka console showing quick starts
Figure 5. OpenShift Streams for Apache Kafka web console

CLI tool

Download and use the rhoas command-line interface (CLI) tool to manage your OpenShift Streams for Apache Kafka service from the command line.

Use the rhoas CLI tool to perform the same operations that are available in the console. To manage your Kafka resources, you can use create, update, and delete commands. To view information on resources, you can use status, list, and describe commands.

The CLI syntax is consistent and easy to understand, which makes the tool intuitive to use. For example, you might use the topic commands to create a Kafka topic, and then update the topic after viewing information on its configuration.

Command to create a Kafka topic
rhoas kafka topic create topic-1
Command to view the details of a specific Kafka topic
rhoas kafka topic describe topic-1

This update command changes the message retention period and the number of partitions for the topic:

Command to update the configuration of a Kafka topic
rhoas kafka topic update topic-1 --retention-ms 36000 --partitions 3

REST API

OpenShift Streams for Apache Kafka provides a RESTful interface to allow HTTP-based interactions with the service. The Kafka Service Fleet Manager REST API is available from api.openshift.com.

Endpoints provide access to Kafka and service account resources, and methods are available to perform actions on those resources. For example, you can use the API to retrieve a list of Kafka instances or to create a new service account. Requests must be authorized with an OAuth bearer token for your Red Hat account; <access_token> in the following examples is a placeholder for that token.

Curl request to return a list of Kafka instances
curl -X GET "https://api.openshift.com/api/managed-services-api/v1/kafkas" -H "accept: application/json" -H "Authorization: Bearer <access_token>"
Curl request to create a new service account
curl -X POST "https://api.openshift.com/api/managed-services-api/v1/serviceaccounts" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer <access_token>" -d '{"name": "my-service-account", "description": "Example service account"}'

Guided help with quick starts

The web console has tours and quick starts that help to guide you through Kafka-related tasks.

  • Take a tour to learn how to use the service, and how to navigate the console to find information.

  • Step through a quick start to perform a specific task.

Each quick start has a clearly defined set of goals, and each goal contains the steps you need to accomplish that goal. For example, you might want to begin with the Getting Started with Red Hat OpenShift Streams for Apache Kafka quick start to learn how to create and inspect a Kafka instance, and then create a topic for that instance.

When you have a Kafka instance running, and topics to store your messages, the Using Kafka bin scripts with Red Hat OpenShift Streams for Apache Kafka quick start shows you how to use example Kafka producer and consumer clients to start producing and consuming messages in minutes.

How to use OpenShift Streams for Apache Kafka

Use OpenShift Streams for Apache Kafka to create, manage, and check the status of Kafka instances, topics, and consumer groups. You can perform these activities from the web console, the rhoas CLI, or the REST API.

Create a Kafka instance and a service account that provides the details for connecting to that Kafka instance. With the Kafka instance running, create and manage topics. Use the service account connection details to handle user requests from Kafka client applications to produce or consume messages from those topics.
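
With the rhoas CLI, for example, the end-to-end flow might look like the following sketch. The rhoas login and rhoas kafka create commands are part of the CLI, but exact flags and interactive prompts vary between versions, so treat this as an outline rather than an exact script.

Commands to provision a Kafka instance, create a topic, and check status
rhoas login
rhoas kafka create my-kafka
rhoas kafka topic create topic-1
rhoas status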

Create Kafka instances

When you first use the OpenShift Streams for Apache Kafka service, you’ll begin by creating a Kafka instance. After the instance is created, you’re ready to create topics.

To send messages to or consume messages from topics, look up the bootstrap server address and generate a service account that applications can use to connect to the Kafka instance.

Create service accounts

A service account is like a user account, but for client applications. You create a service account to allow client applications to connect to your Kafka instance. When you first create a service account, the credentials required for connection — a client ID and client secret — are generated for the account.

You can reset the credentials or delete the service account at any time. However, you can only access the client ID and secret when you first create a service account. Make sure to record these details during the creation process for later use when connecting producer or consumer applications to the Kafka instance.
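
If you prefer the command line, the rhoas CLI also exposes service account management. The following is a hedged sketch; the command group name and available flags vary between rhoas versions:

Command to create a service account
rhoas service-account create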

Create and configure topics

When you create a topic, you set core configuration, such as:

  • The number of topic partitions

  • Size-based and time-based message retention policies

You can also define configuration for:

  • Message size and compression type

  • Log indexing, and cleanup and flushing of old data

From the web console, you can view all the topics created for a Kafka instance. You can then select and delete any topics.
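
For example, when you create a topic with the rhoas CLI, you can set the partition count and a time-based retention period directly. This sketch reuses the --partitions and --retention-ms flags shown with the update command earlier; their availability on the create command is an assumption about your CLI version:

Command to create a topic with three partitions and a one-week retention period
rhoas kafka topic create topic-1 --partitions 3 --retention-ms 604800000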

For topic replication, partition leader elections can be clean or unclean. OpenShift Streams for Apache Kafka allows only clean leader election, which means that out-of-sync replicas cannot become leaders. If no in-sync replicas are available, Kafka waits until the original leader is back online before messages are picked up again.

Set up client access to Kafka instances

You can set up clients and utilities to produce and consume messages from your Kafka instances. The client must be configured with the connection details required to make a secure connection.

OpenShift Streams for Apache Kafka provides both SASL/OAUTHBEARER and SASL/PLAIN for client authentication.

Both mechanisms allow clients to establish authenticated sessions with the Kafka instance. SASL/OAUTHBEARER authentication is the recommended authentication method.

SASL/OAUTHBEARER authentication uses a token-based credential exchange, which means the client never shares its credentials with the Kafka instance. Support for SASL/OAUTHBEARER authentication is built into many standard Kafka clients.

If a client doesn’t support SASL/OAUTHBEARER authentication, you can use SASL/PLAIN authentication. For SASL/PLAIN authentication, the connection details include the client ID and client secret created for the service account and the bootstrap server URL.
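
For example, many standard Kafka clients and the Kafka bin scripts read SASL/PLAIN settings from a properties file. The following is a minimal sketch; the file name is arbitrary, and the client ID, client secret, and bootstrap server URL are placeholders for your service account credentials and instance connection details.

Commands to configure a client for SASL/PLAIN and produce messages
cat > app-services.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<client-id>" password="<client-secret>";
EOF
bin/kafka-console-producer.sh --bootstrap-server <bootstrap-server-url> --topic topic-1 --producer.config app-services.properties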

Monitor and configure consumer groups

When you configure an application client to access Kafka, you can assign a group ID to associate the consumer with a consumer group. All consumers with the same group ID belong to the consumer group with that ID.

Use OpenShift Streams for Apache Kafka components to check the status of consumer groups that access a particular Kafka instance.

Check consumer groups in the web console

View information on the consumer groups in the web console. For each consumer group, you can check:

  • The total number of active members

  • The total number of partitions with consumer lag

Consumer lag indicates a delay between the last message added to a partition and the message currently being picked up by the consumer subscribed to that partition.

If you select a specific consumer group in the console, you can view details of the consumers receiving messages from each partition in the topic. For each consumer group, you can check the total number of unconsumed partitions. If a partition is not being consumed, it can indicate that a consumer subscribed to the topic is down or the partition is empty.

Track offset positions for consumers

You can also track a consumer’s position in a partition through offset information:

  • Current offset is the current offset number for the consumer in the partition log.

  • Log end offset is the current offset number for the producer in the partition log.

  • Offset lag is the difference between the consumer and producer offset positions in the log.

Consumer lag reflects the position of the consumer offset relative to the end of the partition log. This difference is sometimes referred to as the delta between the producer offset and the consumer offset: the write and read positions in the Kafka broker topic partitions. It’s a particularly important metric. For applications that rely on processing (near) real-time data, it’s critical to monitor consumer lag and check that it doesn’t become too large. The greater the lag, the further the application falls from its real-time processing objective. Lag is often reduced by adding new consumers to a group.

Image of consumer lag shown from partition offset positions
Figure 6. Consumer lag between the producer and consumer offset
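
You can inspect these positions with the consumer groups tool from the Apache Kafka distribution. The group name and server URL below are placeholders, and app-services.properties refers to the client configuration file sketched earlier; the output lists CURRENT-OFFSET, LOG-END-OFFSET, and LAG for each partition assigned to the group.

Command to show offset positions and lag for a consumer group
bin/kafka-consumer-groups.sh --bootstrap-server <bootstrap-server-url> --describe --group my-group --command-config app-services.properties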

Bind OpenShift Dedicated-based applications to the service

Red Hat OpenShift Dedicated is an enterprise Kubernetes platform managed and supported by Red Hat. OpenShift Dedicated removes the operational complexity of running and maintaining OpenShift on a cloud provider.

If you’re running an OpenShift Dedicated-based client application, you can use the Red Hat Service Binding Operator to bind the application from a given namespace to the OpenShift Streams for Apache Kafka service.

Use the rhoas cluster connect CLI command, sketched after this list, to:

  • Create a service account and mount it as a secret into your cluster

  • Create a KafkaConnection custom resource that the Service Binding Operator uses to create a ServiceBinding object
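
The following is a minimal sketch, run from a terminal that is logged in to both rhoas and your OpenShift Dedicated cluster; the command runs interactively, and exact prompts and output vary between rhoas versions:

Command to bind the current namespace to a Kafka instance
rhoas cluster connect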

Try OpenShift Streams for Apache Kafka

To try OpenShift Streams for Apache Kafka, you can create a no-cost preview instance. To learn more, go to console.redhat.com.