Chapter 2. Distributed tracing architecture

2.1. Distributed tracing architecture

Every time a user takes an action in an application, a request is executed by the architecture that may require dozens of different services to participate to produce a response. Red Hat OpenShift distributed tracing platform lets you perform distributed tracing, which records the path of a request through various microservices that make up an application.

Distributed tracing is a technique that is used to tie the information about different units of work together — usually executed in different processes or hosts — to understand a whole chain of events in a distributed transaction. Developers can visualize call flows in large microservice architectures with distributed tracing. It is valuable for understanding serialization, parallelism, and sources of latency.

Red Hat OpenShift distributed tracing platform records the execution of individual requests across the whole stack of microservices, and presents them as traces. A trace is a data/execution path through the system. An end-to-end trace is comprised of one or more spans.

A span represents a logical unit of work in Red Hat OpenShift distributed tracing platform that has an operation name, the start time of the operation, and the duration, as well as potentially tags and logs. Spans may be nested and ordered to model causal relationships.

2.1.1. Distributed tracing overview

As a service owner, you can use distributed tracing to instrument your services to gather insights into your service architecture. You can use the Red Hat OpenShift distributed tracing platform for monitoring, network profiling, and troubleshooting the interaction between components in modern, cloud-native, microservices-based applications.

With the distributed tracing platform, you can perform the following functions:

  • Monitor distributed transactions
  • Optimize performance and latency
  • Perform root cause analysis

The distributed tracing platform consists of three components:

  • Red Hat OpenShift distributed tracing platform (Jaeger), which is based on the open source Jaeger project.
  • Red Hat OpenShift distributed tracing platform (Tempo), which is based on the open source Grafana Tempo project.
  • Red Hat build of OpenTelemetry, which is based on the open source OpenTelemetry project.

2.1.2. Red Hat OpenShift distributed tracing platform features

Red Hat OpenShift distributed tracing platform provides the following capabilities:

  • Integration with Kiali – When properly configured, you can view distributed tracing platform data from the Kiali console.
  • High scalability – The distributed tracing platform back end is designed to have no single points of failure and to scale with the business needs.
  • Distributed Context Propagation – Enables you to connect data from different components together to create a complete end-to-end trace.
  • Backwards compatibility with Zipkin – Red Hat OpenShift distributed tracing platform has APIs that enable it to be used as a drop-in replacement for Zipkin, but Red Hat is not supporting Zipkin compatibility in this release.

2.1.3. Red Hat OpenShift distributed tracing platform architecture

Red Hat OpenShift distributed tracing platform is made up of several components that work together to collect, store, and display tracing data.

  • Red Hat OpenShift distributed tracing platform (Jaeger) - This component is based on the open source Jaeger project.

    • Client (Jaeger client, Tracer, Reporter, instrumented application, client libraries)- The distributed tracing platform (Jaeger) clients are language-specific implementations of the OpenTracing API. They can be used to instrument applications for distributed tracing either manually or with a variety of existing open source frameworks, such as Camel (Fuse), Spring Boot (RHOAR), MicroProfile (RHOAR/Thorntail), Wildfly (EAP), and many more, that are already integrated with OpenTracing.
    • Agent (Jaeger agent, Server Queue, Processor Workers) - The distributed tracing platform (Jaeger) agent is a network daemon that listens for spans sent over User Datagram Protocol (UDP), which it batches and sends to the Collector. The agent is meant to be placed on the same host as the instrumented application. This is typically accomplished by having a sidecar in container environments such as Kubernetes.
    • Jaeger Collector (Collector, Queue, Workers) - Similar to the Jaeger agent, the Jaeger Collector receives spans and places them in an internal queue for processing. This allows the Jaeger Collector to return immediately to the client/agent instead of waiting for the span to make its way to the storage.
    • Storage (Data Store) - Collectors require a persistent storage backend. Red Hat OpenShift distributed tracing platform (Jaeger) has a pluggable mechanism for span storage. Note that for this release, the only supported storage is Elasticsearch.
    • Query (Query Service) - Query is a service that retrieves traces from storage.
    • Ingester (Ingester Service) - Red Hat OpenShift distributed tracing platform can use Apache Kafka as a buffer between the Collector and the actual Elasticsearch backing storage. Ingester is a service that reads data from Kafka and writes to the Elasticsearch storage backend.
    • Jaeger Console – With the Red Hat OpenShift distributed tracing platform (Jaeger) user interface, you can visualize your distributed tracing data. On the Search page, you can find traces and explore details of the spans that make up an individual trace.
  • Red Hat OpenShift distributed tracing platform (Tempo) - This component is based on the open source Grafana Tempo project.

    • Gateway – The Gateway handles authentication, authorization, and forwarding requests to the Distributor or Query front-end service.
    • Distributor – The Distributor accepts spans in multiple formats including Jaeger, OpenTelemetry, and Zipkin. It routes spans to Ingesters by hashing the traceID and using a distributed consistent hash ring.
    • Ingester – The Ingester batches a trace into blocks, creates bloom filters and indexes, and then flushes it all to the back end.
    • Query Frontend – The Query Frontend is responsible for sharding the search space for an incoming query. The search query is then sent to the Queriers. The Query Frontend deployment exposes the Jaeger UI through the Tempo Query sidecar.
    • Querier - The Querier is responsible for finding the requested trace ID in either the Ingesters or the back-end storage. Depending on parameters, it can query the Ingesters and pull Bloom indexes from the back end to search blocks in object storage.
    • Compactor – The Compactors stream blocks to and from the back-end storage to reduce the total number of blocks.
  • Red Hat build of OpenTelemetry - This component is based on the open source OpenTelemetry project.

    • OpenTelemetry Collector - The OpenTelemetry Collector is a vendor-agnostic way to receive, process, and export telemetry data. The OpenTelemetry Collector supports open-source observability data formats, for example, Jaeger and Prometheus, sending to one or more open-source or commercial back-ends. The Collector is the default location instrumentation libraries export their telemetry data.

2.1.4. Additional resources