Chapter 28. Network Observability

28.1. Network Observability Operator release notes

The Network Observability Operator enables administrators to observe and analyze network traffic flows for OpenShift Container Platform clusters.

These release notes track the development of the Network Observability Operator in the OpenShift Container Platform.

For an overview of the Network Observability Operator, see About Network Observability Operator.

28.1.1. Network Observability Operator 1.3.0

The following advisory is available for the Network Observability Operator 1.3.0:

28.1.1.1. Channel deprecation

You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in the next release.

28.1.1.2. New features and enhancements

28.1.1.2.1. Multi-tenancy in Network Observability
  • Multi-tenancy allows and restricts individual user access, or group access, to the flows stored in Loki. For more information, see Enable multi-tenancy in Network Observability.
28.1.1.2.2. Flow-based metrics dashboard
  • This release adds a new dashboard, which provides an overview of the network flows in your OpenShift Container Platform cluster. For more information, see Network Observability metrics.
28.1.1.2.3. Troubleshooting with the must-gather tool
  • Information about the Network Observability Operator can now be included in the must-gather data for troubleshooting. For more information, see Network Observability must-gather.
28.1.1.2.4. Multiple architectures now supported
  • The Network Observability Operator can now run on the amd64, ppc64le, or arm64 architecture. Previously, it ran only on amd64.

28.1.1.3. Deprecated features

28.1.1.3.1. Deprecated configuration parameter setting

The release of Network Observability Operator 1.3 deprecates the spec.loki.authToken HOST setting. When using the Loki Operator, you must use only the FORWARD setting.

28.1.1.4. Bug fixes

  • Previously, when the Operator was installed from the CLI, the Role and RoleBinding that are necessary for the Cluster Monitoring Operator to read the metrics were not installed as expected. The issue did not occur when the Operator was installed from the web console. Now, either way of installing the Operator installs the required Role and RoleBinding. (NETOBSERV-1003)
  • Since version 1.2, the Network Observability Operator can raise alerts when a problem occurs with the flows collection. Previously, due to a bug, the configuration to disable alerts, spec.processor.metrics.disableAlerts, did not work as expected and was sometimes ineffective. Now, this configuration is fixed so that it is possible to disable the alerts. (NETOBSERV-976)
  • Previously, when Network Observability was configured with spec.loki.authToken set to DISABLED, only a kubeadmin cluster administrator was able to view network flows. Other types of cluster administrators received an authorization failure. Now, any cluster administrator is able to view network flows. (NETOBSERV-972)
  • Previously, a bug prevented users from setting spec.consolePlugin.portNaming.enable to false. Now, this setting can be set to false to disable port-to-service name translation. (NETOBSERV-971)
  • Previously, the metrics exposed by the console plugin were not collected by the Cluster Monitoring Operator (Prometheus), due to an incorrect configuration. Now the configuration has been fixed so that the console plugin metrics are correctly collected and accessible from the OpenShift Container Platform web console. (NETOBSERV-765)
  • Previously, when processor.metrics.tls was set to AUTO in the FlowCollector, the flowlogs-pipeline servicemonitor did not adopt the appropriate TLS scheme, and metrics were not visible in the web console. Now the issue is fixed for AUTO mode. (NETOBSERV-1070)
  • Previously, certificate configuration, such as that used for Kafka and Loki, did not allow specifying a namespace field, implying that the certificates had to be in the same namespace where Network Observability is deployed. Moreover, when using Kafka with TLS/mTLS, the user had to manually copy the certificates to the privileged namespace where the eBPF agent pods are deployed and manually manage certificate updates, such as in the case of certificate rotation. Now, Network Observability setup is simplified by adding a namespace field for certificates in the FlowCollector resource. As a result, users can now install Loki or Kafka in different namespaces without needing to manually copy their certificates into the Network Observability namespace. The original certificates are watched so that the copies are automatically updated when needed. (NETOBSERV-773)
  • Previously, the SCTP, ICMPv4, and ICMPv6 protocols were not covered by the Network Observability agents, resulting in less comprehensive network flow coverage. These protocols are now recognized, which improves flow coverage. (NETOBSERV-934)
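
Several of the fixes above concern individual FlowCollector settings. The following fragment is a minimal sketch of how those settings might look together. The API version, the alert name, and the loki-namespace value are assumptions for illustration and are not taken from these release notes; check the Flow Collector API reference for the values that your version accepts.

```yaml
# Sketch only: FlowCollector fragments related to the 1.3.0 bug fixes.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Assumed alert name; lists the alerts that should not be raised.
      disableAlerts:
      - NetObservLokiError
  consolePlugin:
    portNaming:
      enable: false                      # disables port-to-service name translation
  loki:
    tls:
      enable: true
      caCert:
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
        namespace: loki-namespace        # hypothetical namespace that holds the certificate
```

With a namespace set on the certificate reference, the certificate can stay where Loki or Kafka is installed instead of being copied by hand.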

28.1.1.5. Known issue

  • When processor.metrics.tls is set to PROVIDED in the FlowCollector, the flowlogs-pipeline servicemonitor is not adapted to the TLS scheme. (NETOBSERV-1087)

28.1.2. Network Observability Operator 1.2.0

The following advisory is available for the Network Observability Operator 1.2.0:

28.1.2.1. Preparing for the next update

The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. Until the 1.2 release of the Network Observability Operator, the only channel available was v1.0.x. The 1.2 release of the Network Observability Operator introduces the stable update channel for tracking and receiving updates. You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in a future release.
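
As an illustration of the channel switch, a Subscription that tracks the stable channel might look like the following sketch. The Subscription name, package name, and catalog source are assumptions; use the values from the Subscription that already exists on your cluster.

```yaml
# Sketch only: Subscription tracking the stable channel.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator                # assumed Subscription name
  namespace: openshift-netobserv-operator
spec:
  channel: stable                         # replaces the deprecated v1.0.x channel
  name: netobserv-operator                # assumed package name
  source: redhat-operators                # assumed catalog source
  sourceNamespace: openshift-marketplace
```

Only the channel field needs to change on an existing Subscription; the other fields are shown for context.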

28.1.2.2. New features and enhancements

28.1.2.2.1. Histogram in Traffic Flows view
  • You can now choose to show a histogram bar chart of flows over time. The histogram enables you to visualize the history of flows without hitting the Loki query limit. For more information, see Using the histogram.
28.1.2.2.2. Conversation tracking
  • You can now query flows by Log Type, which enables grouping network flows that are part of the same conversation. For more information, see Working with conversations.
28.1.2.2.3. Network Observability health alerts
  • The Network Observability Operator now creates automatic alerts if the flowlogs-pipeline is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see Viewing health information.
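
Conversation tracking is controlled from the processor section of the FlowCollector. The fragment below is a hedged sketch; the API version and the field values are assumptions for illustration, so confirm the accepted values in the Flow Collector API reference.

```yaml
# Sketch only: enabling conversation events in the FlowCollector.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    logTypes: CONVERSATIONS              # assumed value; FLOWS keeps plain flow logs only
    conversationEndTimeout: 10s          # illustrative timeout before a conversation is considered ended
    conversationHeartbeatInterval: 30s   # illustrative interval between updates for ongoing conversations
```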

28.1.2.3. Bug fixes

  • Previously, after changing the namespace value in the FlowCollector spec, eBPF Agent pods running in the previous namespace were not appropriately deleted. Now, the pods running in the previous namespace are appropriately deleted. (NETOBSERV-774)
  • Previously, after changing the caCert.name value in the FlowCollector spec (such as in the Loki section), FlowLogs-Pipeline pods and Console plug-in pods were not restarted and were therefore unaware of the configuration change. Now, the pods are restarted, so they get the configuration change. (NETOBSERV-772)
  • Previously, network flows between pods running on different nodes were sometimes not correctly identified as being duplicates because they were captured by different network interfaces. This resulted in over-estimated metrics displayed in the console plug-in. Now, flows are correctly identified as duplicates, and the console plug-in displays accurate metrics. (NETOBSERV-755)
  • The "reporter" option in the console plug-in is used to filter flows based on the observation point of either source node or destination node. Previously, this option mixed the flows regardless of the node observation point. This was due to network flows being incorrectly reported as Ingress or Egress at the node level. Now, the network flow direction reporting is correct. The "reporter" option filters for source observation point, or destination observation point, as expected. (NETOBSERV-696)
  • Previously, for agents configured to send flows directly to the processor as gRPC+protobuf requests, the submitted payload could be too large and was rejected by the processor's gRPC server. This occurred under very-high-load scenarios and with only some configurations of the agent. The agent logged an error message, such as: grpc: received message larger than max. As a consequence, there was information loss about those flows. Now, the gRPC payload is split into several messages when the size exceeds a threshold. As a result, the server maintains connectivity. (NETOBSERV-617)

28.1.2.4. Known issue

  • In the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate transition periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate transition. (NETOBSERV-980)

28.1.2.5. Notable technical changes

  • Previously, you could install the Network Observability Operator using a custom namespace. This release introduces the conversion webhook, which changes the ClusterServiceVersion. Because of this change, all the available namespaces are no longer listed. Additionally, to enable Operator metrics collection, namespaces that are shared with other Operators, like the openshift-operators namespace, cannot be used. Now, the Operator must be installed in the openshift-netobserv-operator namespace. You cannot automatically upgrade to the new Operator version if you previously installed the Network Observability Operator using a custom namespace. In that case, you must delete the instance of the Operator that was installed and reinstall the Operator in the openshift-netobserv-operator namespace. Custom namespaces, such as the commonly used netobserv namespace, are still possible for the FlowCollector, Loki, Kafka, and other plug-ins. (NETOBSERV-907) (NETOBSERV-956)

28.1.3. Network Observability Operator 1.1.0

The following advisory is available for the Network Observability Operator 1.1.0:

The Network Observability Operator is now stable and the release channel is upgraded to v1.1.0.

28.1.3.1. Bug fix

  • Previously, unless the Loki authToken configuration was set to FORWARD mode, authentication was no longer enforced, allowing any user who could connect to the OpenShift Container Platform console in an OpenShift Container Platform cluster to retrieve flows without authentication. Now, regardless of the Loki authToken mode, only cluster administrators can retrieve flows. (BZ#2169468)

28.2. About Network Observability

Red Hat offers cluster administrators the Network Observability Operator to observe the network traffic for OpenShift Container Platform clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with OpenShift Container Platform information and stored in Loki. You can view and analyze the stored network flows information in the OpenShift Container Platform console for further insight and troubleshooting.

28.2.1. Dependency of Network Observability Operator

The Network Observability Operator requires the following Operators:

  • Loki: You must install Loki. Loki is the backend that stores all collected flows. It is recommended to install Loki by using the Red Hat Loki Operator before you install the Network Observability Operator.

28.2.2. Optional dependencies of the Network Observability Operator

  • Grafana: You can install Grafana by using the Grafana Operator to use custom dashboards and querying capabilities. Red Hat does not support the Grafana Operator.
  • Kafka: Provides scalability, resiliency, and high availability in the OpenShift Container Platform cluster. It is recommended to install Kafka by using the AMQ Streams Operator for large-scale deployments.

28.2.3. Network Observability Operator

The Network Observability Operator provides the Flow Collector API custom resource definition. A Flow Collector instance is created during installation and enables configuration of network flow collection. The Flow Collector instance deploys pods and services that form a monitoring pipeline, where network flows are collected and enriched with Kubernetes metadata before being stored in Loki. The eBPF agent, which is deployed as a DaemonSet object, creates the network flows.

28.2.4. OpenShift Container Platform console integration

OpenShift Container Platform console integration offers an overview, a topology view, and traffic flow tables.

28.2.4.1. Network Observability metrics

The OpenShift Container Platform console offers the Overview tab which displays the overall aggregated metrics of the network traffic flow on the cluster. The information can be displayed by node, namespace, owner, pod, and service. Filters and display options can further refine the metrics.

In Observe → Dashboards, the Netobserv dashboard provides a quick overview of the network flows in your OpenShift Container Platform cluster. You can view distillations of the network traffic metrics in the following categories:

  • Top flow rates per source and destination namespaces (1-min rates)
  • Top byte rates emitted per source and destination nodes (1-min rates)
  • Top byte rates received per source and destination nodes (1-min rates)
  • Top byte rates emitted per source and destination workloads (1-min rates)
  • Top byte rates received per source and destination workloads (1-min rates)
  • Top packet rates emitted per source and destination workloads (1-min rates)
  • Top packet rates received per source and destination workloads (1-min rates)

You can configure the FlowCollector spec.processor.metrics to add or remove metrics by changing the ignoreTags list. For more information about available tags, see the Flow Collector API Reference.
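
For instance, the ignoreTags list might be set as in the following sketch. The API version and the tag values are assumptions for illustration; the Flow Collector API Reference lists the tags that are actually available.

```yaml
# Sketch only: controlling which flow metrics are generated.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Metrics associated with any listed tag are not generated.
      # Removing a tag from the list adds the corresponding metrics back.
      ignoreTags:
      - egress                           # assumed tag value
      - packets                          # assumed tag value
```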

Also in Observe → Dashboards, the Netobserv/Health dashboard provides metrics about the health of the Operator.

28.2.4.2. Network Observability topology views

The OpenShift Container Platform console offers the Topology tab which displays a graphical representation of the network flows and the amount of traffic. The topology view represents traffic between the OpenShift Container Platform components as a network graph. You can refine the graph by using the filters and display options. You can access the information for node, namespace, owner, pod, and service.

28.2.4.3. Traffic flow tables

The traffic flow table view provides a view of raw flows, non-aggregated filtering options, and configurable columns. The OpenShift Container Platform console offers the Traffic flows tab, which displays the data of the network flows and the amount of traffic.

28.3. Installing the Network Observability Operator

Installing Loki is a prerequisite for using the Network Observability Operator. It is recommended to install Loki using the Loki Operator; therefore, these steps are documented below prior to the Network Observability Operator installation.

The Loki Operator integrates a gateway that implements multi-tenancy and authentication with Loki for data flow storage. The LokiStack resource manages Loki, which is a scalable, highly available, multi-tenant log aggregation system, and a web proxy with OpenShift Container Platform authentication. The LokiStack proxy uses OpenShift Container Platform authentication to enforce multi-tenancy and facilitate the saving and indexing of data in Loki log stores.

Note

The Loki Operator can also be used for Logging with the LokiStack. The Network Observability Operator requires a dedicated LokiStack separate from Logging.

28.3.1. Installing the Loki Operator

It is recommended to install Loki Operator version 5.7. This version provides the ability to create a LokiStack instance using the openshift-network tenant configuration mode. It also provides fully automatic, in-cluster authentication and authorization support for Network Observability.

Prerequisites

  • Supported Log Store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)
  • OpenShift Container Platform 4.10+.
  • Linux Kernel 4.18+.

There are several ways you can install Loki. One way is to install the Loki Operator by using the OpenShift Container Platform web console OperatorHub.

Procedure

  1. Install the Loki Operator:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
    2. Choose Loki Operator from the list of available Operators, and click Install.
    3. Under Installation Mode, select All namespaces on the cluster.
    4. Verify that you installed the Loki Operator. Visit the Operators → Installed Operators page and look for Loki Operator.
    5. Verify that Loki Operator is listed with Status as Succeeded in all the projects.
  2. Create a Secret YAML file. You can create this secret in the web console or CLI.

    1. Using the web console, navigate to the Project → All Projects dropdown and select Create Project. Name the project netobserv and click Create.
    2. Click the Import icon, +, in the top right corner. Drop your YAML file into the editor. Create this secret in the netobserv namespace, and use the access_key_id and access_key_secret fields to specify your credentials.
    3. Once you create the secret, you should see it listed under Workloads → Secrets in the web console.

      The following shows an example secret YAML file:

apiVersion: v1
kind: Secret
metadata:
  name: loki-s3
  namespace: netobserv
stringData:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  bucketnames: s3-bucket-name
  endpoint: https://s3.eu-central-1.amazonaws.com
  region: eu-central-1

Important

To uninstall Loki, refer to the uninstallation process that corresponds with the method you used to install Loki. You might have remaining ClusterRoles and ClusterRoleBindings, data stored in object store, and persistent volume that must be removed.

28.3.1.1. Create a LokiStack custom resource

It is recommended to deploy the LokiStack in the same namespace referenced by the FlowCollector specification, spec.namespace. You can use the web console or CLI to create a namespace, or new project.

Procedure

  1. Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.
  2. Look for Loki Operator. In the details, under Provided APIs, select LokiStack.
  3. Click Create LokiStack.
  4. Ensure the following fields are specified in either Form View or YAML view:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: loki
        namespace: netobserv
      spec:
        size: 1x.small
        storage:
          schemas:
          - version: v12
            effectiveDate: '2022-06-01'
          secret:
            name: loki-s3
            type: s3
        storageClassName: gp3  # 1
        tenants:
          mode: openshift-network
    1
    Use a storage class name that is available on the cluster for ReadWriteOnce access mode. You can use the oc get storageclasses command to see what is available on your cluster.

    Important

    You must not reuse the same LokiStack that is used for cluster logging.

  5. Click Create.

28.3.1.1.1. Deployment Sizing

Sizing for Loki follows the format of <N>x.<size>, where the value <N> is the number of instances and <size> specifies performance capabilities.

Note

1x.extra-small is for demo purposes only, and is not supported.

Table 28.1. Loki Sizing

                           1x.extra-small   1x.small             1x.medium
Data transfer              Demo use only.   500GB/day            2TB/day
Queries per second (QPS)   Demo use only.   25-50 QPS at 200ms   25-75 QPS at 200ms
Replication factor         None             2                    3
Total CPU requests         5 vCPUs          36 vCPUs             54 vCPUs
Total Memory requests      7.5Gi            63Gi                 139Gi
Total Disk requests        150Gi            300Gi                450Gi

28.3.1.2. LokiStack ingestion limits and health alerts

The LokiStack instance comes with default settings according to the configured size. It is possible to override some of these settings, such as the ingestion and query limits. You might want to update them if you get Loki errors showing up in the Console plugin, or in flowlogs-pipeline logs. An automatic alert in the web console notifies you when these limits are reached.

Here is an example of configured limits:

spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 40
        ingestionRate: 20
        maxGlobalStreamsPerTenant: 25000
      queries:
        maxChunksPerQuery: 2000000
        maxEntriesLimitPerQuery: 10000
        maxQuerySeries: 3000

For more information about these settings, see the LokiStack API reference.

28.3.2. Configure authorization and multi-tenancy

Define ClusterRole and ClusterRoleBinding. The netobserv-reader ClusterRole enables multi-tenancy and allows individual user access, or group access, to the flows stored in Loki. You can create a YAML file to define these roles.

Procedure

  1. Using the web console, click the Import icon, +.
  2. Drop your YAML file into the editor and click Create:

Example ClusterRole reader yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: netobserv-reader    # 1
rules:
- apiGroups:
  - 'loki.grafana.com'
  resources:
  - network
  resourceNames:
  - logs
  verbs:
  - 'get'

1
This role can be used for multi-tenancy.

Example ClusterRole writer yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: netobserv-writer
rules:
- apiGroups:
  - 'loki.grafana.com'
  resources:
  - network
  resourceNames:
  - logs
  verbs:
  - 'create'

Example ClusterRoleBinding yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: netobserv-writer-flp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: netobserv-writer
subjects:
- kind: ServiceAccount
  name: flowlogs-pipeline    # 1
  namespace: netobserv
- kind: ServiceAccount
  name: flowlogs-pipeline-transformer
  namespace: netobserv

1
The flowlogs-pipeline writes to Loki. If you are using Kafka, this value is flowlogs-pipeline-transformer.

28.3.3. Enable multi-tenancy in Network Observability

Multi-tenancy in the Network Observability Operator allows and restricts individual user access, or group access, to the flows stored in Loki. Access is enabled for project admins. Project admins who have limited access to some namespaces can access flows for only those namespaces.

Prerequisites

  • You have installed Loki Operator version 5.7.
  • The FlowCollector spec.loki.authToken configuration must be set to FORWARD.
  • You must be logged in as a project administrator.

Procedure

  1. Grant read permission to user1 by running the following command:

    $ oc adm policy add-cluster-role-to-user netobserv-reader user1

    Now, the data is restricted to only allowed user namespaces. For example, a user that has access to a single namespace can see all the flows internal to this namespace, as well as flows going from and to this namespace. Project admins have access to the Administrator perspective in the OpenShift Container Platform console to access the Network Flows Traffic page.
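
If you prefer to manage access declaratively, the command above could be expressed as a ClusterRoleBinding along the lines of the following sketch. The binding name is hypothetical; the role and user names are the ones used in this procedure.

```yaml
# Sketch only: declarative counterpart of the oc adm policy command.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: netobserv-reader-user1           # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: netobserv-reader                 # ClusterRole defined earlier in this section
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: user1                            # user who receives read access to flows
```

To grant access to a group instead, the subject kind would be Group and the name would be the group name.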

28.3.4. Installing Kafka (optional)

The Kafka Operator is supported for large-scale environments. You can install the Kafka Operator as Red Hat AMQ Streams from the OperatorHub, just as the Loki Operator and Network Observability Operator were installed.

Note

To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install.
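
Once Kafka is installed, the FlowCollector needs to know where to send flows. The fragment below is a sketch of what that configuration might look like. The API version, the topic name, and the bootstrap address (which assumes a Kafka cluster named kafka-cluster in the netobserv namespace) are assumptions for illustration.

```yaml
# Sketch only: FlowCollector fragment for the Kafka deployment model.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: KAFKA
  kafka:
    address: kafka-cluster-kafka-bootstrap.netobserv   # assumed bootstrap service address
    topic: network-flows                               # hypothetical topic name
    tls:
      enable: false                                    # set to true and reference certificates for TLS or mTLS
```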

28.3.5. Installing the Network Observability Operator

You can install the Network Observability Operator using the OpenShift Container Platform web console OperatorHub. When you install the Operator, it provides the FlowCollector custom resource definition (CRD). You can set specifications in the web console when you create the FlowCollector.

Prerequisites

  • Installed Loki. It is recommended to install Loki using the Loki Operator version 5.7.
  • One of the following supported architectures is required: amd64, ppc64le, arm64, or s390x.
  • Any CPU supported by Red Hat Enterprise Linux (RHEL) 9.

Note

This documentation assumes that your LokiStack instance name is loki. Using a different name requires additional configuration.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  2. Choose Network Observability Operator from the list of available Operators in the OperatorHub, and click Install.
  3. Select the checkbox Enable Operator recommended cluster monitoring on this Namespace.
  4. Navigate to Operators → Installed Operators. Under Provided APIs for Network Observability, select the Flow Collector link.

    1. Navigate to the Flow Collector tab, and click Create FlowCollector. Make the following selections in the form view:

      • spec.agent.ebpf.sampling: Specify a sampling size for flows. Lower sampling sizes have a higher impact on resource utilization. For more information, see the FlowCollector API reference, under spec.agent.ebpf.
      • spec.deploymentModel: If you are using Kafka, verify Kafka is selected.
      • spec.exporters: If you are using Kafka, you can optionally send network flows to Kafka, so that they can be consumed by any processor or storage that supports Kafka input, such as Splunk, Elasticsearch, or Fluentd. To do this, set the following specifications:

        • Set the type to KAFKA.
        • Set the address as kafka-cluster-kafka-bootstrap.netobserv.
        • Set the topic as netobserv-flows-export. The Operator exports all flows to the configured Kafka topic.
        • Set the following tls specifications:

          • certFile: service-ca.crt, name: kafka-gateway-ca-bundle, and type: configmap.

            You can also configure this option at a later time by directly editing the YAML. For more information, see Export enriched network flow data.

      • loki.url: Since authentication is specified separately, this URL needs to be updated to https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. The first part of the URL, "loki", should match the name of your LokiStack.
      • loki.statusUrl: Set this to https://loki-query-frontend-http.netobserv.svc:3100/. The first part of the URL, "loki", should match the name of your LokiStack.
      • loki.authToken: Select the FORWARD value.
      • tls.enable: Verify that the box is checked so it is enabled.
      • statusTls: The enable value is false by default.

        The first part of the certificate reference names loki-gateway-ca-bundle, loki-ca-bundle, and loki-query-frontend-http, "loki", should match the name of your LokiStack.

    2. Click Create.
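
The form selections in this procedure correspond roughly to the following FlowCollector YAML. This is a sketch under stated assumptions, not a definitive resource: the API version, the sampling value, and the nesting of the certificate fields under caCert are assumptions, and the exporters section applies only if you export flows to Kafka.

```yaml
# Sketch only: FlowCollector matching the form selections above.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: DIRECT                # KAFKA if you are using Kafka
  agent:
    type: EBPF
    ebpf:
      sampling: 50                       # illustrative sampling size
  loki:
    # "loki" at the start of each host name must match your LokiStack name.
    url: 'https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network'
    statusUrl: 'https://loki-query-frontend-http.netobserv.svc:3100/'
    authToken: FORWARD
    tls:
      enable: true
      caCert:                            # assumed nesting of the certificate reference
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
  exporters:                             # optional
  - type: KAFKA
    kafka:
      address: kafka-cluster-kafka-bootstrap.netobserv
      topic: netobserv-flows-export
      tls:
        enable: true
        caCert:                          # assumed nesting of the certificate reference
          type: configmap
          name: kafka-gateway-ca-bundle
          certFile: service-ca.crt
```

If your LokiStack is not named loki, adjust the host names and the certificate reference names accordingly.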

Verification

To confirm this was successful, when you navigate to Observe you should see Network Traffic listed in the options.

In the absence of application traffic within the OpenShift Container Platform cluster, default filters might show "No results", in which case no flows are displayed. Beside the filter selections, select Clear all filters to see the flows.

Important

If you installed Loki using the Loki Operator, it is advised not to use querierUrl, as it can break the console access to Loki. If you installed Loki using another type of Loki installation, this does not apply.

28.3.6. Uninstalling the Network Observability Operator

You can uninstall the Network Observability Operator using the OpenShift Container Platform web console OperatorHub, working in the Operators → Installed Operators area.

Procedure

  1. Remove the FlowCollector custom resource.

    1. Click Flow Collector, which is next to the Network Observability Operator in the Provided APIs column.
    2. Click the options menu kebab for the cluster and select Delete FlowCollector.
  2. Uninstall the Network Observability Operator.

    1. Navigate back to the Operators → Installed Operators area.
    2. Click the options menu kebab next to the Network Observability Operator and select Uninstall Operator.
    3. Navigate to Home → Projects and select openshift-netobserv-operator.
    4. Navigate to Actions and select Delete Project.
  3. Remove the FlowCollector custom resource definition (CRD).

    1. Navigate to Administration → CustomResourceDefinitions.
    2. Look for FlowCollector and click the options menu kebab.
    3. Select Delete CustomResourceDefinition.

      Important

      The Loki Operator and Kafka remain if they were installed and must be removed separately. Additionally, you might have remaining data stored in an object store, and a persistent volume that must be removed.

28.4. Network Observability Operator in OpenShift Container Platform

Network Observability is an OpenShift operator that deploys a monitoring pipeline to collect and enrich network traffic flows that are produced by the Network Observability eBPF agent.

28.4.1. Viewing statuses

The Network Observability Operator provides the Flow Collector API. When a Flow Collector resource is created, it deploys pods and services to create and store network flows in the Loki log store, as well as to display dashboards, metrics, and flows in the OpenShift Container Platform web console.

Procedure

  1. Run the following command to view the state of FlowCollector:

    $ oc get flowcollector/cluster

    Example output

    NAME      AGENT   SAMPLING (EBPF)   DEPLOYMENT MODEL   STATUS
    cluster   EBPF    50                DIRECT             Ready

  2. Check the status of pods running in the netobserv namespace by entering the following command:

    $ oc get pods -n netobserv

    Example output

    NAME                              READY   STATUS    RESTARTS   AGE
    flowlogs-pipeline-56hbp           1/1     Running   0          147m
    flowlogs-pipeline-9plvv           1/1     Running   0          147m
    flowlogs-pipeline-h5gkb           1/1     Running   0          147m
    flowlogs-pipeline-hh6kf           1/1     Running   0          147m
    flowlogs-pipeline-w7vv5           1/1     Running   0          147m
    netobserv-plugin-cdd7dc6c-j8ggp   1/1     Running   0          147m

flowlogs-pipeline pods collect flows, enrich the collected flows, and then send the flows to the Loki storage. netobserv-plugin pods create a visualization plugin for the OpenShift Container Platform console.

  3. Check the status of pods running in the namespace netobserv-privileged by entering the following command:

    $ oc get pods -n netobserv-privileged

    Example output

    NAME                         READY   STATUS    RESTARTS   AGE
    netobserv-ebpf-agent-4lpp6   1/1     Running   0          151m
    netobserv-ebpf-agent-6gbrk   1/1     Running   0          151m
    netobserv-ebpf-agent-klpl9   1/1     Running   0          151m
    netobserv-ebpf-agent-vrcnf   1/1     Running   0          151m
    netobserv-ebpf-agent-xf5jh   1/1     Running   0          151m

netobserv-ebpf-agent pods monitor network interfaces of the nodes to get flows and send them to flowlogs-pipeline pods.

  4. If you are using the Loki Operator, check the status of pods running in the openshift-operators-redhat namespace by entering the following command:

    $ oc get pods -n openshift-operators-redhat

    Example output

    NAME                                                READY   STATUS    RESTARTS   AGE
    loki-operator-controller-manager-5f6cff4f9d-jq25h   2/2     Running   0          18h
    lokistack-compactor-0                               1/1     Running   0          18h
    lokistack-distributor-654f87c5bc-qhkhv              1/1     Running   0          18h
    lokistack-distributor-654f87c5bc-skxgm              1/1     Running   0          18h
    lokistack-gateway-796dc6ff7-c54gz                   2/2     Running   0          18h
    lokistack-index-gateway-0                           1/1     Running   0          18h
    lokistack-index-gateway-1                           1/1     Running   0          18h
    lokistack-ingester-0                                1/1     Running   0          18h
    lokistack-ingester-1                                1/1     Running   0          18h
    lokistack-ingester-2                                1/1     Running   0          18h
    lokistack-querier-66747dc666-6vh5x                  1/1     Running   0          18h
    lokistack-querier-66747dc666-cjr45                  1/1     Running   0          18h
    lokistack-querier-66747dc666-xh8rq                  1/1     Running   0          18h
    lokistack-query-frontend-85c6db4fbd-b2xfb           1/1     Running   0          18h
    lokistack-query-frontend-85c6db4fbd-jm94f           1/1     Running   0          18h

28.4.2. Viewing Network Observability Operator status and configuration

You can inspect the status and view the details of the FlowCollector using the oc describe command.

Procedure

  1. Run the following command to view the status and configuration of the Network Observability Operator:

    $ oc describe flowcollector/cluster

28.5. Configuring the Network Observability Operator

You can update the Flow Collector API resource to configure the Network Observability Operator and its managed components. The Flow Collector is explicitly created during installation. Since this resource operates cluster-wide, only a single FlowCollector is allowed, and it has to be named cluster.

28.5.1. View the FlowCollector resource

You can view and edit YAML directly in the OpenShift Container Platform web console.

Procedure

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
  3. Select cluster, then select the YAML tab. There, you can modify the FlowCollector resource to configure the Network Observability Operator.

The following example shows a sample FlowCollector resource for the OpenShift Container Platform Network Observability Operator:

Sample FlowCollector resource

apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: DIRECT
  agent:
    type: EBPF                                1
    ebpf:
      sampling: 50                            2
      logLevel: info
      privileged: false
      resources:
        requests:
          memory: 50Mi
          cpu: 100m
        limits:
          memory: 800Mi
  processor:
    logLevel: info
    resources:
      requests:
        memory: 100Mi
        cpu: 100m
      limits:
        memory: 800Mi
    conversationEndTimeout: 10s
    logTypes: FLOWS                            3
    conversationHeartbeatInterval: 30s
  loki:                                       4
    url: 'https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network'
    statusUrl: 'https://loki-query-frontend-http.netobserv.svc:3100/'
    authToken: FORWARD
    tls:
      enable: true
      caCert:
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
  consolePlugin:
    register: true
    logLevel: info
    portNaming:
      enable: true
      portNames:
        "3100": loki
    quickFilters:                             5
    - name: Applications
      filter:
        src_namespace!: 'openshift-,netobserv'
        dst_namespace!: 'openshift-,netobserv'
      default: true
    - name: Infrastructure
      filter:
        src_namespace: 'openshift-,netobserv'
        dst_namespace: 'openshift-,netobserv'
    - name: Pods network
      filter:
        src_kind: 'Pod'
        dst_kind: 'Pod'
      default: true
    - name: Services network
      filter:
        dst_kind: 'Service'

1
The Agent specification, spec.agent.type, must be EBPF. eBPF is the only OpenShift Container Platform supported option.
2
You can set the Sampling specification, spec.agent.ebpf.sampling, to manage resources. Lower sampling values might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a higher sampling ratio value. A value of 100 means 1 flow in every 100 is sampled. A value of 0 or 1 means all flows are captured. The lower the value, the more flows are returned and the more accurate the derived metrics are. By default, eBPF sampling is set to a value of 50, so 1 flow in every 50 is sampled. Note that more sampled flows also mean more storage is needed. It is recommended to start with the default values and refine them empirically to determine which setting your cluster can manage.
3
The optional specifications spec.processor.logTypes, spec.processor.conversationHeartbeatInterval, and spec.processor.conversationEndTimeout can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The values for spec.processor.logTypes are as follows: FLOWS, CONVERSATIONS, ENDED_CONVERSATIONS, or ALL. Storage requirements are highest for ALL and lowest for ENDED_CONVERSATIONS.
4
The Loki specification, spec.loki, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the Loki Operator section. If you used another installation method for Loki, specify the appropriate client information for your install.
5
The spec.consolePlugin.quickFilters specification defines filters that show up in the web console. The Applications filter keys, src_namespace and dst_namespace, are negated (!), so the Applications filter shows all traffic that does not originate from, or have a destination to, any openshift- or netobserv namespaces. For more information, see Configuring quick filters below.

Additional resources

For more information about conversation tracking, see Working with conversations.

28.5.2. Configuring the Flow Collector resource with Kafka

You can configure the FlowCollector resource to use Kafka. A Kafka instance needs to be running, and a Kafka topic dedicated to OpenShift Container Platform Network Observability must be created in that instance. For more information, refer to your Kafka documentation, such as Kafka documentation with AMQ Streams.

The following example shows how to modify the FlowCollector resource for the OpenShift Container Platform Network Observability Operator to use Kafka:

Sample Kafka configuration in FlowCollector resource

  deploymentModel: KAFKA                                    1
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv"      2
    topic: network-flows                                    3
    tls:
      enable: false                                         4

1
Set spec.deploymentModel to KAFKA instead of DIRECT to enable the Kafka deployment model.
2
spec.kafka.address refers to the Kafka bootstrap server address. You can specify a port if needed, for instance kafka-cluster-kafka-bootstrap.netobserv:9093 for using TLS on port 9093.
3
spec.kafka.topic should match the name of a topic created in Kafka.
4
spec.kafka.tls can be used to encrypt all communications to and from Kafka with TLS or mTLS. When enabled, the Kafka CA certificate must be available as a ConfigMap or a Secret, both in the namespace where the flowlogs-pipeline processor component is deployed (default: netobserv) and where the eBPF agents are deployed (default: netobserv-privileged). It must be referenced with spec.kafka.tls.caCert. When using mTLS, client secrets must be available in these namespaces as well (they can be generated for instance using the AMQ Streams User Operator) and referenced with spec.kafka.tls.userCert.
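
As a minimal sketch of the mTLS case described in callout 4, the spec.kafka.tls section might look like the following. The secret names (kafka-cluster-cluster-ca-cert, flp-kafka) and the file names are hypothetical placeholders, and the certKey field name is an assumption that does not appear elsewhere in this section. Verify both against your Kafka installation and the FlowCollector API reference before use.

```yaml
# Sketch only: secret names and file names are placeholders, not defaults.
spec:
  deploymentModel: KAFKA
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv:9093"
    topic: network-flows
    tls:
      enable: true
      caCert:                                 # Kafka CA certificate
        type: secret
        name: kafka-cluster-cluster-ca-cert   # hypothetical secret name
        certFile: ca.crt
      userCert:                               # only needed for mTLS
        type: secret
        name: flp-kafka                       # hypothetical client secret name
        certFile: user.crt
        certKey: user.key                     # assumed field name
```

The referenced ConfigMap or Secret objects would need to exist in both the netobserv and netobserv-privileged namespaces, as the callout explains.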

28.5.3. Export enriched network flow data

You can send network flows to Kafka, so that they can be consumed by any processor or storage that supports Kafka input, such as Splunk, Elasticsearch, or Fluentd.

Prerequisites

  • Installed Kafka

Procedure

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
  3. Select cluster and then select the YAML tab.
  4. Edit the FlowCollector to configure spec.exporters as follows:

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      exporters:
      - type: KAFKA
        kafka:
          address: "kafka-cluster-kafka-bootstrap.netobserv"
          topic: netobserv-flows-export   1
          tls:
            enable: false                 2
    1
    The Network Observability Operator exports all flows to the configured Kafka topic.
    2
    You can encrypt all communications to and from Kafka with SSL/TLS or mTLS. When enabled, the Kafka CA certificate must be available as a ConfigMap or a Secret in the namespace where the flowlogs-pipeline processor component is deployed (default: netobserv). It must be referenced with spec.exporters.kafka.tls.caCert. When using mTLS, client secrets must be available in this namespace as well (they can be generated, for instance, by using the AMQ Streams User Operator) and referenced with spec.exporters.kafka.tls.userCert.
  5. After configuration, network flow data can be sent to an available output in JSON format. For more information, see Network flows format reference.

Additional resources

For more information about specifying flow format, see Network flows format reference.

28.5.4. Updating the Flow Collector resource

As an alternative to editing YAML in the OpenShift Container Platform web console, you can configure specifications, such as eBPF sampling, by patching the flowcollector custom resource (CR).

Procedure

  1. Run the following command to patch the flowcollector CR and update the spec.agent.ebpf.sampling value:

    $ oc patch flowcollector cluster --type=json -p '[{"op": "replace", "path": "/spec/agent/ebpf/sampling", "value": <new value>}]' -n netobserv

28.5.5. Configuring quick filters

You can modify the filters in the FlowCollector resource. Exact matches are possible using double-quotes around values. Otherwise, partial matches are used for textual values. The bang (!) character, placed at the end of a key, means negation. See the sample FlowCollector resource for more context about modifying the YAML.

Note

The filter matching type, "all of" or "any of", is a UI setting that users can modify from the query options. It is not part of this resource configuration.

Here is a list of all available filter keys:

Table 28.2. Filter keys

Universal*      Source              Destination         Description
namespace       src_namespace       dst_namespace       Filter traffic related to a specific namespace.
name            src_name            dst_name            Filter traffic related to a given leaf resource name, such as a specific pod, service, or node (for host-network traffic).
kind            src_kind            dst_kind            Filter traffic related to a given resource kind. The resource kinds include the leaf resource (Pod, Service or Node), or the owner resource (Deployment and StatefulSet).
owner_name      src_owner_name      dst_owner_name      Filter traffic related to a given resource owner; that is, a workload or a set of pods. For example, it can be a Deployment name, a StatefulSet name, etc.
resource        src_resource        dst_resource        Filter traffic related to a specific resource that is denoted by its canonical name, that identifies it uniquely. The canonical notation is kind.namespace.name for namespaced kinds, or node.name for nodes. For example, Deployment.my-namespace.my-web-server.
address         src_address         dst_address         Filter traffic related to an IP address. IPv4 and IPv6 are supported. CIDR ranges are also supported.
mac             src_mac             dst_mac             Filter traffic related to a MAC address.
port            src_port            dst_port            Filter traffic related to a specific port.
host_address    src_host_address    dst_host_address    Filter traffic related to the host IP address where the pods are running.
protocol        N/A                 N/A                 Filter traffic related to a protocol, such as TCP or UDP.

  * Universal keys match on either the source or the destination. For example, filtering name: 'my-pod' means all traffic from my-pod and all traffic to my-pod, regardless of the matching type used, whether Match all or Match any.
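
To illustrate the matching rules above, here is a minimal sketch of two custom quick filters. The filter names and the my-app namespace are hypothetical, and the nesting of double quotes inside a YAML string is an assumption about how an exact match is written in this resource.

```yaml
# Sketch only: filter names and namespaces are illustrative.
spec:
  consolePlugin:
    quickFilters:
    - name: My app only                 # hypothetical filter name
      filter:
        # Universal key: matches flows from or to the namespace.
        # Inner double quotes request an exact match on the name.
        namespace: '"my-app"'
    - name: Not to my-app               # hypothetical filter name
      filter:
        # Trailing ! negates the key; without inner quotes the match is partial.
        dst_namespace!: 'my-app'
      default: false
```

With a partial match, a value such as my-app would also match a namespace named my-app-staging, which is why the first filter uses the exact form.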

28.5.6. Resource management and performance considerations

The amount of resources required by Network Observability depends on the size of your cluster and your requirements for the cluster to ingest and store observability data. To manage resources and set performance criteria for your cluster, consider configuring the following settings. Configuring these settings can help you reach an optimal setup for your observability needs.

The following settings can help you manage resources and performance from the outset:

eBPF Sampling
You can set the Sampling specification, spec.agent.ebpf.sampling, to manage resources. Smaller sampling values might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a higher sampling ratio value. A value of 100 means 1 flow in every 100 is sampled. A value of 0 or 1 means all flows are captured. Smaller values result in more returned flows and more accurate derived metrics. By default, eBPF sampling is set to a value of 50, so 1 flow in every 50 is sampled. Note that more sampled flows also mean more storage is needed. Consider starting with the default values and refining them empirically to determine which setting your cluster can manage.
Restricting or excluding interfaces
Reduce the overall observed traffic by setting the values for spec.agent.ebpf.interfaces and spec.agent.ebpf.excludeInterfaces. By default, the agent fetches all the interfaces in the system, except the ones listed in excludeInterfaces and lo (local interface). Note that the interface names might vary according to the Container Network Interface (CNI) used.
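
A minimal sketch of restricting the observed interfaces follows. The interface name eth0 is a hypothetical example; actual names depend on your nodes and the CNI in use.

```yaml
# Sketch only: interface names depend on your nodes and CNI.
spec:
  agent:
    type: EBPF
    ebpf:
      # Observe only the listed interfaces.
      interfaces:
      - eth0              # hypothetical interface name
      # Interfaces that are never observed.
      excludeInterfaces:
      - lo
```

Listing interfaces explicitly narrows collection the most, while excludeInterfaces is the lighter option when only a few interfaces generate unwanted traffic.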

The following settings can be used to fine-tune performance after Network Observability has been running for a while:

Resource requirements and limits
Adapt the resource requirements and limits to the load and memory usage you expect on your cluster by using the spec.agent.ebpf.resources and spec.processor.resources specifications. The default memory limits of 800Mi might be sufficient for most medium-sized clusters.
Cache max flows timeout
Control how often flows are reported by the agents by using the eBPF agent’s spec.agent.ebpf.cacheMaxFlows and spec.agent.ebpf.cacheActiveTimeout specifications. A larger value results in less traffic being generated by the agents, which correlates with a lower CPU load. However, a larger value leads to a slightly higher memory consumption, and might generate more latency in the flow collection.
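
The fine-tuning settings above might be combined as in the following sketch. All values are illustrative rather than recommendations. The comments on the two cache fields reflect an assumed reading of their semantics, so confirm them in the FlowCollector API reference.

```yaml
# Sketch only: values are illustrative, not recommendations.
spec:
  agent:
    ebpf:
      cacheMaxFlows: 100000       # assumed: flows buffered before a report is sent
      cacheActiveTimeout: 5s      # assumed: longest wait before a report is sent
      resources:
        requests:
          memory: 50Mi
          cpu: 100m
        limits:
          memory: 800Mi
  processor:
    resources:
      requests:
        memory: 100Mi
        cpu: 100m
      limits:
        memory: 800Mi
```

Raising either cache value trades memory and latency for lower agent CPU load, as described above, so it is reasonable to change one value at a time and watch the health dashboard.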

28.5.6.1. Resource considerations

The following table outlines examples of resource considerations for clusters with certain workload sizes.

Important

The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.

Table 28.3. Resource recommendations

                                                Extra small (10 nodes)    Small (25 nodes)          Medium (65 nodes) [2]     Large (120 nodes) [2]
Worker Node vCPU and memory                     4 vCPUs, 16GiB mem [1]    16 vCPUs, 64GiB mem [1]   16 vCPUs, 64GiB mem [1]   16 vCPUs, 64GiB mem [1]
LokiStack size                                  1x.extra-small            1x.small                  1x.small                  1x.medium
Network Observability controller memory limit   400Mi (default)           400Mi (default)           400Mi (default)           800Mi
eBPF sampling rate                              50 (default)              50 (default)              50 (default)              50 (default)
eBPF memory limit                               800Mi (default)           800Mi (default)           2000Mi                    800Mi (default)
FLP memory limit                                800Mi (default)           800Mi (default)           800Mi (default)           800Mi (default)
FLP Kafka partitions                            N/A                       48                        48                        48
Kafka consumer replicas                         N/A                       24                        24                        24
Kafka brokers                                   N/A                       3 (default)               3 (default)               3 (default)

  1. Tested with AWS M6i instances.
  2. In addition to this worker and its controller, 3 infra nodes (size M6i.12xlarge) and 1 workload node (size M6i.8xlarge) were tested.

28.6. Network Policy

As a user with the admin role, you can create a network policy for the netobserv namespace.

28.6.1. Creating a network policy for Network Observability

You might need to create a network policy to secure ingress traffic to the netobserv namespace. In the web console, you can create a network policy using the form view.

Procedure

  1. Navigate to Networking → NetworkPolicies.
  2. Select the netobserv project from the Project dropdown menu.
  3. Name the policy. For this example, the policy name is allow-ingress.
  4. Click Add ingress rule three times to create three ingress rules.
  5. Specify the following in the form:

    1. Make the following specifications for the first Ingress rule:

      1. From the Add allowed source dropdown menu, select Allow pods from the same namespace.
    2. Make the following specifications for the second Ingress rule:

      1. From the Add allowed source dropdown menu, select Allow pods from inside the cluster.
      2. Click + Add namespace selector.
      3. Add the label, kubernetes.io/metadata.name, and the selector, openshift-console.
    3. Make the following specifications for the third Ingress rule:

      1. From the Add allowed source dropdown menu, select Allow pods from inside the cluster.
      2. Click + Add namespace selector.
      3. Add the label, kubernetes.io/metadata.name, and the selector, openshift-monitoring.

Verification

  1. Navigate to Observe → Network Traffic.
  2. View the Traffic Flows tab, or any tab, to verify that the data is displayed.
  3. Navigate to Observe → Dashboards. In the NetObserv/Health selection, verify that the flows are being ingested and sent to Loki, which is represented in the first graph.

28.6.2. Example network policy

The following example shows an annotated NetworkPolicy object for the netobserv namespace:

Sample network policy

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-ingress
  namespace: netobserv
spec:
  podSelector: {}            1
  ingress:
    - from:
        - podSelector: {}    2
          namespaceSelector: 3
            matchLabels:
              kubernetes.io/metadata.name: openshift-console
        - podSelector: {}
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-monitoring
  policyTypes:
    - Ingress
status: {}

1
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object. In this documentation, it would be the project in which the Network Observability Operator is installed, which is the netobserv project.
2
A selector that matches the pods from which the policy object allows ingress traffic. The default is that the selector matches pods in the same namespace as the NetworkPolicy.
3
When the namespaceSelector is specified, the selector matches pods in the specified namespace.
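
The sample above lists only the openshift-console and openshift-monitoring sources. The first ingress rule from the form procedure, Allow pods from the same namespace, could be expressed as a separate policy along the following lines. The policy name allow-same-namespace is hypothetical.

```yaml
# Sketch only: the policy name is a placeholder.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
  namespace: netobserv
spec:
  podSelector: {}            # applies to all pods in the netobserv namespace
  ingress:
    - from:
        - podSelector: {}    # no namespaceSelector: matches pods in the same namespace
  policyTypes:
    - Ingress
```

Network policies are additive, so this rule could equally be added as a third entry under the from list of the allow-ingress policy.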

28.7. Observing the network traffic

As an administrator, you can observe the network traffic in the OpenShift Container Platform console for detailed troubleshooting and analysis. This feature helps you get insights from different graphical representations of traffic flow. There are several available views to observe the network traffic.

28.7.1. Observing the network traffic from the Overview view

The Overview view displays the overall aggregated metrics of the network traffic flow on the cluster. As an administrator, you can monitor the statistics with the available display options.

28.7.1.1. Working with the Overview view

As an administrator, you can navigate to the Overview view to see the graphical representation of the flow rate statistics.

Procedure

  1. Navigate to Observe → Network Traffic.
  2. In the Network Traffic page, click the Overview tab.

You can configure the scope of each flow rate data by clicking the menu icon.

28.7.1.2. Configuring advanced options for the Overview view

You can customize the graphical view by using advanced options. To access the advanced options, click Show advanced options. You can configure the details in the graph by using the Display options drop-down menu. The options available are:

  • Metric type: The metrics to be shown in Bytes or Packets. The default value is Bytes.
  • Scope: To select the detail of components between which the network traffic flows. You can set the scope to Node, Namespace, Owner, or Resource. Owner is an aggregation of resources. Resource can be a pod, a service, a node (in the case of host-network traffic), or an unknown IP address. The default value is Namespace.
  • Truncate labels: Select the required width of the label from the drop-down list. The default value is M.
28.7.1.2.1. Managing panels

You can select the required statistics to be displayed, and reorder them. To manage panels, click Manage panels.

28.7.2. Observing the network traffic from the Traffic flows view

The Traffic flows view displays the data of the network flows and the amount of traffic in a table. As an administrator, you can monitor the amount of traffic across the application by using the traffic flow table.

28.7.2.1. Working with the Traffic flows view

As an administrator, you can navigate to the Traffic flows table to see network flow information.

Procedure

  1. Navigate to Observe → Network Traffic.
  2. In the Network Traffic page, click the Traffic flows tab.

You can click on each row to get the corresponding flow information.

28.7.2.2. Configuring advanced options for the Traffic flows view

You can customize and export the view by using Show advanced options. You can set the row size by using the Display options drop-down menu. The default value is Normal.

28.7.2.2.1. Managing columns

You can select the required columns to be displayed, and reorder them. To manage columns, click Manage columns.

28.7.2.2.2. Exporting the traffic flow data

You can export data from the Traffic flows view.

Procedure

  1. Click Export data.
  2. In the pop-up window, you can select the Export all data checkbox to export all the data, and clear the checkbox to select the required fields to be exported.
  3. Click Export.

28.7.2.3. Working with conversation tracking

As an administrator, you can group network flows that are part of the same conversation. A conversation is defined as a grouping of peers that are identified by their IP addresses, ports, and protocols, resulting in a unique Conversation Id. You can query conversation events in the web console. These events are represented in the web console as follows:

  • Conversation start: This event happens when a connection is starting or a TCP flag is intercepted.
  • Conversation tick: This event happens at each specified interval defined in the FlowCollector spec.processor.conversationHeartbeatInterval parameter while the connection is active.
  • Conversation end: This event happens when the FlowCollector spec.processor.conversationEndTimeout parameter is reached or the TCP flag is intercepted.
  • Flow: This is the network traffic flow that occurs within the specified interval.

Procedure

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
  3. Select cluster then select the YAML tab.
  4. Configure the FlowCollector custom resource so that spec.processor.logTypes, conversationEndTimeout, and conversationHeartbeatInterval parameters are set according to your observation needs. A sample configuration is as follows:

    Configure FlowCollector for conversation tracking

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      processor:
        conversationEndTimeout: 10s                  1
        logTypes: FLOWS                              2
        conversationHeartbeatInterval: 30s           3

    1
    The Conversation end event represents the point when the conversationEndTimeout is reached or the TCP flag is intercepted.
    2
    When logTypes is set to FLOWS, only the Flow event is exported. If you set the value to ALL, both conversation and flow events are exported and visible in the Network Traffic page. To focus only on conversation events, you can specify CONVERSATIONS, which exports the Conversation start, Conversation tick, and Conversation end events, or ENDED_CONVERSATIONS, which exports only the Conversation end events. Storage requirements are highest for ALL and lowest for ENDED_CONVERSATIONS.
    3
    The Conversation tick event represents each specified interval defined in the FlowCollector conversationHeartbeatInterval parameter while the network connection is active.
    Note

    If you update the logTypes option, the flows from the previous selection do not clear from the console plugin. For example, if you initially set logTypes to CONVERSATIONS for a span of time until 10 AM and then move to ENDED_CONVERSATIONS, the console plugin shows all conversation events before 10 AM and only ended conversations after 10 AM.

  5. Refresh the Network Traffic page on the Traffic flows tab. Notice there are two new columns, Event/Type and Conversation Id. All the Event/Type fields are Flow when Flow is the selected query option.
  6. Select Query Options and choose the Log Type, Conversation. Now the Event/Type shows all of the desired conversation events.
  7. Next, you can filter on a specific conversation ID or switch between the Conversation and Flow log type options from the side panel.
28.7.2.3.1. Using the histogram

You can click Show histogram to display a toolbar view for visualizing the history of flows as a bar chart. The histogram shows the number of logs over time. You can select a part of the histogram to filter the network flow data in the table that follows the toolbar.

28.7.3. Observing the network traffic from the Topology view

The Topology view provides a graphical representation of the network flows and the amount of traffic. As an administrator, you can monitor the traffic data across the application by using the Topology view.

28.7.3.1. Working with the Topology view

As an administrator, you can navigate to the Topology view to see the details and metrics of the component.

Procedure

  1. Navigate to Observe → Network Traffic.
  2. In the Network Traffic page, click the Topology tab.

You can click each component in the Topology to view the details and metrics of the component.

28.7.3.2. Configuring the advanced options for the Topology view

You can customize and export the view by using Show advanced options. The advanced options view has the following features:

  • Find in view: To search the required components in the view.
  • Display options: To configure the following options:

    • Layout: To select the layout of the graphical representation. The default value is ColaNoForce.
    • Scope: To select the scope of components between which the network traffic flows. The default value is Namespace.
    • Groups: To enhance the understanding of ownership by grouping the components. The default value is None.
    • Collapse groups: To expand or collapse the groups. The groups are expanded by default. This option is disabled if Groups has the value None.
    • Show: To select the details that need to be displayed. All the options are checked by default. The options available are: Edges, Edges label, and Badges.
    • Truncate labels: To select the required width of the label from the drop-down list. The default value is M.
28.7.3.2.1. Exporting the topology view

To export the view, click Export topology view. The view is downloaded in PNG format.

28.7.4. Filtering the network traffic

By default, the Network Traffic page displays the traffic flow data in the cluster based on the default filters configured in the FlowCollector instance. You can use the filter options to observe the required data by changing the preset filter.

Query Options

You can use Query Options to optimize the search results, as listed below:

  • Log Type: The available options Conversation and Flows provide the ability to query flows by log type, such as flow log, new conversation, completed conversation, and a heartbeat, which is a periodic record with updates for long conversations. A conversation is an aggregation of flows between the same peers.
  • Reporter Node: Every flow can be reported from both source and destination nodes. For cluster ingress, the flow is reported from the destination node and for cluster egress, the flow is reported from the source node. You can select either Source or Destination. The option Both is disabled for the Overview and Topology view. The default selected value is Destination.
  • Match filters: You can determine the relation between different filter parameters selected in the advanced filter. The available options are Match all and Match any. Match all provides results that match all the values, and Match any provides results that match any of the values entered. The default value is Match all.
  • Limit: The data limit for internal backend queries. Depending on the matching and filter settings, the amount of traffic flow data displayed is capped at the specified limit.
Quick filters
The default values in the Quick filters drop-down menu are defined in the FlowCollector configuration. You can modify the options from the console.
Advanced filters
You can set the advanced filters by providing the parameter to be filtered and its corresponding text value. The section Common in the parameter drop-down list filters the results that match either Source or Destination. To enable or disable the applied filter, you can click on the applied filter listed below the filter options.
Note

To understand the rules of specifying the text value, click Learn More.

You can click Reset default filter to remove the existing filters and apply the filter defined in the FlowCollector configuration.

Alternatively, you can access the traffic flow data in the Network Traffic tab of the Namespaces, Services, Routes, Nodes, and Workloads pages which provide the filtered data of the corresponding aggregations.

Additional resources

For more information about configuring quick filters in the FlowCollector, see Configuring Quick Filters and the Flow Collector sample resource.

28.8. Monitoring the Network Observability Operator

You can use the web console to monitor alerts related to the health of the Network Observability Operator.

28.8.1. Viewing health information

You can access metrics about health and resource usage of the Network Observability Operator from the Dashboards page in the web console. A health alert banner that directs you to the dashboard can appear on the Network Traffic and Home pages in the event that an alert is triggered. Alerts are generated in the following cases:

  • The NetObservLokiError alert occurs if the flowlogs-pipeline workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
  • The NetObservNoFlows alert occurs if no flows are ingested for a certain amount of time.

Prerequisites

  • You have the Network Observability Operator installed.
  • You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.

Procedure

  1. From the Administrator perspective in the web console, navigate to Observe → Dashboards.
  2. From the Dashboards dropdown, select NetObserv/Health. Metrics about the health of the Operator are displayed on the page.

28.8.1.1. Disabling health alerts

You can opt out of health alerting by editing the FlowCollector resource:

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
  3. Select cluster then select the YAML tab.
  4. Add spec.processor.metrics.disableAlerts to disable health alerts, as in the following YAML sample:

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      processor:
        metrics:
          disableAlerts: [NetObservLokiError, NetObservNoFlows] 1

    1
    You can specify one or a list with both types of alerts to disable.

28.9. FlowCollector configuration parameters

FlowCollector is the schema for the network flows collection API, which pilots and configures the underlying deployments.

28.9.1. FlowCollector API specifications

Description
FlowCollector is the schema for the network flows collection API, which pilots and configures the underlying deployments.
Type
object
Property | Type | Description

apiVersion

string

APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

kind

string

Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

metadata

object

Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

spec

object

FlowCollectorSpec defines the desired state of the FlowCollector resource.

*: The mention of "unsupported" or "deprecated" for a feature throughout this document means that this feature is not officially supported by Red Hat. It might have been, for instance, contributed by the community and accepted without a formal agreement for maintenance. The product maintainers might provide some support for these features as a best effort only.

28.9.1.1. .metadata

Description
Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
Type
object

28.9.1.2. .spec

Description
FlowCollectorSpec defines the desired state of the FlowCollector resource.

*: The mention of "unsupported" or "deprecated" for a feature throughout this document means that this feature is not officially supported by Red Hat. It might have been, for instance, contributed by the community and accepted without a formal agreement for maintenance. The product maintainers might provide some support for these features as a best effort only.
Type
object
Required
  • agent
  • deploymentModel
Property | Type | Description

agent

object

Agent configuration for flows extraction.

consolePlugin

object

consolePlugin defines the settings related to the OpenShift Container Platform Console plugin, when available.

deploymentModel

string

deploymentModel defines the desired type of deployment for flow processing. Possible values are:
- DIRECT (default) to make the flow processor listen directly to the agents.
- KAFKA to send flows to a Kafka pipeline before they are consumed by the processor.
Kafka can provide better scalability, resiliency, and high availability (for more details, see https://www.redhat.com/en/topics/integration/what-is-apache-kafka).

exporters

array

exporters define additional optional exporters for custom consumption or storage.

kafka

object

Kafka configuration, which allows Kafka to be used as a broker in the flow collection pipeline. Available when spec.deploymentModel is KAFKA.

loki

object

Client settings for Loki, the flow store.

namespace

string

Namespace where NetObserv pods are deployed. If empty, the namespace of the operator is used.

processor

object

processor defines the settings of the component that receives the flows from the agent, enriches them, generates metrics, and forwards them to the Loki persistence layer and/or any available exporter.
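
As an illustration of how these top-level properties fit together, the following minimal sketch sets only the two required fields, agent and deploymentModel. The apiVersion is copied from the sample earlier in this chapter and might differ in your installed version, and the netobserv namespace is an assumed value.

```yaml
apiVersion: flows.netobserv.io/v1alpha1  # assumption: same version as the sample in this chapter
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv      # assumed namespace for the NetObserv pods
  deploymentModel: DIRECT   # or KAFKA to send flows through a Kafka pipeline first
  agent:
    type: EBPF              # required field of spec.agent
```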

28.9.1.3. .spec.agent

Description
Agent configuration for flows extraction.
Type
object
Required
  • type
Property | Type | Description

ebpf

object

ebpf describes the settings related to the eBPF-based flow reporter when spec.agent.type is set to EBPF.

ipfix

object

ipfix - deprecated (*) - describes the settings related to the IPFIX-based flow reporter when spec.agent.type is set to IPFIX.

type

string

type selects the flows tracing agent. Possible values are:
- EBPF (default) to use NetObserv eBPF agent.
- IPFIX - deprecated (*) - to use the legacy IPFIX collector.
EBPF is recommended because it offers better performance and should work regardless of the CNI installed on the cluster. IPFIX works with the OVN-Kubernetes CNI (other CNIs could work if they support exporting IPFIX, but they would require manual configuration).

28.9.1.4. .spec.agent.ebpf

Description
ebpf describes the settings related to the eBPF-based flow reporter when spec.agent.type is set to EBPF.
Type
object
Property | Type | Description

cacheActiveTimeout

string

cacheActiveTimeout is the maximum period during which the reporter aggregates flows before sending. Increasing cacheMaxFlows and cacheActiveTimeout can decrease the network traffic overhead and the CPU load; however, you can expect higher memory consumption and increased latency in the flow collection.

cacheMaxFlows

integer

cacheMaxFlows is the maximum number of flows in an aggregate; when reached, the reporter sends the flows. Increasing cacheMaxFlows and cacheActiveTimeout can decrease the network traffic overhead and the CPU load; however, you can expect higher memory consumption and increased latency in the flow collection.

debug

object

debug allows setting some aspects of the internal configuration of the eBPF agent. This section is intended exclusively for debugging and fine-grained performance optimizations, such as the GOGC and GOMAXPROCS environment variables. Users who set these values do so at their own risk.

excludeInterfaces

array (string)

excludeInterfaces contains the interface names that are excluded from flow tracing. An entry enclosed by slashes, such as /br-/, is matched as a regular expression. Otherwise, it is matched as a case-sensitive string.

imagePullPolicy

string

imagePullPolicy is the Kubernetes pull policy for the image defined above

interfaces

array (string)

interfaces contains the interface names from which flows are collected. If empty, the agent fetches all the interfaces in the system, except the ones listed in excludeInterfaces. An entry enclosed by slashes, such as /br-/, is matched as a regular expression. Otherwise, it is matched as a case-sensitive string.

kafkaBatchSize

integer

kafkaBatchSize limits the maximum size of a request in bytes before being sent to a partition. Ignored when not using Kafka. Default: 10MB.

logLevel

string

logLevel defines the log level for the NetObserv eBPF Agent

privileged

boolean

Privileged mode for the eBPF Agent container. In general, this setting can be ignored or set to false; in that case, the operator sets granular capabilities (BPF, PERFMON, NET_ADMIN, SYS_RESOURCE) on the container to enable its correct operation. If these capabilities cannot be set, for example, because an old kernel version that does not recognize CAP_BPF is in use, you can turn on this mode for more global privileges.

resources

object

resources are the compute resources required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

sampling

integer

Sampling rate of the flow reporter. 100 means one flow in 100 is sent. 0 or 1 means all flows are sampled.
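
The following sketch shows how several of the eBPF agent settings described in this table might be combined. All values are illustrative assumptions, not documented defaults; adjust them to your cluster.

```yaml
spec:
  agent:
    type: EBPF
    ebpf:
      sampling: 50             # illustrative: one flow in 50 is sent
      cacheActiveTimeout: 5s   # illustrative: maximum aggregation period before sending
      cacheMaxFlows: 100000    # illustrative: send once this many flows are aggregated
      interfaces: []           # empty: collect from all interfaces except the excluded ones
      excludeInterfaces:
        - lo                   # plain entry: matched as a case-sensitive string
        - /br-/                # enclosed by slashes: matched as a regular expression
      privileged: false        # the operator sets granular capabilities instead
```

Raising cacheMaxFlows and cacheActiveTimeout trades memory and collection latency for lower CPU load and network overhead, as noted in the property descriptions.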

28.9.1.5. .spec.agent.ebpf.debug

Description
debug allows setting some aspects of the internal configuration of the eBPF agent. This section is intended exclusively for debugging and fine-grained performance optimizations, such as the GOGC and GOMAXPROCS environment variables. Users who set these values do so at their own risk.
Type
object
Property | Type | Description

env

object (string)

env allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as GOGC and GOMAXPROCS, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios.
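
As a sketch of the env map, the following fragment passes the two Go runtime variables mentioned above to the eBPF agent. The values are illustrative assumptions; because env is a map of strings, numeric values are quoted.

```yaml
spec:
  agent:
    ebpf:
      debug:
        env:
          GOGC: "50"        # illustrative value, quoted because env values are strings
          GOMAXPROCS: "2"   # illustrative value
```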

28.9.1.6. .spec.agent.ebpf.resources

Description
resources are the compute resources required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Type
object
Property | Type | Description

limits

integer-or-string

Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

requests

integer-or-string

Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
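
For example, limits and requests follow the usual Kubernetes container resource layout, with each resource name mapped to a quantity. The quantities in this sketch are illustrative assumptions, not recommended sizes.

```yaml
spec:
  agent:
    ebpf:
      resources:
        requests:
          cpu: 100m       # illustrative minimum CPU
          memory: 50Mi    # illustrative minimum memory
        limits:
          memory: 800Mi   # illustrative maximum memory
```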

28.9.1.7. .spec.agent.ipfix

Description
ipfix - deprecated (*) - describes the settings related to the IPFIX-based flow reporter when spec.agent.type is set to IPFIX.
Type
object
Property | Type | Description

cacheActiveTimeout

string

cacheActiveTimeout is the max period during which the reporter will aggregate flows before sending

cacheMaxFlows

integer

cacheMaxFlows is the max number of flows in an aggregate; when reached, the reporter sends the flows

clusterNetworkOperator

object

clusterNetworkOperator defines the settings related to the OpenShift Container Platform Cluster Network Operator, when available.

forceSampleAll

boolean

forceSampleAll allows disabling sampling in the IPFIX-based flow reporter. It is not recommended to sample all the traffic with IPFIX, as it might generate cluster instability. If you REALLY want to do that, set this flag to true. Use at your own risk. When it is set to true, the value of sampling is ignored.

ovnKubernetes

object

ovnKubernetes defines the settings of the OVN-Kubernetes CNI, when available. This configuration is used when using OVN’s IPFIX exports, without OpenShift Container Platform. When using OpenShift Container Platform, refer to the clusterNetworkOperator property instead.

sampling

integer

sampling is the sampling rate on the reporter. 100 means one flow in 100 is sent. To ensure cluster stability, it is not possible to set a value below 2. If you really want to sample every packet, which might impact the cluster stability, refer to forceSampleAll. Alternatively, you can use the eBPF Agent instead of IPFIX.
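
A minimal sketch of the deprecated IPFIX agent configuration might look like the following. The values are illustrative assumptions; the only constraint taken from the description above is that sampling cannot be below 2 unless forceSampleAll is set to true.

```yaml
spec:
  agent:
    type: IPFIX               # deprecated (*); EBPF is recommended
    ipfix:
      sampling: 400           # illustrative: one flow in 400 is sent; must be 2 or higher
      cacheActiveTimeout: 20s # illustrative
      cacheMaxFlows: 400      # illustrative
      forceSampleAll: false   # when true, the sampling value is ignored
```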

28.9.1.8. .spec.agent.ipfix.clusterNetworkOperator

Description
clusterNetworkOperator defines the settings related to the OpenShift Container Platform Cluster Network Operator, when available.
Type
object
Property | Type | Description

namespace

string

Namespace where the config map is going to be deployed.

28.9.1.9. .spec.agent.ipfix.ovnKubernetes

Description
ovnKubernetes defines the settings of the OVN-Kubernetes CNI, when available. This configuration is used when using OVN’s IPFIX exports, without OpenShift Container Platform. When using OpenShift Container Platform, refer to the clusterNetworkOperator property instead.
Type
object
Property | Type | Description

containerName

string

containerName defines the name of the container to configure for IPFIX.

daemonSetName

string

daemonSetName defines the name of the DaemonSet controlling the OVN-Kubernetes pods.

namespace

string

Namespace where OVN-Kubernetes pods are deployed.

28.9.1.10. .spec.consolePlugin

Description
consolePlugin defines the settings related to the OpenShift Container Platform Console plugin, when available.
Type
object
Property | Type | Description

autoscaler

object

autoscaler spec of a horizontal pod autoscaler to set up for the plugin Deployment. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2).

imagePullPolicy

string

imagePullPolicy is the Kubernetes pull policy for the image defined above

logLevel

string

logLevel for the console plugin backend

port

integer

port is the plugin service port. Do not use 9002, which is reserved for metrics.

portNaming

object

portNaming defines the configuration of the port-to-service name translation

quickFilters

array

quickFilters configures quick filter presets for the Console plugin

register

boolean

When set to true, register automatically registers the provided console plugin with the OpenShift Container Platform Console operator. When set to false, you can still register it manually by editing console.operator.openshift.io/cluster with the following command: oc patch console.operator.openshift.io cluster --type='json' -p '[{"op": "add", "path": "/spec/plugins/-", "value": "netobserv-plugin"}]'

replicas

integer

replicas defines the number of replicas (pods) to start.

resources

object

resources, in terms of compute resources, required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
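
The following sketch combines a few of the console plugin settings from this table. The port, replica count, and log level are illustrative assumptions; the only fixed rule from the table is that port 9002 is reserved for metrics.

```yaml
spec:
  consolePlugin:
    register: true   # register the plugin with the Console operator automatically
    port: 9001       # illustrative; do not use 9002, which is reserved for metrics
    replicas: 1      # illustrative number of plugin pods
    logLevel: info   # assumed log level name
```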

28.9.1.11. .spec.consolePlugin.autoscaler

Description
autoscaler spec of a horizontal pod autoscaler to set up for the plugin Deployment. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2).
Type
object

28.9.1.12. .spec.consolePlugin.portNaming

Description
portNaming defines the configuration of the port-to-service name translation
Type
object
Property | Type | Description

enable

boolean

Enable the console plugin port-to-service name translation

portNames

object (string)

portNames defines additional port names to use in the console, for example, portNames: {"3100": "loki"}.
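
Written out in the FlowCollector resource, the portNames example above could be sketched as follows. The port key is quoted because portNames is a map of strings.

```yaml
spec:
  consolePlugin:
    portNaming:
      enable: true
      portNames:
        "3100": loki   # display port 3100 as "loki" in the console
```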

28.9.1.13. .spec.consolePlugin.quickFilters

Description
quickFilters configures quick filter presets for the Console plugin
Type
array

28.9.1.14. .spec.consolePlugin.quickFilters[]

Description
QuickFilter defines preset configuration for Console’s quick filters
Type
object
Required
  • filter
  • name
Property | Type | Description

default

boolean

default defines whether this filter should be active by default or not

filter

object (string)

filter is a set of keys and values to be set when this filter is selected. Each key can relate to a list of values using a comma-separated string, for example, filter: {"src_namespace": "namespace1,namespace2"}.

name

string

Name of the filter that is displayed in the Console
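
As a sketch, a single quick filter entry built from the properties above might look like this. The filter name is hypothetical; the filter key and values reuse the example from the filter description.

```yaml
spec:
  consolePlugin:
    quickFilters:
      - name: Selected namespaces   # hypothetical display name
        default: false              # not active unless the user selects it
        filter:
          src_namespace: "namespace1,namespace2"   # comma-separated list of values
```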

28.9.1.15. .spec.consolePlugin.resources

Description
resources, in terms of compute resources, required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Type
object
Property | Type | Description

limits

integer-or-string

Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

requests

integer-or-string

Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

28.9.1.16. .spec.exporters

Description
exporters define additional optional exporters for custom consumption or storage.
Type
array

28.9.1.17. .spec.exporters[]

Description
FlowCollectorExporter defines an additional exporter to send enriched flows to.
Type
object
Required
  • type
Property | Type | Description

ipfix

object

IPFIX configuration, such as the IP address and port to send enriched IPFIX flows to. Unsupported (*).

kafka

object

Kafka configuration, such as the address and topic, to send enriched flows to.

type

string

type selects the type of exporters. The available options are KAFKA and IPFIX. IPFIX is unsupported (*).

28.9.1.18. .spec.exporters[].ipfix

Description
IPFIX configuration, such as the IP address and port to send enriched IPFIX flows to. Unsupported (*).
Type
object
Required
  • targetHost
  • targetPort
Property | Type | Description

targetHost

string

Address of the IPFIX external receiver

targetPort

integer

Port for the IPFIX external receiver

transport

string

Transport protocol (TCP or UDP) to be used for the IPFIX connection, defaults to TCP.
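
A minimal sketch of an IPFIX exporter entry follows. The receiver host name is hypothetical, and the port is an illustrative assumption; use the address of your own collector.

```yaml
spec:
  exporters:
    - type: IPFIX           # unsupported (*)
      ipfix:
        targetHost: ipfix-collector.example.svc   # hypothetical receiver address
        targetPort: 4739    # illustrative port for the external receiver
        transport: TCP      # TCP or UDP; TCP is the default
```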

28.9.1.19. .spec.exporters[].kafka

Description
Kafka configuration, such as the address and topic, to send enriched flows to.
Type
object
Required
  • address
  • topic
Property | Type | Description

address

string

Address of the Kafka server

tls

object

TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. Note that, when eBPF agents are used, the Kafka certificate needs to be copied to the agent namespace (by default, netobserv-privileged).

topic

string

Kafka topic to use. It must exist; NetObserv does not create it.
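
For illustration, a Kafka exporter entry with only the two required fields could be sketched as follows. The address and topic name are hypothetical, and the topic must already exist.

```yaml
spec:
  exporters:
    - type: KAFKA
      kafka:
        address: kafka-cluster-kafka-bootstrap.netobserv   # hypothetical Kafka server address
        topic: netobserv-flows-export                      # hypothetical topic; create it beforehand
```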

28.9.1.20. .spec.exporters[].kafka.tls

Description
TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. Note that, when eBPF agents are used, the Kafka certificate needs to be copied to the agent namespace (by default, netobserv-privileged).
Type
object
Property | Type | Description

caCert

object

caCert defines the reference of the certificate for the Certificate Authority

enable

boolean

Enable TLS

insecureSkipVerify

boolean

insecureSkipVerify allows skipping client-side verification of the server certificate. If set to true, the caCert field is ignored.

userCert

object

userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)

28.9.1.21. .spec.exporters[].kafka.tls.caCert

Description
caCert defines the reference of the certificate for the Certificate Authority
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.22. .spec.exporters[].kafka.tls.userCert

Description
userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.23. .spec.kafka

Description
Kafka configuration, which allows Kafka to be used as a broker in the flow collection pipeline. Available when spec.deploymentModel is KAFKA.
Type
object
Required
  • address
  • topic
Property | Type | Description

address

string

Address of the Kafka server

tls

object

TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. Note that, when eBPF agents are used, the Kafka certificate needs to be copied to the agent namespace (by default, netobserv-privileged).

topic

string

Kafka topic to use. It must exist; NetObserv does not create it.

28.9.1.24. .spec.kafka.tls

Description
TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. Note that, when eBPF agents are used, the Kafka certificate needs to be copied to the agent namespace (by default, netobserv-privileged).
Type
object
Property | Type | Description

caCert

object

caCert defines the reference of the certificate for the Certificate Authority

enable

boolean

Enable TLS

insecureSkipVerify

boolean

insecureSkipVerify allows skipping client-side verification of the server certificate. If set to true, the caCert field is ignored.

userCert

object

userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)

28.9.1.25. .spec.kafka.tls.caCert

Description
caCert defines the reference of the certificate for the Certificate Authority
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.26. .spec.kafka.tls.userCert

Description
userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret
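
Putting the spec.kafka properties together, the following sketch configures the KAFKA deployment model with mutual TLS. The address, topic, secret names, and file names are hypothetical; port 9093 reflects the general TLS port mentioned in the tls description.

```yaml
spec:
  deploymentModel: KAFKA
  kafka:
    address: kafka-cluster-kafka-bootstrap.netobserv:9093   # hypothetical; 9093 is generally the TLS port
    topic: network-flows                    # hypothetical topic; must already exist
    tls:
      enable: true
      insecureSkipVerify: false             # if true, caCert is ignored
      caCert:
        type: secret                        # configmap or secret
        name: kafka-cluster-cluster-ca-cert # hypothetical secret name
        certFile: ca.crt                    # hypothetical file name within the secret
      userCert:                             # only needed for mTLS
        type: secret
        name: flp-kafka                     # hypothetical secret name
        certFile: user.crt                  # hypothetical file name
        certKey: user.key                   # hypothetical file name
```

For one-way TLS, the userCert block can be omitted.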

28.9.1.27. .spec.loki

Description
Client settings for Loki, the flow store.
Type
object
Property | Type | Description

authToken

string

authToken describes the way to get a token to authenticate to Loki.
- DISABLED will not send any token with the request.
- FORWARD will forward the user token for authorization.
- HOST - deprecated (*) - will use the local pod service account to authenticate to Loki.
When using the Loki Operator, this must be set to FORWARD.

batchSize

integer

batchSize is the maximum batch size (in bytes) of logs to accumulate before sending.

batchWait

string

batchWait is the maximum time to wait before sending a batch.

maxBackoff

string

maxBackoff is the maximum backoff time for client connection between retries.

maxRetries

integer

maxRetries is the maximum number of retries for client connections.

minBackoff

string

minBackoff is the initial backoff time for client connection between retries.

querierUrl

string

querierUrl specifies the address of the Loki querier service, in case it is different from the Loki ingester URL. If empty, the url value is used (assuming that the Loki ingester and querier are on the same server). When using the Loki Operator, do not set it, because ingestion and queries use the Loki gateway.

staticLabels

object (string)

staticLabels is a map of common labels to set on each flow.

statusTls

object

TLS client configuration for Loki status URL.

statusUrl

string

statusUrl specifies the address of the Loki /ready, /metrics, and /config endpoints, in case it is different from the Loki querier URL. If empty, the querierUrl value is used. This is useful to show error messages and some context in the frontend. When using the Loki Operator, set it to the Loki HTTP query frontend service, for example https://loki-query-frontend-http.netobserv.svc:3100/. The statusTls configuration is used when statusUrl is set.

tenantID

string

tenantID is the Loki X-Scope-OrgID that identifies the tenant for each request. When using the Loki Operator, set it to network, which corresponds to a special tenant mode.

timeout

string

timeout is the maximum time limit for a connection or request. A timeout of zero means no timeout.

tls

object

TLS client configuration for Loki URL.

url

string

url is the address of an existing Loki service to push the flows to. When using the Loki Operator, set it to the Loki gateway service with the network tenant set in path, for example https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network.
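
When using the Loki Operator, the guidance in this table could be sketched as the following fragment. The URLs, tenant, and authToken setting are the example values given in the property descriptions above; the service names assume Loki runs in the netobserv namespace.

```yaml
spec:
  loki:
    url: https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network   # gateway with the network tenant in the path
    statusUrl: https://loki-query-frontend-http.netobserv.svc:3100/         # query frontend service
    tenantID: network     # special tenant mode for the Loki Operator
    authToken: FORWARD    # required setting with the Loki Operator
    # querierUrl is left unset: ingestion and queries both use the gateway
```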

28.9.1.28. .spec.loki.statusTls

Description
TLS client configuration for Loki status URL.
Type
object
Property | Type | Description

caCert

object

caCert defines the reference of the certificate for the Certificate Authority

enable

boolean

Enable TLS

insecureSkipVerify

boolean

insecureSkipVerify allows skipping client-side verification of the server certificate. If set to true, the caCert field is ignored.

userCert

object

userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)

28.9.1.29. .spec.loki.statusTls.caCert

Description
caCert defines the reference of the certificate for the Certificate Authority
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.30. .spec.loki.statusTls.userCert

Description
userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.31. .spec.loki.tls

Description
TLS client configuration for Loki URL.
Type
object
Property | Type | Description

caCert

object

caCert defines the reference of the certificate for the Certificate Authority

enable

boolean

Enable TLS

insecureSkipVerify

boolean

insecureSkipVerify allows skipping client-side verification of the server certificate. If set to true, the caCert field is ignored.

userCert

object

userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)

28.9.1.32. .spec.loki.tls.caCert

Description
caCert defines the reference of the certificate for the Certificate Authority
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret

28.9.1.33. .spec.loki.tls.userCert

Description
userCert defines the user certificate reference and is used for mTLS (you can ignore it when using one-way TLS)
Type
object
Property | Type | Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret
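
As a sketch of one-way TLS toward Loki, the following fragment sets only caCert and leaves userCert out. The config map name and certificate file name are hypothetical.

```yaml
spec:
  loki:
    tls:
      enable: true
      insecureSkipVerify: false        # keep server certificate verification on
      caCert:
        type: configmap                # configmap or secret
        name: loki-gateway-ca-bundle   # hypothetical config map name
        certFile: service-ca.crt       # hypothetical file name within the config map
        # namespace omitted: assumes the namespace where NetObserv is deployed
```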

28.9.1.34. .spec.processor

Description
processor defines the settings of the component that receives the flows from the agent, enriches them, generates metrics, and forwards them to the Loki persistence layer and/or any available exporter.
Type
object
Property | Type | Description

conversationEndTimeout

string

conversationEndTimeout is the time to wait after a network flow is received, to consider the conversation ended. This delay is ignored when a FIN packet is collected for TCP flows (see conversationTerminatingTimeout instead).

conversationHeartbeatInterval

string

conversationHeartbeatInterval is the time to wait between "tick" events of a conversation

conversationTerminatingTimeout

string

conversationTerminatingTimeout is the time to wait after a FIN flag is detected before ending a conversation. Only relevant for TCP flows.

debug

object

debug allows setting some aspects of the internal configuration of the flow processor. This section is intended exclusively for debugging and fine-grained performance optimizations, such as the GOGC and GOMAXPROCS environment variables. Users who set these values do so at their own risk.

dropUnusedFields

boolean

When set to true, dropUnusedFields drops fields that are known to be unused by OVS, to save storage space.

enableKubeProbes

boolean

enableKubeProbes is a flag to enable or disable Kubernetes liveness and readiness probes

healthPort

integer

healthPort is a collector HTTP port in the Pod that exposes the health check API

imagePullPolicy

string

imagePullPolicy is the Kubernetes pull policy for the image defined above

kafkaConsumerAutoscaler

object

kafkaConsumerAutoscaler is the spec of a horizontal pod autoscaler to set up for flowlogs-pipeline-transformer, which consumes Kafka messages. This setting is ignored when Kafka is disabled. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2).

kafkaConsumerBatchSize

integer

kafkaConsumerBatchSize indicates to the broker the maximum batch size, in bytes, that the consumer will accept. Ignored when not using Kafka. Default: 10MB.

kafkaConsumerQueueCapacity

integer

kafkaConsumerQueueCapacity defines the capacity of the internal message queue used in the Kafka consumer client. Ignored when not using Kafka.

kafkaConsumerReplicas

integer

kafkaConsumerReplicas defines the number of replicas (pods) to start for flowlogs-pipeline-transformer, which consumes Kafka messages. This setting is ignored when Kafka is disabled.

logLevel

string

logLevel of the processor runtime

logTypes

string

logTypes defines the desired record types to generate. Possible values are:
- FLOWS (default) to export regular network flows
- CONVERSATIONS to generate events for started conversations, ended conversations as well as periodic "tick" updates
- ENDED_CONVERSATIONS to generate only ended conversations events
- ALL to generate both network flows and all conversations events

metrics

object

Metrics define the processor configuration regarding metrics

port

integer

Port of the flow collector (host port). By convention, some values are forbidden. It must be greater than 1024 and different from 4500, 4789 and 6081.

profilePort

integer

profilePort allows setting up a Go pprof profiler listening to this port

resources

object

resources are the compute resources required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
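
The following fragment is a minimal sketch of how several of the processor properties listed above might be combined in a FlowCollector resource. The values are illustrative assumptions, not recommendations. The port value is only one example that satisfies the stated constraint, and the batch size assumes that 10MB means 10 * 1024 * 1024 bytes.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    port: 2055                        # illustrative; greater than 1024 and not 4500, 4789, or 6081
    logTypes: ENDED_CONVERSATIONS     # generate only ended conversation events
    dropUnusedFields: true            # drop fields known to be unused by OVS
    enableKubeProbes: true            # keep liveness and readiness probes enabled
    kafkaConsumerReplicas: 3          # ignored when Kafka is disabled
    kafkaConsumerBatchSize: 10485760  # bytes; assumes 10MB = 10 * 1024 * 1024
```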

28.9.1.35. .spec.processor.debug

Description
debug allows setting some aspects of the internal configuration of the flow processor. This section is intended exclusively for debugging and fine-grained performance optimizations, such as the GOGC and GOMAXPROCS environment variables. Set these values at your own risk.
Type
object
Property    Type    Description

env

object (string)

env allows passing custom environment variables to underlying components. Useful for passing some very concrete performance-tuning options, such as GOGC and GOMAXPROCS, that should not be publicly exposed as part of the FlowCollector descriptor, as they are only useful in edge debug or support scenarios.
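
As a hedged illustration, the env map might be set as follows. Because the type is object (string), the values are quoted so that they are parsed as strings. The specific GOGC and GOMAXPROCS values are assumptions chosen only for the example.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    debug:
      env:
        GOGC: "200"        # illustrative value; quoted because env values are strings
        GOMAXPROCS: "4"    # illustrative value
```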

28.9.1.36. .spec.processor.kafkaConsumerAutoscaler

Description
kafkaConsumerAutoscaler is the spec of a horizontal pod autoscaler to set up for flowlogs-pipeline-transformer, which consumes Kafka messages. This setting is ignored when Kafka is disabled. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2).
Type
object
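
This section does not list the properties of kafkaConsumerAutoscaler, so the following is only a sketch. It assumes that minReplicas, maxReplicas, and metrics follow the Kubernetes autoscaling/v2 HorizontalPodAutoscaler specification that the description refers to, and that a status field toggles the autoscaler. Verify the field names against the CRD installed in your cluster before use.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    kafkaConsumerAutoscaler:
      status: ENABLED            # assumption: not documented in this section
      minReplicas: 1             # illustrative
      maxReplicas: 5             # illustrative
      metrics:                   # autoscaling/v2 MetricSpec entries
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
```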

28.9.1.37. .spec.processor.metrics

Description
Metrics define the processor configuration regarding metrics
Type
object
Property    Type    Description

disableAlerts

array (string)

disableAlerts is a list of alerts that should be disabled. Possible values are:
- NetObservNoFlows, which is triggered when no flows are being observed for a certain period.
- NetObservLokiError, which is triggered when flows are being dropped due to Loki errors.

ignoreTags

array (string)

ignoreTags is a list of tags to specify which metrics to ignore. Each metric is associated with a list of tags. More details in https://github.com/netobserv/network-observability-operator/tree/main/controllers/flowlogspipeline/metrics_definitions . Available tags are: egress, ingress, flows, bytes, packets, namespaces, nodes, workloads.

server

object

Metrics server endpoint configuration for Prometheus scraper
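
For example, the following sketch disables one alert and ignores metrics that carry two of the available tags. The choice of alert and tags is arbitrary and only illustrates the list syntax.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      disableAlerts:
      - NetObservLokiError   # keep NetObservNoFlows active
      ignoreTags:
      - egress               # ignore metrics tagged egress
      - packets              # ignore metrics tagged packets
```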

28.9.1.38. .spec.processor.metrics.server

Description
Metrics server endpoint configuration for Prometheus scraper
Type
object
Property    Type    Description

port

integer

The Prometheus HTTP port

tls

object

TLS configuration.

28.9.1.39. .spec.processor.metrics.server.tls

Description
TLS configuration.
Type
object
Property    Type    Description

provided

object

TLS configuration when type is set to PROVIDED.

type

string

Select the type of TLS configuration:
- DISABLED (default) to not configure TLS for the endpoint.
- PROVIDED to manually provide a cert file and a key file.
- AUTO to use an OpenShift Container Platform auto-generated certificate using annotations.

28.9.1.40. .spec.processor.metrics.server.tls.provided

Description
TLS configuration when type is set to PROVIDED.
Type
object
Property    Type    Description

certFile

string

certFile defines the path to the certificate file name within the config map or secret

certKey

string

certKey defines the path to the certificate private key file name within the config map or secret. Omit when the key is not necessary.

name

string

Name of the config map or secret containing certificates

namespace

string

Namespace of the config map or secret containing certificates. If omitted, assumes the same namespace as where NetObserv is deployed. If the namespace is different, the config map or the secret will be copied so that it can be mounted as required.

type

string

Type for the certificate reference: configmap or secret
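
The following sketch shows how the PROVIDED settings might fit together. The secret name flp-metrics-cert and the port value are hypothetical. The tls.crt and tls.key file names assume a secret of the standard kubernetes.io/tls type.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      server:
        port: 9102                   # hypothetical port
        tls:
          type: PROVIDED
          provided:
            type: secret             # configmap or secret
            name: flp-metrics-cert   # hypothetical secret name
            namespace: netobserv     # can be omitted if NetObserv is deployed in this namespace
            certFile: tls.crt        # assumes a kubernetes.io/tls secret
            certKey: tls.key         # omit when the key is not necessary
```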

28.9.1.41. .spec.processor.resources

Description
resources are the compute resources required by this container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Type
object
Property    Type    Description

limits

integer-or-string

Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

requests

integer-or-string

Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
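
As an illustration, requests and limits use the usual Kubernetes resource quantities. The amounts in this sketch are assumptions, not sizing guidance.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    resources:
      requests:
        cpu: 100m        # illustrative minimum CPU
        memory: 100Mi    # illustrative minimum memory
      limits:
        memory: 800Mi    # illustrative maximum memory
```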

28.10. Network flows format reference

These are the specifications for network flows format, used both internally and when exporting flows to Kafka.

28.10.1. Network Flows format reference

The document is organized in two main categories: Labels and regular Fields. This distinction only matters when querying Loki. This is because Labels, unlike Fields, must be used in stream selectors.

If you are reading this specification as a reference for the Kafka export feature, you must treat all Labels and Fields as regular fields and ignore any distinctions between them that are specific to Loki.

28.10.1.1. Labels

SrcK8S_Namespace
  • Optional SrcK8S_Namespace: string

Source namespace

DstK8S_Namespace
  • Optional DstK8S_Namespace: string

Destination namespace

SrcK8S_OwnerName
  • Optional SrcK8S_OwnerName: string

Source owner, such as Deployment, StatefulSet, etc.

DstK8S_OwnerName
  • Optional DstK8S_OwnerName: string

Destination owner, such as Deployment, StatefulSet, etc.

FlowDirection
  • FlowDirection: see the following section, Enumeration: FlowDirection for more details.

Flow direction from the node observation point

_RecordType
  • Optional _RecordType: RecordType

Type of record: 'flowLog' for regular flow logs, or 'allConnections', 'newConnection', 'heartbeat', 'endConnection' for conversation tracking

28.10.1.2. Fields

SrcAddr
  • SrcAddr: string

Source IP address (ipv4 or ipv6)

DstAddr
  • DstAddr: string

Destination IP address (ipv4 or ipv6)

SrcMac
  • SrcMac: string

Source MAC address

DstMac
  • DstMac: string

Destination MAC address

SrcK8S_Name
  • Optional SrcK8S_Name: string

Name of the source matched Kubernetes object, such as Pod name, Service name, etc.

DstK8S_Name
  • Optional DstK8S_Name: string

Name of the destination matched Kubernetes object, such as Pod name, Service name, etc.

SrcK8S_Type
  • Optional SrcK8S_Type: string

Kind of the source matched Kubernetes object, such as Pod, Service, etc.

DstK8S_Type
  • Optional DstK8S_Type: string

Kind of the destination matched Kubernetes object, such as Pod, Service, etc.

SrcPort
  • SrcPort: number

Source port

DstPort
  • DstPort: number

Destination port

SrcK8S_OwnerType
  • Optional SrcK8S_OwnerType: string

Kind of the source Kubernetes owner, such as Deployment, StatefulSet, etc.

DstK8S_OwnerType
  • Optional DstK8S_OwnerType: string

Kind of the destination Kubernetes owner, such as Deployment, StatefulSet, etc.

SrcK8S_HostIP
  • Optional SrcK8S_HostIP: string

Source node IP

DstK8S_HostIP
  • Optional DstK8S_HostIP: string

Destination node IP

SrcK8S_HostName
  • Optional SrcK8S_HostName: string

Source node name

DstK8S_HostName
  • Optional DstK8S_HostName: string

Destination node name

Proto
  • Proto: number

L4 protocol

Interface
  • Optional Interface: string

Network interface

Packets
  • Packets: number

Number of packets in this flow

Packets_AB
  • Optional Packets_AB: number

In conversation tracking, A to B packets counter per conversation

Packets_BA
  • Optional Packets_BA: number

In conversation tracking, B to A packets counter per conversation

Bytes
  • Bytes: number

Number of bytes in this flow

Bytes_AB
  • Optional Bytes_AB: number

In conversation tracking, A to B bytes counter per conversation

Bytes_BA
  • Optional Bytes_BA: number

In conversation tracking, B to A bytes counter per conversation

TimeFlowStartMs
  • TimeFlowStartMs: number

Start timestamp of this flow, in milliseconds

TimeFlowEndMs
  • TimeFlowEndMs: number

End timestamp of this flow, in milliseconds

TimeReceived
  • TimeReceived: number

Timestamp when this flow was received and processed by the flow collector, in seconds

_HashId
  • Optional _HashId: string

In conversation tracking, the conversation identifier

_IsFirst
  • Optional _IsFirst: string

In conversation tracking, a flag identifying the first flow

numFlowLogs
  • Optional numFlowLogs: number

In conversation tracking, a counter of flow logs per conversation

28.10.1.3. Enumeration: FlowDirection

Ingress
  • Ingress = "0"

Incoming traffic, from node observation point

Egress
  • Egress = "1"

Outgoing traffic, from node observation point
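
To make the format concrete, the following is a hypothetical flow record that uses a subset of the Labels and Fields described above. It is written as YAML for readability only. This section does not specify the encoding used on the wire, and every value is invented for illustration.

```yaml
# Labels (used in Loki stream selectors; plain fields for Kafka export)
SrcK8S_Namespace: frontend
DstK8S_Namespace: backend
SrcK8S_OwnerName: web
DstK8S_OwnerName: api
FlowDirection: "1"               # Egress
_RecordType: flowLog             # regular flow log
# Fields
SrcAddr: "10.128.2.15"
DstAddr: "10.129.0.42"
SrcMac: "0a:58:0a:80:02:0f"
DstMac: "0a:58:0a:81:00:2a"
SrcPort: 45312
DstPort: 8080
Proto: 6                         # L4 protocol number; 6 is TCP
Packets: 12
Bytes: 1840
TimeFlowStartMs: 1670933209000   # milliseconds
TimeFlowEndMs: 1670933214000     # milliseconds
TimeReceived: 1670933215         # seconds
```

Note the unit difference: TimeFlowStartMs and TimeFlowEndMs are in milliseconds, while TimeReceived is in seconds.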

28.11. Troubleshooting Network Observability

To assist in troubleshooting Network Observability issues, you can perform the troubleshooting actions described in this section.

28.11.1. Using the must-gather tool

You can use the must-gather tool to collect information about the Network Observability Operator resources and cluster-wide resources, such as pod logs, FlowCollector, and webhook configurations.

Procedure

  1. Navigate to the directory where you want to store the must-gather data.
  2. Run the following command to collect cluster-wide must-gather resources:

    $ oc adm must-gather \
     --image-stream=openshift/must-gather \
     --image=quay.io/netobserv/must-gather

28.11.2. Configuring network traffic menu entry in the OpenShift Container Platform console

If the network traffic menu entry is not listed in the Observe menu of the OpenShift Container Platform console, configure the menu entry manually.

Prerequisites

  • You have installed OpenShift Container Platform version 4.10 or newer.

Procedure

  1. Check if the spec.consolePlugin.register field is set to true by running the following command:

    $ oc -n netobserv get flowcollector cluster -o yaml

    Example output

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      consolePlugin:
        register: false

  2. Optional: Add the netobserv-plugin plugin by manually editing the Console Operator config:

    $ oc edit console.operator.openshift.io cluster

    Example output

    ...
    spec:
      plugins:
      - netobserv-plugin
    ...

  3. Optional: Set the spec.consolePlugin.register field to true by running the following command:

    $ oc -n netobserv edit flowcollector cluster -o yaml

    Example output

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      consolePlugin:
        register: true

  4. Ensure the status of console pods is running by running the following command:

    $ oc get pods -n openshift-console -l app=console
  5. Restart the console pods by running the following command:

    $ oc delete pods -n openshift-console -l app=console
  6. Clear your browser cache and history.
  7. Check the status of Network Observability plugin pods by running the following command:

    $ oc get pods -n netobserv -l app=netobserv-plugin

    Example output

    NAME                                READY   STATUS    RESTARTS   AGE
    netobserv-plugin-68c7bbb9bb-b69q6   1/1     Running   0          21s

  8. Check the logs of the Network Observability plugin pods by running the following command:

    $ oc logs -n netobserv -l app=netobserv-plugin

    Example output

    time="2022-12-13T12:06:49Z" level=info msg="Starting netobserv-console-plugin [build version: , build date: 2022-10-21 15:15] at log level info" module=main
    time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server

28.11.3. Flowlogs-Pipeline does not consume network flows after installing Kafka

If you deployed the flow collector first with deploymentModel: KAFKA and then deployed Kafka, the flow collector might not connect correctly to Kafka. If Flowlogs-pipeline does not consume network flows from Kafka, manually restart the flow-pipeline pods.

Procedure

  1. Delete the flow-pipeline pods to restart them by running the following command:

    $ oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer

28.11.4. Failing to see network flows from both br-int and br-ex interfaces

br-ex and br-int are virtual bridge devices that operate at OSI layer 2. The eBPF agent works at the IP and TCP levels, layers 3 and 4 respectively. You can expect the eBPF agent to capture the network traffic passing through br-ex and br-int when that traffic is processed by other interfaces, such as physical host or virtual pod interfaces. If you restrict the eBPF agent network interfaces to attach only to br-ex and br-int, you do not see any network flows.

Manually remove the part of the interfaces or excludeInterfaces setting that restricts the network interfaces to br-int and br-ex.

Procedure

  1. Remove the interfaces: [ 'br-int', 'br-ex' ] field. This allows the agent to fetch information from all the interfaces. Alternatively, you can specify a Layer-3 interface, for example, eth0. Run the following command:

    $ oc edit -n netobserv flowcollector cluster -o yaml

    Example output

    apiVersion: flows.netobserv.io/v1alpha1
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      agent:
        type: EBPF
        ebpf:
          interfaces: [ 'br-int', 'br-ex' ] 1

    1
    Specifies the network interfaces.
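
As a sketch of the alternative mentioned in the procedure, the edited resource might name a Layer-3 interface instead of the bridge devices. The interface name eth0 is taken from the example in the step and might differ on your nodes.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      interfaces: [ 'eth0' ]   # Layer-3 interface; remove the field to use all interfaces
```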

28.11.5. Network Observability controller manager pod runs out of memory

If the Network Observability controller manager pod runs out of memory, you can increase the memory limits for the Network Observability Operator by patching the Cluster Service Version (CSV).

Procedure

  1. Run the following command to patch the CSV:

    $ oc -n netobserv patch csv network-observability-operator.v1.0.0 --type='json' -p='[{"op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "1Gi"}]'

    Example output

    clusterserviceversion.operators.coreos.com/network-observability-operator.v1.0.0 patched

  2. Run the following command to view the updated CSV:

    $ oc -n netobserv get csv network-observability-operator.v1.0.0 -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
    1Gi
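
For orientation, the JSON patch path used above corresponds to the following location in the CSV. This fragment is a sketch derived only from that path. All other CSV fields are omitted.

```yaml
spec:
  install:
    spec:
      deployments:
      - spec:                    # deployments/0 in the patch path
          template:
            spec:
              containers:
              - resources:       # containers/0 in the patch path
                  limits:
                    memory: 1Gi  # value set by the patch
```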