Chapter 11. Logging, events, and monitoring

11.1. Viewing virtual machine logs

11.1.1. Understanding virtual machine logs

Logs are collected for OpenShift Container Platform Builds, Deployments, and pods. In container-native virtualization, virtual machine logs can be retrieved from the virtual machine launcher Pod in either the web console or the CLI.

The -f option follows the log output in real time, which is useful for monitoring progress and error checking.

If the launcher Pod is failing to start, use the --previous option to see the logs of the last attempt.

Warning

ErrImagePull and ImagePullBackOff errors can be caused by an incorrect Deployment configuration or problems with the images that are referenced.

11.1.2. Viewing virtual machine logs in the CLI

Get virtual machine logs from the virtual machine launcher pod.

Procedure

  • User the following command:

    $ oc logs <virt-launcher-name>

11.1.3. Viewing virtual machine logs in the web console

Get virtual machine logs from the associated virtual machine launcher Pod.

Procedure

  1. In the container-native virtualization console, click WorkloadsVirtual Machines.
  2. Click the virtual machine to open the Virtual Machine Details panel.
  3. In the Overview tab, click the virt-launcher-<name> Pod in the POD section.
  4. Click Logs.

11.2. Viewing events

11.2.1. Understanding virtual machine events

OpenShift Container Platform events are records of important life-cycle information in a namespace and are useful for monitoring and troubleshooting resource scheduling, creation, and deletion issues.

Container-native virtualization adds events for virtual machines and virtual machine instances. These can be viewed from either the web console or the CLI.

See also: Viewing system event information in an OpenShift Container Platform cluster.

11.2.2. Viewing the events for a virtual machine in the web console

You can view the stream events for a running a virtual machine from the Virtual Machine Details panel of the web console.

The ▮▮ button pauses the events stream.
The ▶ button continues a paused events stream.

Procedure

  1. Click WorkloadsVirtual Machines from the side menu.
  2. Select a Virtual Machine.
  3. Click Events to view all events for the virtual machine.

11.2.3. Viewing namespace events in the CLI

Use the OpenShift Container Platform client to get the events for a namespace.

Procedure

  • In the namespace, use the oc get command:

    $ oc get events

11.2.4. Viewing resource events in the CLI

Events are included in the resource description, which you can get using the OpenShift Container Platform client.

Procedure

  • In the namespace, use the oc describe command. The following example shows how to get the events for a virtual machine, a virtual machine instance, and the virt-launcher Pod for a virtual machine:

    $ oc describe vm <vm>
    $ oc describe vmi <vmi>
    $ oc describe pod virt-launcher-<name>

11.3. Viewing information about virtual machine workloads

You can view high-level information about your virtual machines by using the Virtual Machines dashboard in the OpenShift Container Platform web console.

11.3.1. About the Virtual Machines dashboard

Access the Virtual Machines dashboard from the OpenShift Container Platform web console by navigating to the WorkloadsVirtual Machines page and selecting a virtual machine.

The dashboard includes the following cards:

  • Details provides identifying information about the virtual machine, including:

    • Name
    • Namespace
    • Date of creation
    • Node name
    • IP address
  • Inventory lists the virtual machine’s resources, including:

    • Network interface controllers (NICs)
    • Disks
  • Status includes:

    • The current status of the virtual machine
    • A note indicating whether or not the QEMU guest agent is installed on the virtual machine
  • Utilization includes charts that display usage data for:

    • CPU
    • Memory
    • Filesystem
    • Network transfer
Note

Use the drop-down list to choose a duration for the utilization data. The available options are 1 Hour, 6 Hours, and 24 Hours.

  • Events lists messages about virtual machine activity over the past hour. To view additional events, click View all.

11.4. Monitoring virtual machine health

Use this procedure to create liveness and readiness probes to monitor virtual machine health.

11.4.1. About liveness and readiness probes

When a VirtualMachineInstance (VMI) fails, liveness probes stop the VMI. Controllers such as VirtualMachine then spawn other VMIs, restoring virtual machine responsiveness.

Readiness probes tell services and endpoints that the VirtualMachineInstance is ready to receive traffic from services. If readiness probes fail, the VirtualMachineInstance is removed from applicable endpoints until the probe recovers.

11.4.2. Define an HTTP liveness probe

This procedure provides an example configuration file for defining HTTP liveness probes.

Procedure

  1. Customize a YAML configuration file to create an HTTP liveness probe, using the following code block as an example. In this example:

    • You configure a probe using spec.livenessProbe.httpGet, which queries port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds.
    • The VirtualMachineInstance installs and runs a minimal HTTP server on port 1500 using cloud-init.

      apiVersion: kubevirt.io/v1alpha3
      kind: VirtualMachineInstance
      metadata:
        labels:
          special: vmi-fedora
        name: vmi-fedora
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          resources:
            requests:
              memory: 1024M
        livenessProbe:
          initialDelaySeconds: 120
          periodSeconds: 20
          httpGet:
            port: 1500
          timeoutSeconds: 10
        terminationGracePeriodSeconds: 0
        volumes:
        - name: containerdisk
          registryDisk:
            image: kubevirt/fedora-cloud-registry-disk-demo
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
              bootcmd:
                - setenforce 0
                - dnf install -y nmap-ncat
                - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
          name: cloudinitdisk
  2. Create the VirtualMachineInstance by running the following command:

    $ oc create -f <file name>.yaml

11.4.3. Define a TCP liveness probe

This procedure provides an example configuration file for defining TCP liveness probes.

Procedure

  1. Customize a YAML configuration file to create an TCP liveness probe, using this code block as an example. In this example:

    • You configure a probe using spec.livenessProbe.tcpSocket, which queries port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds.
    • The VirtualMachineInstance installs and runs a minimal HTTP server on port 1500 using cloud-init.

      apiVersion: kubevirt.io/v1alpha3
      kind: VirtualMachineInstance
      metadata:
        labels:
          special: vmi-fedora
        name: vmi-fedora
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          resources:
            requests:
              memory: 1024M
        livenessProbe:
          initialDelaySeconds: 120
          periodSeconds: 20
          tcpSocket:
            port: 1500
          timeoutSeconds: 10
        terminationGracePeriodSeconds: 0
        volumes:
        - name: containerdisk
          registryDisk:
            image: kubevirt/fedora-cloud-registry-disk-demo
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
              bootcmd:
                - setenforce 0
                - dnf install -y nmap-ncat
                - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
          name: cloudinitdisk
  2. Create the VirtualMachineInstance by running the following command:

    $ oc create -f <file name>.yaml

11.4.4. Define a readiness probe

This procedure provides an example configuration file for defining readiness probes.

Procedure

  1. Customize a YAML configuration file to create a readiness probe. Readiness probes are configured in a similar manner to liveness probes. However, note the following differences in this example:

    • Readiness probes are saved using a different spec name. For example, you create a readiness probe as spec.readinessProbe instead of as spec.livenessProbe.<type-of-probe>.
    • When creating a readiness probe, you optionally set a failureThreshold and a successThreshold to switch between ready and non-ready states, should the probe succeed or fail multiple times.

      apiVersion: kubevirt.io/v1alpha3
      kind: VirtualMachineInstance
      metadata:
        labels:
          special: vmi-fedora
        name: vmi-fedora
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          resources:
            requests:
              memory: 1024M
        readinessProbe:
          httpGet:
            port: 1500
          initialDelaySeconds: 120
          periodSeconds: 20
          timeoutSeconds: 10
          failureThreshold: 3
          successThreshold: 3
        terminationGracePeriodSeconds: 0
        volumes:
        - name: containerdisk
          registryDisk:
            image: kubevirt/fedora-cloud-registry-disk-demo
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
              bootcmd:
                - setenforce 0
                - dnf install -y nmap-ncat
                - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
          name: cloudinitdisk
  2. Create the VirtualMachineInstance by running the following command:

    $ oc create -f <file name>.yaml

11.5. Using the OpenShift Container Platform dashboard to get cluster information

Access the OpenShift Container Platform dashboard, which captures high-level information about the cluster, by clicking Home > Dashboards > Overview from the OpenShift Container Platform web console.

The OpenShift Container Platform dashboard provides various cluster information, captured in individual dashboard cards.

11.5.1. About the OpenShift Container Platform dashboards page

The OpenShift Container Platform dashboard consists of the following cards:

  • Details provides a brief overview of informational cluster details.

    Status include ok, error, warning, in progress, and unknown. Resources can add custom status names.

    • Cluster ID
    • Provider
    • Version
  • Cluster Inventory details number of resources and associated statuses. It is helpful when intervention is required to resolve problems, including information about:

    • Number of nodes
    • Number of pods
    • Persistent storage volume claims
    • Virtual machines (available if container-native virtualization is installed)
    • Bare metal hosts in the cluster, listed according to their state (only available in metal3 environment).
  • Cluster Health summarizes the current health of the cluster as a whole, including relevant alerts and descriptions. If container-native virtualization is installed, the overall health of container-native virtualization is diagnosed as well. If more than one subsystem is present, click See All to view the status of each subsystem.
  • Cluster Capacity charts help administrators understand when additional resources are required in the cluster. The charts contain an inner ring that displays current consumption, while an outer ring displays thresholds configured for the resource, including information about:

    • CPU time
    • Memory allocation
    • Storage consumed
    • Network resources consumed
  • Cluster Utilization shows the capacity of various resources over a specified period of time, to help administrators understand the scale and frequency of high resource consumption.
  • Events lists messages related to recent activity in the cluster, such as pod creation or virtual machine migration to another host.
  • Top Consumers helps administrators understand how cluster resources are consumed. Click on a resource to jump to a detailed page listing pods and nodes that consume the largest amount of the specified cluster resource (CPU, memory, or storage).

11.6. OpenShift Container Platform cluster monitoring, logging, and Telemetry

OpenShift Container Platform provides various resources for monitoring at the cluster level.

11.6.1. About OpenShift Container Platform cluster monitoring

OpenShift Container Platform includes a pre-configured, pre-installed, and self-updating monitoring stack that is based on the Prometheus open source project and its wider eco-system. It provides monitoring of cluster components and includes a set of alerts to immediately notify the cluster administrator about any occurring problems and a set of Grafana dashboards. The cluster monitoring stack is only supported for monitoring OpenShift Container Platform clusters.

Important

To ensure compatibility with future OpenShift Container Platform updates, configuring only the specified monitoring stack options is supported.

11.6.2. Cluster logging components

The cluster logging components are based upon Elasticsearch, Fluentd, and Kibana (EFK). The collector, Fluentd, is deployed to each node in the OpenShift Container Platform cluster. It collects all node and container logs and writes them to Elasticsearch (ES). Kibana is the centralized, web UI where users and administrators can create rich visualizations and dashboards with the aggregated data.

There are currently 5 different types of cluster logging components:

  • logStore - This is where the logs will be stored. The current implementation is Elasticsearch.
  • collection - This is the component that collects logs from the node, formats them, and stores them in the logStore. The current implementation is Fluentd.
  • visualization - This is the UI component used to view logs, graphs, charts, and so forth. The current implementation is Kibana.
  • curation - This is the component that trims logs by age. The current implementation is Curator.

For more information on cluster logging, see the OpenShift Container Platform cluster logging documentation.

11.6.3. About Telemetry

Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. These metrics are sent continuously and describe:

  • The size of an OpenShift Container Platform cluster
  • The health and status of OpenShift Container Platform components
  • The health and status of any upgrade being performed
  • Limited usage information about OpenShift Container Platform components and features
  • Summary info about alerts reported by the cluster monitoring component

This continuous stream of data is used by Red Hat to monitor the health of clusters in real time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OpenShift Container Platform upgrades to customers so as to minimize service impact and continuously improve the upgrade experience.

This debugging information is available to Red Hat Support and engineering teams with the same restrictions as accessing data reported via support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use. None of the information is shared with third parties.

11.6.3.1. Information collected by Telemetry

Primary information collected by Telemetry includes:

  • The number of updates available per cluster
  • Channel and image repository used for an update
  • The number of errors that occurred during an update
  • Progress information of running updates
  • The number of machines per cluster
  • The number of CPU cores and size of RAM of the machines
  • The number of members in the etcd cluster and number of objects currently stored in the etcd cluster
  • The number of CPU cores and RAM used per machine type - infra or master
  • The number of CPU cores and RAM used per cluster
  • The number of running virtual machine instances in the cluster
  • Use of OpenShift Container Platform framework components per cluster
  • The version of the OpenShift Container Platform cluster
  • Health, condition, and status for any OpenShift Container Platform framework component that is installed on the cluster, for example Cluster Version Operator, Cluster Monitoring, Image Registry, and Elasticsearch for Logging
  • A unique random identifier that is generated during installation
  • The name of the platform that OpenShift Container Platform is deployed on, such as Amazon Web Services

Telemetry does not collect identifying information such as user names, passwords, or the names or addresses of user resources.

11.6.4. CLI troubleshooting and debugging commands

For a list of the oc client troubleshooting and debugging commands, see the OpenShift Container Platform CLI tools documentation.

11.7. Collecting container-native virtualization data for Red Hat Support

When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.

The must-gather tool enables you to collect diagnostic information about your OpenShift Container Platform cluster, including virtual machines and other data related to container-native virtualization.

For prompt support, supply diagnostic information for both OpenShift Container Platform and container-native virtualization.

Important

Container-native virtualization is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

11.7.1. About the must-gather tool

The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, such as:

  • Resource definitions
  • Audit logs
  • Service logs

You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data related to that feature or product.

When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.

11.7.2. About collecting container-native virtualization data

You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with container-native virtualization:

  • The Hyperconverged Cluster Operator namespaces (and child objects)
  • All namespaces (and their child objects) that belong to any container-native virtualization resources
  • All container-native virtualization Custom Resource Definitions (CRDs)
  • All namespaces that contain virtual machines
  • All virtual machine definitions

To collect container-native virtualization data with must-gather, you must specify the container-native virtualization image: --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8.

11.7.3. Gathering data about specific features

You can gather debugging information about specific features by using the oc adm must-gather CLI command with the --image or --image-stream argument. The must-gather tool supports multiple images, so you can gather data about more than one feature by running a single command.

Note

To collect the default must-gather data in addition to specific feature data, add the --image-stream=openshift/must-gather argument.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift Container Platform CLI (oc) installed.

Procedure

  1. Navigate to the directory where you want to store the must-gather data.
  2. Run the oc adm must-gather command with one or more --image or --image-stream arguments. For example, the following command gathers both the default cluster data and information specific to {VirtProductName}:

    $ oc adm must-gather \
     --image-stream=openshift/must-gather \ 1
     --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8 2
    1
    The default OpenShift Container Platform must-gather image
    2
    The must-gather image for {VirtProductName}

    You can use the must-gather tool with additional arguments to gather data that is specifically related to cluster logging and the Cluster Logging Operator in your cluster. For cluster logging, run the following command:

    $ oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \
     -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')

    Example 11.1. Example must-gather output for cluster logging

    ├── cluster-logging
    │  ├── clo
    │  │  ├── cluster-logging-operator-74dd5994f-6ttgt
    │  │  ├── clusterlogforwarder_cr
    │  │  ├── cr
    │  │  ├── csv
    │  │  ├── deployment
    │  │  └── logforwarding_cr
    │  ├── collector
    │  │  ├── fluentd-2tr64
    │  ├── curator
    │  │  └── curator-1596028500-zkz4s
    │  ├── eo
    │  │  ├── csv
    │  │  ├── deployment
    │  │  └── elasticsearch-operator-7dc7d97b9d-jb4r4
    │  ├── es
    │  │  ├── cluster-elasticsearch
    │  │  │  ├── aliases
    │  │  │  ├── health
    │  │  │  ├── indices
    │  │  │  ├── latest_documents.json
    │  │  │  ├── nodes
    │  │  │  ├── nodes_stats.json
    │  │  │  └── thread_pool
    │  │  ├── cr
    │  │  ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
    │  │  └── logs
    │  │     ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
    │  ├── install
    │  │  ├── co_logs
    │  │  ├── install_plan
    │  │  ├── olmo_logs
    │  │  └── subscription
    │  └── kibana
    │     ├── cr
    │     ├── kibana-9d69668d4-2rkvz
    ├── cluster-scoped-resources
    │  └── core
    │     ├── nodes
    │     │  ├── ip-10-0-146-180.eu-west-1.compute.internal.yaml
    │     └── persistentvolumes
    │        ├── pvc-0a8d65d9-54aa-4c44-9ecc-33d9381e41c1.yaml
    ├── event-filter.html
    ├── gather-debug.log
    └── namespaces
       ├── openshift-logging
       │  ├── apps
       │  │  ├── daemonsets.yaml
       │  │  ├── deployments.yaml
       │  │  ├── replicasets.yaml
       │  │  └── statefulsets.yaml
       │  ├── batch
       │  │  ├── cronjobs.yaml
       │  │  └── jobs.yaml
       │  ├── core
       │  │  ├── configmaps.yaml
       │  │  ├── endpoints.yaml
       │  │  ├── events
       │  │  │  ├── curator-1596021300-wn2ks.162634ebf0055a94.yaml
       │  │  │  ├── curator.162638330681bee2.yaml
       │  │  │  ├── elasticsearch-delete-app-1596020400-gm6nl.1626341a296c16a1.yaml
       │  │  │  ├── elasticsearch-delete-audit-1596020400-9l9n4.1626341a2af81bbd.yaml
       │  │  │  ├── elasticsearch-delete-infra-1596020400-v98tk.1626341a2d821069.yaml
       │  │  │  ├── elasticsearch-rollover-app-1596020400-cc5vc.1626341a3019b238.yaml
       │  │  │  ├── elasticsearch-rollover-audit-1596020400-s8d5s.1626341a31f7b315.yaml
       │  │  │  ├── elasticsearch-rollover-infra-1596020400-7mgv8.1626341a35ea59ed.yaml
       │  │  ├── events.yaml
       │  │  ├── persistentvolumeclaims.yaml
       │  │  ├── pods.yaml
       │  │  ├── replicationcontrollers.yaml
       │  │  ├── secrets.yaml
       │  │  └── services.yaml
       │  ├── openshift-logging.yaml
       │  ├── pods
       │  │  ├── cluster-logging-operator-74dd5994f-6ttgt
       │  │  │  ├── cluster-logging-operator
       │  │  │  │  └── cluster-logging-operator
       │  │  │  │     └── logs
       │  │  │  │        ├── current.log
       │  │  │  │        ├── previous.insecure.log
       │  │  │  │        └── previous.log
       │  │  │  └── cluster-logging-operator-74dd5994f-6ttgt.yaml
       │  │  ├── cluster-logging-operator-registry-6df49d7d4-mxxff
       │  │  │  ├── cluster-logging-operator-registry
       │  │  │  │  └── cluster-logging-operator-registry
       │  │  │  │     └── logs
       │  │  │  │        ├── current.log
       │  │  │  │        ├── previous.insecure.log
       │  │  │  │        └── previous.log
       │  │  │  ├── cluster-logging-operator-registry-6df49d7d4-mxxff.yaml
       │  │  │  └── mutate-csv-and-generate-sqlite-db
       │  │  │     └── mutate-csv-and-generate-sqlite-db
       │  │  │        └── logs
       │  │  │           ├── current.log
       │  │  │           ├── previous.insecure.log
       │  │  │           └── previous.log
       │  │  ├── curator-1596028500-zkz4s
       │  │  ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
       │  │  ├── elasticsearch-delete-app-1596030300-bpgcx
       │  │  │  ├── elasticsearch-delete-app-1596030300-bpgcx.yaml
       │  │  │  └── indexmanagement
       │  │  │     └── indexmanagement
       │  │  │        └── logs
       │  │  │           ├── current.log
       │  │  │           ├── previous.insecure.log
       │  │  │           └── previous.log
       │  │  ├── fluentd-2tr64
       │  │  │  ├── fluentd
       │  │  │  │  └── fluentd
       │  │  │  │     └── logs
       │  │  │  │        ├── current.log
       │  │  │  │        ├── previous.insecure.log
       │  │  │  │        └── previous.log
       │  │  │  ├── fluentd-2tr64.yaml
       │  │  │  └── fluentd-init
       │  │  │     └── fluentd-init
       │  │  │        └── logs
       │  │  │           ├── current.log
       │  │  │           ├── previous.insecure.log
       │  │  │           └── previous.log
       │  │  ├── kibana-9d69668d4-2rkvz
       │  │  │  ├── kibana
       │  │  │  │  └── kibana
       │  │  │  │     └── logs
       │  │  │  │        ├── current.log
       │  │  │  │        ├── previous.insecure.log
       │  │  │  │        └── previous.log
       │  │  │  ├── kibana-9d69668d4-2rkvz.yaml
       │  │  │  └── kibana-proxy
       │  │  │     └── kibana-proxy
       │  │  │        └── logs
       │  │  │           ├── current.log
       │  │  │           ├── previous.insecure.log
       │  │  │           └── previous.log
       │  └── route.openshift.io
       │     └── routes.yaml
       └── openshift-operators-redhat
          ├── ...
  3. Create a compressed file from the must-gather directory that was just created in your working directory. For example, on a computer that uses a Linux operating system, run the following command:

    $ tar cvaf must-gather.tar.gz must-gather.local.5421342344627712289/ 1
    1
    Make sure to replace must-gather-local.5421342344627712289/ with the actual directory name.
  4. Attach the compressed file to your support case on the Red Hat Customer Portal.