Chapter 6. Monitoring the Dev Workspace operator

This chapter describes how to configure an example monitoring stack to process metrics exposed by the Dev Workspace operator. You must enable the Dev Workspace operator to follow the instructions in this chapter. See https://access.redhat.com/documentation/en-us/red_hat_codeready_workspaces/2.15/html-single/installation_guide/index#enabling-dev-workspace-operator.adoc.

6.1. Collecting Dev Workspace operator metrics with Prometheus

This section describes how to use Prometheus to collect, store, and query metrics about the Dev Workspace operator.

Prerequisites

  • The devworkspace-controller-metrics service is exposing metrics on port 8443.
  • The devworkspace-webhookserver service is exposing metrics on port 9443, which is the default port.
  • Prometheus 2.26.0 or later is running. The Prometheus console is running on port 9090 with a corresponding service and route. See First steps with Prometheus.

Procedure

  1. Create a ClusterRoleBinding to bind the ServiceAccount associated with Prometheus to the devworkspace-controller-metrics-reader ClusterRole. Without the ClusterRoleBinding, you cannot access Dev Workspace metrics because they are protected with role-based access control (RBAC).

    Example 6.1. ClusterRole example

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: devworkspace-controller-metrics-reader
    rules:
    - nonResourceURLs:
      - /metrics
      verbs:
      - get

    Example 6.2. ClusterRoleBinding example

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: devworkspace-controller-metrics-binding
    subjects:
      - kind: ServiceAccount
        name: <ServiceAccount name associated with the Prometheus Pod>
        namespace: <Prometheus namespace>
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: devworkspace-controller-metrics-reader

  2. Configure Prometheus to scrape metrics from port 8443, exposed by the devworkspace-controller-metrics service, and from port 9443, exposed by the devworkspace-webhookserver service.

    Example 6.3. Prometheus configuration example

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
    data:
      prometheus.yml: |-
          global:
            scrape_interval:     5s             # (1)
            evaluation_interval: 5s             # (2)
          scrape_configs:                       # (3)
            - job_name: 'DevWorkspace'
              authorization:
                type: Bearer
                credentials_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
              tls_config:
                insecure_skip_verify: true
              static_configs:
                - targets: ['devworkspace-controller-metrics:8443']  # (4)
            - job_name: 'DevWorkspace webhooks'
              authorization:
                type: Bearer
                credentials_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
              tls_config:
                insecure_skip_verify: true
              static_configs:
                - targets: ['devworkspace-webhookserver:9443']  # (5)

    (1) Interval at which each target is scraped.
    (2) Interval at which recording and alerting rules are re-evaluated.
    (3) Resources that Prometheus monitors. In the default configuration, two jobs, DevWorkspace and DevWorkspace webhooks, scrape the time series data exposed by the devworkspace-controller-metrics and devworkspace-webhookserver services.
    (4) Scrape metrics from port 8443.
    (5) Scrape metrics from port 9443.
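Prometheus records a built-in up series for every scrape target, so the two jobs configured above can be watched for scrape failures. The following is a minimal sketch of an alerting rule, not part of the Dev Workspace operator itself: the group name, alert name, severity label, and one-minute delay are illustrative assumptions, and the file is assumed to be loaded through a rule_files entry in prometheus.yml.

```yaml
# Hypothetical rules file; names and thresholds are examples only.
groups:
  - name: devworkspace-scrape-health
    rules:
      # Fires when either job defined in the scrape configuration
      # has been failing to scrape its target for one minute.
      - alert: DevWorkspaceTargetDown
        expr: |
          up{job=~"DevWorkspace|DevWorkspace webhooks"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus cannot scrape the {{ $labels.job }} job"
```

A target that stays at up == 0 often points to a missing ClusterRoleBinding or an unreachable service, so this rule can double as a quick check that the procedure was applied.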

Verification steps

6.2. Dev Workspace-specific metrics

This section describes the Dev Workspace-specific metrics exposed by the devworkspace-controller-metrics service.

Table 6.1. Metrics

Name                                Type       Description                                                        Labels
devworkspace_started_total          Counter    Number of Dev Workspace starting events.                           source, routingclass
devworkspace_started_success_total  Counter    Number of Dev Workspaces successfully entering the Running phase.  source, routingclass
devworkspace_fail_total             Counter    Number of failed Dev Workspaces.                                   source, reason
devworkspace_startup_time           Histogram  Total time taken to start a Dev Workspace, in seconds.             source, routingclass
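The metrics in Table 6.1 can be combined in PromQL expressions. The following is a minimal sketch of a Prometheus rules file that derives a startup success ratio and an approximate startup-time percentile. The group name, the recorded series names, and the one-hour window are illustrative assumptions. The sketch also assumes that the devworkspace_startup_time histogram is exposed with the standard Prometheus _bucket series and that the file is loaded through a rule_files entry in prometheus.yml.

```yaml
# Hypothetical rules file; group and series names are examples only.
groups:
  - name: devworkspace-startup-examples
    rules:
      # Share of start events that reached the Running phase
      # during the last hour, across all sources and routing classes.
      - record: devworkspace:started_success:ratio_1h
        expr: |
          sum(increase(devworkspace_started_success_total[1h]))
          /
          sum(increase(devworkspace_started_total[1h]))
      # Approximate 95th percentile of the startup time, in seconds,
      # estimated from the histogram buckets.
      - record: devworkspace:startup_time_seconds:p95_1h
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(devworkspace_startup_time_bucket[1h]))
          )
```

When no Dev Workspace started during the window, the ratio has no meaningful value (NaN or no data), so dashboards built on it should treat that case separately.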

Table 6.2. Labels

Name          Description                                                                Values
source        The controller.devfile.io/devworkspace-source label of the Dev Workspace.  string
routingclass  The spec.routingclass of the Dev Workspace.                                "basic|cluster|cluster-tls|web-terminal"
reason        The workspace startup failure reason.                                      "BadRequest|InfrastructureFailure|Unknown"

Table 6.3. Startup failure reasons

Name                   Description
BadRequest             Startup failure due to an invalid devfile used to create a Dev Workspace.
InfrastructureFailure  Startup failure due to the following errors: CreateContainerError, RunContainerError, FailedScheduling, FailedMount.
Unknown                Unknown failure reason.
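The reason label makes it possible to separate devfile problems from cluster problems. The following is a minimal sketch of rules built on devworkspace_fail_total: a recording rule that counts failures per reason, and an alert for failures caused by the infrastructure. The group name, rule names, time windows, and severity label are illustrative assumptions, and the file is assumed to be loaded through a rule_files entry in prometheus.yml.

```yaml
# Hypothetical rules file; names, windows, and thresholds are examples only.
groups:
  - name: devworkspace-failure-examples
    rules:
      # Failed Dev Workspace starts during the last hour, per failure reason.
      - record: devworkspace:fail:increase_1h_by_reason
        expr: |
          sum by (reason) (increase(devworkspace_fail_total[1h]))
      # Fires when infrastructure-related startup failures
      # keep occurring for a given source.
      - alert: DevWorkspaceInfrastructureFailures
        expr: |
          sum by (source) (
            increase(devworkspace_fail_total{reason="InfrastructureFailure"}[15m])
          ) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Dev Workspaces from source {{ $labels.source }} fail to start because of infrastructure errors"
```

Alerting only on InfrastructureFailure is a design choice for this sketch: BadRequest failures usually indicate an invalid devfile rather than a cluster issue, so they may be better suited to a dashboard than to an alert.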