Chapter 3. Monitoring Camel K operator

Red Hat Integration - Camel K monitoring is based on the OpenShift monitoring system. This chapter explains how to use the available options for monitoring the Red Hat Integration - Camel K operator at runtime. You can use the Prometheus Operator, which is already deployed as part of OpenShift Monitoring, to monitor your own applications.

3.1. Camel K Operator metrics

The Camel K operator monitoring endpoint exposes the following metrics:

Table 3.1. Camel K operator metrics

Name: camel_k_reconciliation_duration_seconds
Type: HistogramVec
Description: Reconciliation request duration
Buckets: 0.25s, 0.5s, 1s, 5s
Labels: namespace, group, version, kind, result: Reconciled|Errored|Requeued, tag: ""|PlatformError|UserError

Name: camel_k_build_duration_seconds
Type: HistogramVec
Description: Build duration
Buckets: 30s, 1m, 1.5m, 2m, 5m, 10m
Labels: result: Succeeded|Error

Name: camel_k_build_recovery_attempts
Type: Histogram
Description: Build recovery attempts
Buckets: 0, 1, 2, 3, 4, 5
Labels: result: Succeeded|Error

Name: camel_k_build_queue_duration_seconds
Type: Histogram
Description: Build queue duration
Buckets: 5s, 15s, 30s, 1m, 5m
Labels: N/A

Name: camel_k_integration_first_readiness_seconds
Type: Histogram
Description: Time to first integration readiness
Buckets: 5s, 10s, 30s, 1m, 2m
Labels: N/A
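
Because these metrics are histograms, you can use the bucket series to estimate latency quantiles in Prometheus. For example, the following PromQL query (a sketch, using the same 5-minute rate window as the alerting rules later in this chapter) estimates the 95th percentile of the reconciliation request duration:

    histogram_quantile(0.95,
      sum(rate(camel_k_reconciliation_duration_seconds_bucket[5m])) by (le)
    )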

3.2. Enabling Camel K Operator monitoring

OpenShift 4.3 and later include a Prometheus Operator that is already deployed as part of OpenShift Monitoring. This section explains how to enable monitoring of your own application services in OpenShift Monitoring.

Prerequisites

  * Monitoring for user-defined projects is enabled in your OpenShift cluster (see the example below).
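
If monitoring for user-defined projects is not yet enabled, a cluster administrator can enable it. On OpenShift 4.6 and later, this is done by setting enableUserWorkload: true in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace; earlier 4.x releases use a technology preview setting instead, so check the OpenShift documentation for your version. A minimal sketch:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true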

Procedure

  1. Create a PodMonitor resource targeting the operator metrics endpoint, so that the Prometheus server can scrape the metrics exposed by the operator.

    operator-pod-monitor.yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: camel-k-operator
      labels:
        app: "camel-k"
        camel.apache.org/component: operator
    spec:
      selector:
        matchLabels:
          app: "camel-k"
          camel.apache.org/component: operator
      podMetricsEndpoints:
        - port: metrics

  2. Create the PodMonitor resource:

    oc apply -f operator-pod-monitor.yaml
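
  3. Optionally, verify that the PodMonitor resource exists and that its label selector matches the operator pod. Example commands, assuming the default operator labels shown above:

    oc get podmonitor camel-k-operator
    oc get pods -l app=camel-k,camel.apache.org/component=operator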

3.3. Camel K operator alerts

You can create a PrometheusRule resource so that the Alertmanager instance in the OpenShift monitoring stack can trigger alerts based on the metrics exposed by the Camel K operator.

Example

You can define alerting rules based on the exposed metrics, as shown in the following PrometheusRule resource:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-operator
spec:
  groups:
    - name: camel-k-operator
      rules:
        - alert: CamelKReconciliationDuration
          expr: |
            (
            1 - sum(rate(camel_k_reconciliation_duration_seconds_bucket{le="0.5"}[5m])) by (job)
            /
            sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests
              for {{ $labels.job }} have their duration above 0.5s.
        - alert: CamelKReconciliationFailure
          expr: |
            sum(rate(camel_k_reconciliation_duration_seconds_count{result="Errored"}[5m])) by (job)
            /
            sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests
              for {{ $labels.job }} have failed.
        - alert: CamelKSuccessBuildDuration2m
          expr: |
            (
            1 - sum(rate(camel_k_build_duration_seconds_bucket{le="120",result="Succeeded"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds
              for {{ $labels.job }} have their duration above 2m.
        - alert: CamelKSuccessBuildDuration5m
          expr: |
            (
            1 - sum(rate(camel_k_build_duration_seconds_bucket{le="300",result="Succeeded"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds
              for {{ $labels.job }} have their duration above 5m.
        - alert: CamelKBuildFailure
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Failed"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have failed.
        - alert: CamelKBuildError
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Error"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have errored.
        - alert: CamelKBuildQueueDuration1m
          expr: |
            (
            1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="60"}[5m])) by (job)
            /
            sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
              have been queued for more than 1m.
        - alert: CamelKBuildQueueDuration5m
          expr: |
            (
            1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="300"}[5m])) by (job)
            /
            sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
              have been queued for more than 5m.
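
You can save the resource to a file and create it in the namespace where the Camel K operator is deployed. For example (the file name operator-prometheus-rule.yaml is only illustrative):

    oc apply -f operator-prometheus-rule.yaml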

Camel K operator alerts

The following table shows the alerting rules that are defined in the PrometheusRule resource.

Name: CamelKReconciliationDuration
Severity: warning
Description: More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min.

Name: CamelKReconciliationFailure
Severity: warning
Description: More than 1% of the reconciliation requests have failed over at least 10 min.

Name: CamelKSuccessBuildDuration2m
Severity: warning
Description: More than 10% of the successful builds have their duration above 2 min over at least 1 min.

Name: CamelKSuccessBuildDuration5m
Severity: critical
Description: More than 1% of the successful builds have their duration above 5 min over at least 1 min.

Name: CamelKBuildFailure
Severity: warning
Description: More than 1% of the builds have failed over at least 10 min.

Name: CamelKBuildError
Severity: critical
Description: More than 1% of the builds have errored over at least 10 min.

Name: CamelKBuildQueueDuration1m
Severity: warning
Description: More than 1% of the builds have been queued for more than 1 min over at least 1 min.

Name: CamelKBuildQueueDuration5m
Severity: critical
Description: More than 1% of the builds have been queued for more than 5 min over at least 1 min.

For more information about alerts, see Creating alerting rules in the OpenShift documentation.