Chapter 3. Monitoring Camel K operator
Red Hat Integration - Camel K monitoring is based on the OpenShift monitoring system. This chapter explains how to use the available options for monitoring the Red Hat Integration - Camel K operator at runtime. You can use the Prometheus Operator that is already deployed as part of OpenShift Monitoring to monitor your own applications.
3.1. Camel K Operator metrics
The Camel K operator monitoring endpoint exposes the following metrics:
Table 3.1. Camel K operator metrics
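| Name | Type | Description | Buckets | Labels |
|---|---|---|---|---|
| camel_k_reconciliation_duration_seconds | Histogram | Reconciliation request duration | 0.25s, 0.5s, 1s, 5s | namespace, group, version, kind, result, tag |
| camel_k_build_duration_seconds | Histogram | Build duration | 30s, 1m, 1.5m, 2m, 5m, 10m | result |
| camel_k_build_recovery_attempts | Histogram | Build recovery attempts | 0, 1, 2, 3, 4, 5 | result |
| camel_k_build_queue_duration_seconds | Histogram | Build queue duration | 5s, 15s, 30s, 1m, 5m | N/A |
| camel_k_integration_first_readiness_seconds | Histogram | Time to first integration readiness | 5s, 10s, 30s, 1m, 2m | N/A |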
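Because these metrics are histograms, Prometheus stores each one as `_bucket`, `_count`, and `_sum` series, and quantiles can be estimated from the buckets with the PromQL `histogram_quantile` function. The following is a minimal sketch of a recording rule that precomputes the 90th percentile of the build duration. The resource name, group name, and recorded series name are hypothetical and chosen only for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Hypothetical resource name, not taken from the product documentation.
  name: camel-k-operator-recording
spec:
  groups:
    - name: camel-k-operator-recording
      rules:
        # Estimated 90th percentile of the build duration over the last
        # 5 minutes, per job. The recorded series name is an assumption.
        - record: job:camel_k_build_duration_seconds:p90_5m
          expr: |
            histogram_quantile(
              0.9,
              sum(rate(camel_k_build_duration_seconds_bucket[5m])) by (le, job)
            )
```

The estimate is only as precise as the bucket boundaries allow, so a value that falls between the 2m and 5m buckets is interpolated rather than measured.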
3.2. Enabling Camel K Operator monitoring
OpenShift 4.3 or higher includes an embedded Prometheus Operator already deployed as part of OpenShift Monitoring. This section explains how to enable monitoring of your own application services in OpenShift Monitoring.
Prerequisites
- You must have cluster administrator access to an OpenShift cluster on which the Camel K Operator is installed. See Installing Camel K.
- You must have already enabled monitoring of your own services in OpenShift. See Enabling user workload monitoring in OpenShift.
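As a rough illustration of the second prerequisite, the sketch below assumes OpenShift 4.6 or later, where monitoring of user-defined projects is switched on through the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. Earlier releases used a different Technology Preview setting, so check the OpenShift documentation for your version before applying it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Assumes OpenShift 4.6 or later; enables monitoring for user-defined projects.
    enableUserWorkload: true
```

If the ConfigMap already exists in your cluster, merge this setting into its existing config.yaml content rather than overwriting it.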
Procedure
Create a PodMonitor resource targeting the operator metrics endpoint, so that the Prometheus server can scrape the metrics exposed by the operator.

operator-pod-monitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: camel-k-operator
  labels:
    app: "camel-k"
    camel.apache.org/component: operator
spec:
  selector:
    matchLabels:
      app: "camel-k"
      camel.apache.org/component: operator
  podMetricsEndpoints:
    - port: metrics

Create the PodMonitor resource.

oc apply -f operator-pod-monitor.yaml
Additional Resources
- For more information about the discovery mechanism and the relationship between the operator resources, see Prometheus Operator getting started guide.
- If your operator metrics are not discovered, see Troubleshooting ServiceMonitor changes, which also applies to troubleshooting PodMonitor resources.
3.3. Camel K operator alerts
You can create a PrometheusRule resource so that the AlertManager instance from the OpenShift monitoring stack can trigger alerts, based on the metrics exposed by the Camel K operator.
Example
The following PrometheusRule resource defines alerting rules based on the exposed metrics.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-operator
spec:
  groups:
    - name: camel-k-operator
      rules:
        - alert: CamelKReconciliationDuration
          expr: |
            (
              1 - sum(rate(camel_k_reconciliation_duration_seconds_bucket{le="0.5"}[5m])) by (job)
              /
              sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests
              for {{ $labels.job }} have their duration above 0.5s.
        - alert: CamelKReconciliationFailure
          expr: |
            sum(rate(camel_k_reconciliation_duration_seconds_count{result="Errored"}[5m])) by (job)
            /
            sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests
              for {{ $labels.job }} have failed.
        - alert: CamelKSuccessBuildDuration2m
          expr: |
            (
              1 - sum(rate(camel_k_build_duration_seconds_bucket{le="120",result="Succeeded"}[5m])) by (job)
              /
              sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds
              for {{ $labels.job }} have their duration above 2m.
        - alert: CamelKSuccessBuildDuration5m
          expr: |
            (
              1 - sum(rate(camel_k_build_duration_seconds_bucket{le="300",result="Succeeded"}[5m])) by (job)
              /
              sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds
              for {{ $labels.job }} have their duration above 5m.
        - alert: CamelKBuildFailure
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Failed"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have failed.
        - alert: CamelKBuildError
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Error"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have errored.
        - alert: CamelKBuildQueueDuration1m
          expr: |
            (
              1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="60"}[5m])) by (job)
              /
              sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
              have been queued for more than 1m.
        - alert: CamelKBuildQueueDuration5m
          expr: |
            (
              1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="300"}[5m])) by (job)
              /
              sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            )
            * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
              have been queued for more than 5m.

Camel K operator alerts
The following table shows the alerting rules that are defined in the PrometheusRule resource.
| Name | Severity | Description |
|---|---|---|
| CamelKReconciliationDuration | warning | More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min. |
| CamelKReconciliationFailure | warning | More than 1% of the reconciliation requests have failed over at least 10 min. |
| CamelKSuccessBuildDuration2m | warning | More than 10% of the successful builds have their duration above 2 min over at least 1 min. |
| CamelKSuccessBuildDuration5m | critical | More than 1% of the successful builds have their duration above 5 min over at least 1 min. |
| CamelKBuildFailure | warning | More than 1% of the builds have failed over at least 10 min. |
| CamelKBuildError | critical | More than 1% of the builds have errored over at least 10 min. |
| CamelKBuildQueueDuration1m | warning | More than 1% of the builds have been queued for more than 1 min over at least 1 min. |
| CamelKBuildQueueDuration5m | critical | More than 1% of the builds have been queued for more than 5 min over at least 1 min. |
You can find more information about alerts in Creating alerting rules from the OpenShift documentation.
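The same bucket-ratio pattern can be extended to metrics that the example rules do not cover. The sketch below alerts on the time to first integration readiness. It assumes that the histogram is exposed as camel_k_integration_first_readiness_seconds, so verify the name against the operator metrics endpoint before relying on it. The resource name, alert name, and 10% threshold are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Hypothetical resource name, not part of the documented example.
  name: camel-k-integration-readiness
spec:
  groups:
    - name: camel-k-integration-readiness
      rules:
        # Hypothetical alert: fires when more than 10% of integrations
        # take longer than 1m (the le="60" bucket) to become ready.
        - alert: CamelKIntegrationFirstReadiness1m
          expr: |
            (
              1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
              /
              sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the integrations for {{ $labels.job }}
              took more than 1m to become ready.
```

The le value must match one of the configured bucket boundaries, expressed in seconds, otherwise the bucket selector returns no series and the alert never fires.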