Chapter 5. Configuring pod topology spread constraints for monitoring

You can use pod topology spread constraints to control how Prometheus, Thanos Ruler, and Alertmanager pods are spread across a network topology when OpenShift Container Platform pods are deployed in multiple availability zones.

Pod topology spread constraints are suitable for controlling pod scheduling within hierarchical topologies in which nodes are spread across different infrastructure levels, such as regions and zones within those regions. Additionally, being able to schedule pods in different zones can reduce network latency in certain scenarios.

5.1. Setting up pod topology spread constraints for Prometheus

For core OpenShift Container Platform monitoring, you can set up pod topology spread constraints for Prometheus to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Prometheus pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.

You configure pod topology spread constraints for Prometheus in the cluster-monitoring-config config map.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have created the cluster-monitoring-config ConfigMap object.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add values for the following settings under data/config.yaml/prometheusK8s to configure pod topology spread constraints:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          topologySpreadConstraints:
          - maxSkew: 1 1
            topologyKey: monitoring 2
            whenUnsatisfiable: DoNotSchedule 3
            labelSelector:
              matchLabels: 4
                app.kubernetes.io/name: prometheus
    1
    Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
    2
    Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler will try to put a balanced number of pods into each domain.
    3
    Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
    4
    Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
  3. Save the file to apply the changes automatically.

    Warning

    When you save changes to the cluster-monitoring-config config map, the pods and other resources in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
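
The topologyKey value monitoring in the preceding example is only a sample custom node label. In many clusters, nodes already carry the well-known topology.kubernetes.io/zone label, so a minimal sketch that spreads Prometheus pods across availability zones, assuming your nodes are labeled with that key, might look like the following:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          topologySpreadConstraints:
          - maxSkew: 1
            # topology.kubernetes.io/zone is a well-known Kubernetes label; verify that your nodes carry it
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: prometheus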

5.2. Setting up pod topology spread constraints for Alertmanager

For core OpenShift Container Platform monitoring, you can set up pod topology spread constraints for Alertmanager to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Alertmanager pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.

You configure pod topology spread constraints for Alertmanager in the cluster-monitoring-config config map.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have created the cluster-monitoring-config ConfigMap object.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add values for the following settings under data/config.yaml/alertmanagerMain to configure pod topology spread constraints:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        alertmanagerMain:
          topologySpreadConstraints:
          - maxSkew: 1 1
            topologyKey: monitoring 2
            whenUnsatisfiable: DoNotSchedule 3
            labelSelector:
              matchLabels: 4
                app.kubernetes.io/name: alertmanager
    1
    Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
    2
    Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler will try to put a balanced number of pods into each domain.
    3
    Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
    4
    Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
  3. Save the file to apply the changes automatically.

    Warning

    When you save changes to the cluster-monitoring-config config map, the pods and other resources in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
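
To check how the Alertmanager replicas were actually placed, you can list the pods together with the nodes they were scheduled to. The following command is a sketch that assumes the Alertmanager pods carry the app.kubernetes.io/name=alertmanager label used in the constraint above; the NODE column of the wide output shows where each replica landed:

    $ oc -n openshift-monitoring get pods -l app.kubernetes.io/name=alertmanager -o wide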

5.3. Setting up pod topology spread constraints for Thanos Ruler

For user-defined monitoring, you can set up pod topology spread constraints for Thanos Ruler to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Thanos Ruler pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.

You configure pod topology spread constraints for Thanos Ruler in the user-workload-monitoring-config config map.

Prerequisites

  • A cluster administrator has enabled monitoring for user-defined projects.
  • You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
  • You have created the user-workload-monitoring-config ConfigMap object.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Edit the user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace:

    $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  2. Add values for the following settings under data/config.yaml/thanosRuler to configure pod topology spread constraints:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        thanosRuler:
          topologySpreadConstraints:
          - maxSkew: 1 1
            topologyKey: monitoring 2
            whenUnsatisfiable: ScheduleAnyway 3
            labelSelector:
              matchLabels: 4
                app.kubernetes.io/name: thanos-ruler
    1
    Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
    2
    Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler will try to put a balanced number of pods into each domain.
    3
    Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
    4
    Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
  3. Save the file to apply the changes automatically.

    Warning

    When you save changes to the user-workload-monitoring-config config map, the pods and other resources in the openshift-user-workload-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
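
To confirm that the constraint reached the Thanos Ruler pods, you can inspect the pod specification. This is a sketch that assumes the pods carry the app.kubernetes.io/name=thanos-ruler label used in the constraint above:

    $ oc -n openshift-user-workload-monitoring get pods -l app.kubernetes.io/name=thanos-ruler -o yaml | grep -A 7 topologySpreadConstraints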

5.4. Setting log levels for monitoring components

You can configure the log level for Alertmanager, Prometheus Operator, Prometheus, Thanos Querier, and Thanos Ruler.

The following log levels can be applied to the relevant component in the cluster-monitoring-config and user-workload-monitoring-config ConfigMap objects:

  • debug. Log debug, informational, warning, and error messages.
  • info. Log informational, warning, and error messages.
  • warn. Log warning and error messages only.
  • error. Log error messages only.

The default log level is info.

Prerequisites

  • If you are setting a log level for Alertmanager, Prometheus Operator, Prometheus, or Thanos Querier in the openshift-monitoring project:

    • You have access to the cluster as a user with the cluster-admin cluster role.
    • You have created the cluster-monitoring-config ConfigMap object.
  • If you are setting a log level for Alertmanager, Prometheus Operator, Prometheus, or Thanos Ruler in the openshift-user-workload-monitoring project:

    • You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
    • You have created the user-workload-monitoring-config ConfigMap object.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Edit the ConfigMap object:

    • To set a log level for a component in the openshift-monitoring project:

      1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

        $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
      2. Add logLevel: <log_level> for a component under data/config.yaml:

        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: cluster-monitoring-config
          namespace: openshift-monitoring
        data:
          config.yaml: |
            <component>: 1
              logLevel: <log_level> 2
        1
        The monitoring stack component for which you are setting a log level. For default platform monitoring, available component values are prometheusK8s, alertmanagerMain, prometheusOperator, and thanosQuerier.
        2
        The log level to set for the component. The available values are error, warn, info, and debug. The default value is info.
    • To set a log level for a component in the openshift-user-workload-monitoring project:

      1. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

        $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
      2. Add logLevel: <log_level> for a component under data/config.yaml:

        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: user-workload-monitoring-config
          namespace: openshift-user-workload-monitoring
        data:
          config.yaml: |
            <component>: 1
              logLevel: <log_level> 2
        1
        The monitoring stack component for which you are setting a log level. For user workload monitoring, available component values are alertmanager, prometheus, prometheusOperator, and thanosRuler.
        2
        The log level to apply to the component. The available values are error, warn, info, and debug. The default value is info.
  2. Save the file to apply the changes. The pods for the component restart automatically when you apply the log-level change.

    Note

    Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

    Warning

    When changes are saved to a monitoring config map, the pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.

  3. Confirm that the log level has been applied by reviewing the deployment or pod configuration in the related project. The following example checks the log level in the prometheus-operator deployment in the openshift-user-workload-monitoring project:

    $ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -o yaml | grep "log-level"

    Example output

            - --log-level=debug

  4. Check that the pods for the component are running. The following example lists the status of pods in the openshift-user-workload-monitoring project:

    $ oc -n openshift-user-workload-monitoring get pods
    Note

    If an unrecognized logLevel value is included in the ConfigMap object, the pods for the component might not restart successfully.
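
For illustration, the following sketch shows a completed cluster-monitoring-config snippet that sets debug logging for Prometheus and warn logging for Alertmanager. The component names and log levels are the ones listed earlier in this section; the exact combination shown here is only an example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          logLevel: debug
        alertmanagerMain:
          logLevel: warn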

5.5. Enabling the query log file for Prometheus

You can configure Prometheus to write all queries that have been run by the engine to a log file. You can do so for default platform monitoring and for user-defined workload monitoring.

Important

Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.

Prerequisites

  • If you are enabling the query log file feature for Prometheus in the openshift-monitoring project:

    • You have access to the cluster as a user with the cluster-admin cluster role.
    • You have created the cluster-monitoring-config ConfigMap object.
  • If you are enabling the query log file feature for Prometheus in the openshift-user-workload-monitoring project:

    • You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
    • You have created the user-workload-monitoring-config ConfigMap object.
  • You have installed the OpenShift CLI (oc).

Procedure

  • To set the query log file for Prometheus in the openshift-monitoring project:

    1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
    2. Add queryLogFile: <path> for prometheusK8s under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            queryLogFile: <path> 1
      1
      The full path to the file in which queries will be logged.
    3. Save the file to apply the changes.

      Warning

      When you save changes to a monitoring config map, pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.

    4. Verify that the pods for the component are running. The following sample command lists the status of pods in the openshift-monitoring project:

      $ oc -n openshift-monitoring get pods
    5. Read the query log:

      $ oc -n openshift-monitoring exec prometheus-k8s-0 -- cat <path>
      Important

      Revert the setting in the config map after you have examined the logged query information.

  • To set the query log file for Prometheus in the openshift-user-workload-monitoring project:

    1. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
    2. Add queryLogFile: <path> for prometheus under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          prometheus:
            queryLogFile: <path> 1
      1
      The full path to the file in which queries will be logged.
    3. Save the file to apply the changes.

      Note

      Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

      Warning

      When you save changes to a monitoring config map, pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.

    4. Verify that the pods for the component are running. The following example command lists the status of pods in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring get pods
    5. Read the query log:

      $ oc -n openshift-user-workload-monitoring exec prometheus-user-workload-0 -- cat <path>
      Important

      Revert the setting in the config map after you have examined the logged query information.
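
As a concrete illustration, the following sketch enables query logging for user-defined projects by using a hypothetical log file path; substitute a path that is writable inside the Prometheus container:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          queryLogFile: /tmp/promql-queries.log  # hypothetical example path

You can then read the log with the same exec command shown in step 5, replacing <path> with the path that you configured.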

5.6. Enabling query logging for Thanos Querier

For default platform monitoring in the openshift-monitoring project, you can enable the Cluster Monitoring Operator to log all queries run by Thanos Querier.

Important

Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have created the cluster-monitoring-config ConfigMap object.

Procedure

You can enable query logging for Thanos Querier in the openshift-monitoring project:

  1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

    $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add a thanosQuerier section under data/config.yaml and add values as shown in the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        thanosQuerier:
          enableRequestLogging: <value> 1
          logLevel: <value> 2
    1
    Set the value to true to enable logging and false to disable logging. The default value is false.
    2
    Set the value to debug, info, warn, or error. If no value exists for logLevel, the log level defaults to error.
  3. Save the file to apply the changes.

    Warning

    When you save changes to a monitoring config map, pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.
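
For example, a completed config map that turns on Thanos Querier request logging at the debug level would look like the following sketch:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        thanosQuerier:
          enableRequestLogging: true
          logLevel: debug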

Verification

  1. Verify that the Thanos Querier pods are running. The following sample command lists the status of pods in the openshift-monitoring project:

    $ oc -n openshift-monitoring get pods
  2. Run a test query using the following sample commands as a model:

    $ token=`oc create token prometheus-k8s -n openshift-monitoring`
    $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=cluster_version'
  3. Run the following command to read the query log:

    $ oc -n openshift-monitoring logs <thanos_querier_pod_name> -c thanos-query
    Note

    Because the thanos-querier pods are highly available (HA) pods, you might be able to see logs in only one pod.

  4. After you examine the logged query information, disable query logging by changing the enableRequestLogging value to false in the config map.
