Chapter 14. Monitoring Data Grid services

Data Grid exposes metrics that can be used by Prometheus and Grafana for monitoring and visualizing the cluster state.

Note

This documentation explains how to set up monitoring on OpenShift Container Platform. If you're working with community Prometheus deployments, you might find these instructions useful as a general guide. However, you should refer to the Prometheus documentation for installation and usage instructions.

See the Prometheus Operator documentation.

14.1. Creating a Prometheus service monitor

Data Grid Operator automatically creates a Prometheus ServiceMonitor that scrapes metrics from your Data Grid cluster.

Procedure

Enable monitoring for user-defined projects on OpenShift Container Platform.

When Data Grid Operator detects an Infinispan CR with the monitoring annotation set to true, which is the default, it does the following:

  • Creates a ServiceMonitor named <cluster_name>-monitor.
  • Adds the infinispan.org/monitoring: 'true' annotation to your Infinispan CR metadata, if the value is not already explicitly set:

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
      annotations:
        infinispan.org/monitoring: 'true'

Note

To authenticate with Data Grid, Prometheus uses the operator credentials.

Verification

You can check that Prometheus is scraping Data Grid metrics as follows:

  1. In the OpenShift Web Console, select the </> Developer perspective and then select Monitoring.
  2. Open the Dashboard tab for the namespace where your Data Grid cluster runs.
  3. Open the Metrics tab and confirm that you can query Data Grid metrics such as:

    vendor_cache_manager_default_cluster_size
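From the command line, you can also confirm that the ServiceMonitor exists. The following is a minimal sketch, not part of Data Grid Operator: the helper names `monitor_name` and `check_monitor` are hypothetical, and only the <cluster_name>-monitor naming comes from the procedure above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Derive the ServiceMonitor name from the cluster name (<cluster_name>-monitor).
monitor_name() {
  printf '%s-monitor\n' "$1"
}

# Look up the ServiceMonitor in a namespace; requires a logged-in oc client.
check_monitor() {
  local cluster="$1" namespace="$2"
  oc get servicemonitor "$(monitor_name "$cluster")" -n "$namespace"
}

# Example (cluster and namespace names are placeholders):
#   check_monitor infinispan my-namespace
```

If the lookup fails, check the infinispan.org/monitoring annotation on your Infinispan CR.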

14.1.1. Disabling the Prometheus service monitor

You can disable the ServiceMonitor if you do not want Prometheus to scrape metrics for your Data Grid cluster.

Procedure

  1. Set 'false' as the value for the infinispan.org/monitoring annotation in your Infinispan CR.

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
      annotations:
        infinispan.org/monitoring: 'false'
  2. Apply the changes.
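As an alternative to editing the CR file, you might set the annotation directly with `oc annotate`. This is a sketch under stated assumptions: the helper names `monitoring_annotation` and `set_monitoring` are hypothetical, and it assumes the Operator reacts to the annotation the same way whether it is applied from a file or from the command line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Build the key=value argument for `oc annotate`; accepts only true or false.
monitoring_annotation() {
  case "$1" in
    true|false) printf 'infinispan.org/monitoring=%s\n' "$1" ;;
    *) echo "expected true or false" >&2; return 1 ;;
  esac
}

# Set the annotation on an existing Infinispan CR; requires a logged-in oc client.
set_monitoring() {
  local cluster="$1" enabled="$2" arg
  arg="$(monitoring_annotation "$enabled")" || return 1
  oc annotate infinispan "$cluster" "$arg" --overwrite
}

# Example (cluster name is a placeholder):
#   set_monitoring infinispan false
```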

14.1.2. Configuring Service Monitor Target Labels

You can configure the generated ServiceMonitor to propagate Service labels to the underlying metrics using the ServiceMonitor spec.targetLabels field. Use the Service labels to filter and aggregate the metrics collected from the monitored endpoints.

Procedure

  1. Define labels to apply to your service by setting the infinispan.org/targetLabels annotation in your Infinispan CR.
  2. Specify a comma-separated list of the labels required in your metrics using the infinispan.org/serviceMonitorTargetLabels annotation on your Infinispan CR.

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
      annotations:
        infinispan.org/targetLabels: "label1,label2,label3"
        infinispan.org/serviceMonitorTargetLabels: "label1,label2"
  3. Apply the changes.
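A label can only be propagated to metrics if it is also applied to the Service, so it can help to check the two annotation values against each other before you apply the CR. The following sketch uses a hypothetical helper, `labels_are_subset`, and assumes both values are comma-separated lists without spaces, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical check for illustration only.

# Succeeds if every label in the second list also appears in the first list.
#   $1: value of infinispan.org/targetLabels
#   $2: value of infinispan.org/serviceMonitorTargetLabels
labels_are_subset() {
  local service_labels="$1" monitor_labels="$2" label
  local IFS=','
  for label in $monitor_labels; do
    case ",$service_labels," in
      *",$label,"*) ;;
      *) echo "label not applied to the Service: $label" >&2; return 1 ;;
    esac
  done
}

# Example:
#   labels_are_subset "label1,label2,label3" "label1,label2" && echo "ok"
```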

14.2. Installing the Grafana Operator

Data Grid Operator integrates with the community version of the Grafana Operator to create dashboards for Data Grid services.

Until Grafana is integrated with OpenShift user workload monitoring, the only option is to rely on the community version. Install the Grafana Operator on OpenShift from OperatorHub and create a subscription for the alpha channel.

However, as is the policy for all Community Operators, Red Hat does not certify the Grafana Operator and does not provide support for it in combination with Data Grid. When you install the Grafana Operator you are prompted to acknowledge a warning about the community version before you can continue.

14.3. Creating Grafana data sources

Create a GrafanaDataSource CR so you can visualize Data Grid metrics in Grafana dashboards.

Prerequisites

  • Have an oc client.
  • Have cluster-admin access to OpenShift Container Platform.
  • Enable monitoring for user-defined projects on OpenShift Container Platform.
  • Install the Grafana Operator from the alpha channel and create a Grafana CR.

Procedure

  1. Create a ServiceAccount that lets Grafana read Data Grid metrics from Prometheus.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: infinispan-monitoring
    1. Apply the ServiceAccount.

      oc apply -f service-account.yaml
    2. Grant cluster-monitoring-view permissions to the ServiceAccount.

      oc adm policy add-cluster-role-to-user cluster-monitoring-view -z infinispan-monitoring
  2. Create a Grafana data source.

    1. Retrieve the token for the ServiceAccount.

      oc serviceaccounts get-token infinispan-monitoring
    2. Define a GrafanaDataSource that includes the token in the spec.datasources.secureJsonData.httpHeaderValue1 field, as in the following example:

      apiVersion: integreatly.org/v1alpha1
      kind: GrafanaDataSource
      metadata:
        name: grafanadatasource
      spec:
        name: datasource.yaml
        datasources:
          - access: proxy
            editable: true
            isDefault: true
            jsonData:
              httpHeaderName1: Authorization
              timeInterval: 5s
              tlsSkipVerify: true
            name: Prometheus
            secureJsonData:
              httpHeaderValue1: >-
                Bearer
                eyJhbGciOiJSUzI1NiIsImtpZCI6Imc4O...
            type: prometheus
            url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
  3. Apply the GrafanaDataSource.

    oc apply -f grafana-datasource.yaml
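To avoid pasting the token by hand, you could generate the manifest from the token. The following is a sketch rather than a supported tool: `render_datasource` is a hypothetical helper that prints the same GrafanaDataSource shown in the procedure, with the token placed in the httpHeaderValue1 field.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# Print a GrafanaDataSource manifest that carries the given bearer token.
render_datasource() {
  local token="$1"
  cat <<EOF
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: grafanadatasource
spec:
  name: datasource.yaml
  datasources:
    - access: proxy
      editable: true
      isDefault: true
      jsonData:
        httpHeaderName1: Authorization
        timeInterval: 5s
        tlsSkipVerify: true
      name: Prometheus
      secureJsonData:
        httpHeaderValue1: "Bearer ${token}"
      type: prometheus
      url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
EOF
}

# Example (requires a logged-in oc client):
#   render_datasource "$(oc serviceaccounts get-token infinispan-monitoring)" > grafana-datasource.yaml
```

The generated file contains a credential, so restrict access to it and delete it after you apply the resource.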

Next steps

Enable Grafana dashboards with the Data Grid Operator configuration properties.

14.4. Configuring Data Grid dashboards

Data Grid Operator provides global configuration properties that let you configure Grafana dashboards for Data Grid clusters.

Note

You can modify global configuration properties while Data Grid Operator is running.

Prerequisites

  • Data Grid Operator must watch the namespace where the Grafana Operator is running.

Procedure

  1. Create a ConfigMap named infinispan-operator-config in the Data Grid Operator namespace.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: infinispan-operator-config
    data:
      grafana.dashboard.namespace: infinispan
      grafana.dashboard.name: infinispan
      grafana.dashboard.monitoring.key: middleware
  2. Specify the namespace of your Data Grid cluster with the data.grafana.dashboard.namespace property.

    Note

    Deleting the value for this property removes the dashboard. Changing the value moves the dashboard to that namespace.

  3. Specify a name for the dashboard with the data.grafana.dashboard.name property.
  4. If necessary, specify a monitoring key with the data.grafana.dashboard.monitoring.key property.
  5. Create infinispan-operator-config or update the configuration.

    oc apply -f infinispan-operator-config.yaml
  6. Open the Grafana UI, which is available at:

    oc get routes grafana-route -o jsonpath=https://"{.spec.host}"

14.5. Enabling JMX remote ports for Data Grid clusters

Enable JMX remote ports to expose Data Grid MBeans and to integrate Data Grid with external monitoring systems such as Cryostat.

When you enable JMX for a Data Grid cluster, the following occurs:

  1. Each Data Grid server pod exposes an authenticated JMX endpoint on port 9999 that uses the "admin" security realm, which includes the Operator user credentials.
  2. The <cluster-name>-admin Service exposes port 9999.

Note

You can enable or disable JMX only during the creation of the Infinispan CR. Once the CR instance is created, you cannot modify the JMX settings.

Procedure

  1. Enable JMX in your Infinispan CR.

    apiVersion: infinispan.org/v1
    kind: Infinispan
    metadata:
      name: infinispan
    spec:
      jmx:
        enabled: true
  2. Retrieve the Operator user credentials to authenticate client JMX connections.

    oc get secret infinispan-generated-operator-secret -o jsonpath="{.data.identities\.yaml}" | base64 --decode
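After the cluster starts, you might confirm that the admin Service exposes the JMX port. The following sketch uses hypothetical helper names (`admin_service_name`, `has_port`, `jmx_port_exposed`); the <cluster-name>-admin naming and port 9999 come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Derive the admin Service name from the cluster name (<cluster-name>-admin).
admin_service_name() {
  printf '%s-admin\n' "$1"
}

# Succeeds if the space-separated port list in $1 contains port $2.
has_port() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Query the Service ports and look for 9999; requires a logged-in oc client.
jmx_port_exposed() {
  local cluster="$1" ports
  ports="$(oc get service "$(admin_service_name "$cluster")" \
    -o jsonpath='{.spec.ports[*].port}')" || return 1
  has_port "$ports" 9999
}

# Example (cluster name is a placeholder):
#   jmx_port_exposed infinispan && echo "JMX port is exposed"
```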

14.6. Setting up JFR recordings with Cryostat

Enable JDK Flight Recorder (JFR) monitoring for your Data Grid clusters that run on OpenShift.

JFR recordings with Cryostat

JFR provides insights into various aspects of JVM performance to ease cluster inspection and debugging. Depending on your requirements, you can store and analyze your recordings using the integrated tools provided by Cryostat or export the recordings to an external monitoring application.

Prerequisites

  • Install the Cryostat Operator. You can install the Cryostat Operator in your OpenShift project by using Operator Lifecycle Manager (OLM).
  • Have JMX enabled on your Data Grid cluster. You must enable JMX before deploying the cluster, as JMX settings cannot be modified after deployment.

Procedure

  1. Create a Cryostat CR in the same namespace as your Infinispan CR.

    apiVersion: operator.cryostat.io/v1beta1
    kind: Cryostat
    metadata:
      name: cryostat-sample
    spec:
      minimal: false
      enableCertManager: true

    Note

    The Cryostat Operator requires cert-manager for traffic encryption. If cert-manager is enabled but not installed, the deployment fails. For details, see the Installing Cryostat guide.

  2. Wait for the Cryostat CR to be ready.

    oc wait -n <namespace> --for=condition=MainDeploymentAvailable cryostat/cryostat-sample
  3. Open the URL in the status.applicationUrl field of the Cryostat CR.

    oc -n <namespace> get cryostat cryostat-sample
  4. Retrieve the Operator user credentials to authenticate client JMX connections in the Cryostat UI.

    oc get secret infinispan-generated-operator-secret -o jsonpath="{.data.identities\.yaml}" | base64 --decode
  5. In the Cryostat UI, navigate to the Security menu.
  6. In the Store Credentials section, click the Add button. The Store Credentials window opens.
  7. In the Match Expression field, enter the match expression in the following format:

    target.labels['infinispan_cr'] == '<cluster_name>'
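The last steps can be partly scripted. The following sketch uses two hypothetical helpers: `application_url` reads the status.applicationUrl field mentioned in step 3, and `match_expression` builds the expression for step 7 from a cluster name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Print the Cryostat UI URL from the CR status; requires a logged-in oc client.
#   $1: namespace, $2: name of the Cryostat CR
application_url() {
  oc -n "$1" get cryostat "$2" -o jsonpath='{.status.applicationUrl}'
}

# Build the match expression for a Data Grid cluster name.
match_expression() {
  printf "target.labels['infinispan_cr'] == '%s'\n" "$1"
}

# Example (namespace and names are placeholders):
#   application_url my-namespace cryostat-sample
#   match_expression infinispan
```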