Monitoring your OpenShift cluster health with Insights Advisor
Using Insights Advisor to monitor your OpenShift cluster health
Chapter 1. About Red Hat Insights Advisor for OpenShift Container Platform
Use Red Hat Insights Advisor for OpenShift Container Platform to identify and solve issues with your clusters.
1.1. About Red Hat Insights Advisor for OpenShift Container Platform
You can use Insights Advisor to assess and monitor the health of your OpenShift Container Platform clusters. Whether you are concerned about individual clusters, or with your whole infrastructure, it is important to be aware of your exposure to issues that can affect service availability, fault tolerance, performance, or security.
Insights uses a database of recommendations, which are sets of conditions that can leave your OpenShift Container Platform clusters at risk, to repeatedly analyze the data that Insights Operator sends. Your data is then uploaded to the Insights Advisor service on Red Hat Hybrid Cloud Console where you can perform the following actions:
- See clusters impacted by a specific recommendation.
- Use robust filtering capabilities to narrow your results to the recommendations that matter to you.
- Learn more about individual recommendations, details about the risks they present, and get resolutions tailored to your individual clusters.
- Share results with other stakeholders.
To use Insights Advisor, your cluster must be registered to OpenShift Cluster Manager. To register a disconnected cluster, see Registering OpenShift Container Platform clusters to OpenShift Cluster Manager.
Additional resources
- Insights Advisor does not collect identifying information, such as user names, passwords, or certificates. See Red Hat Insights Data & Application Security for information about Red Hat Insights data collection and controls.
- For more information on how Insights Advisor gathers data from OpenShift, see the OpenShift Container Platform documentation:
- About remote health monitoring
- Showing data collected by remote health monitoring
- Opting out of remote health reporting
1.2. Understanding Insights Advisor recommendations
Insights Advisor bundles information about various cluster states and component configurations that can negatively affect the service availability, fault tolerance, performance, or security of your clusters. This information set is called a recommendation in Insights Advisor and includes the following information:
- Name: A concise description of the recommendation
- Added: When the recommendation was published to the Insights Advisor archive
- Category: Whether the issue has the potential to negatively affect service availability, fault tolerance, performance, or security
- Total risk: A value derived from the likelihood that the condition will negatively affect your infrastructure, and the impact on operations if that were to happen
- Clusters: A list of clusters on which a recommendation is detected
- Link to associated topics: More information from Red Hat about the issue
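The fields listed above can be pictured as a small record type. The following is only an illustrative sketch: the class names, field names, and example values are hypothetical and are not the Insights Advisor data model or API.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Category(Enum):
    """The four categories named in this section."""
    SERVICE_AVAILABILITY = "service availability"
    FAULT_TOLERANCE = "fault tolerance"
    PERFORMANCE = "performance"
    SECURITY = "security"


@dataclass
class Recommendation:
    """Hypothetical record mirroring the fields of a recommendation."""
    name: str                      # concise description
    added: date                    # when the recommendation was published
    category: Category             # what the issue can negatively affect
    total_risk: str                # low, moderate, important, or critical
    clusters: list[str] = field(default_factory=list)  # affected clusters
    more_info_url: str = ""        # link to associated topics, if any


# Made-up example values, for illustration only.
example = Recommendation(
    name="Example recommendation (hypothetical)",
    added=date(2024, 1, 15),
    category=Category.SECURITY,
    total_risk="important",
    clusters=["cluster-a", "cluster-b"],
)
print(f"{example.name}: {example.total_risk} risk on {len(example.clusters)} clusters")
```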
Chapter 2. Using Red Hat Insights Advisor for OpenShift Container Platform
Insights Advisor repeatedly analyzes the data Insights Operator sends. You can view and manage reports showing Insights Advisor data for your OpenShift Container Platform cluster from the Insights Advisor service on Red Hat Hybrid Cloud Console.
2.1. Displaying potential issues with your cluster
This section describes how to display the Insights report in Insights Advisor on Red Hat Hybrid Cloud Console.
Note that Insights repeatedly analyzes your cluster and shows the latest results. These results can change, for example, if you fix an issue or a new issue has been detected.
Prerequisites
- Your cluster is registered with OpenShift Cluster Manager.
- Remote health reporting is enabled, which is the default.
- You are logged in to Red Hat Hybrid Cloud Console.
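If you are unsure whether remote health reporting is still enabled, one heuristic is to inspect the global cluster pull secret. The OpenShift Container Platform documentation describes opting out by removing the `cloud.openshift.com` entry from that secret, so the sketch below assumes that the presence of that entry indicates reporting has not been opted out of. The function name and the sample input are hypothetical; the real secret content can be obtained with `oc extract secret/pull-secret -n openshift-config --to=-`.

```python
import json


def remote_health_reporting_enabled(dockerconfigjson: str) -> bool:
    """Return True if the pull secret still has a cloud.openshift.com entry.

    Assumption: opting out of remote health reporting was done by removing
    that entry from the global cluster pull secret.
    """
    auths = json.loads(dockerconfigjson).get("auths", {})
    return "cloud.openshift.com" in auths


# Made-up pull secret content with redacted credentials.
sample = json.dumps({
    "auths": {
        "cloud.openshift.com": {"auth": "<redacted>"},
        "quay.io": {"auth": "<redacted>"},
    }
})
print(remote_health_reporting_enabled(sample))
```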
Procedure
- Navigate to Advisor → Recommendations on Red Hat Hybrid Cloud Console.
Depending on the result, Insights Advisor displays one of the following:
- No matching recommendations found, if Insights did not identify any issues.
- A list of issues Insights has detected, grouped by risk (low, moderate, important, and critical).
- No clusters yet, if Insights has not yet analyzed the cluster. The analysis starts shortly after the cluster has been installed, registered, and connected to the internet.
- If any issues are displayed, click the > icon in front of the entry for more details.
Depending on the issue, the details can also contain a link to more information from Red Hat about the issue.
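If you export or copy the detected issues for your own reporting, the grouping by risk described in this procedure can be reproduced with a few lines of code. This is a minimal sketch: the function name and the issue names are hypothetical, and the input format is an assumption rather than anything that Insights Advisor produces.

```python
from collections import defaultdict

# Risk levels named in this procedure, ordered from highest to lowest.
RISK_ORDER = ["critical", "important", "moderate", "low"]


def group_by_risk(issues):
    """Group (issue_name, risk) pairs by risk level, highest risk first."""
    grouped = defaultdict(list)
    for name, risk in issues:
        if risk not in RISK_ORDER:
            raise ValueError(f"unknown risk level: {risk!r}")
        grouped[risk].append(name)
    # Keep only the levels that actually occur, in descending order of risk.
    return {risk: grouped[risk] for risk in RISK_ORDER if risk in grouped}


# Made-up issue names, for illustration only.
issues = [
    ("issue-a", "low"),
    ("issue-b", "critical"),
    ("issue-c", "low"),
]
for risk, names in group_by_risk(issues).items():
    print(risk, names)
```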
2.2. Displaying all Insights Advisor recommendations
The Recommendations view, by default, only displays the recommendations that are detected on your clusters. However, you can view all of the recommendations in the Insights Advisor archive.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered with OpenShift Cluster Manager.
- You are logged in to Red Hat Hybrid Cloud Console.
Procedure
- Navigate to Advisor → Recommendations on Red Hat Hybrid Cloud Console.
- Click the X icons next to the Clusters Impacted and Status filters.
You can now browse through all of the potential recommendations for your cluster.
2.3. Disabling Insights Advisor recommendations
You can disable specific recommendations that affect your clusters, so that they no longer appear in your reports. It is possible to disable a recommendation for a single cluster or all of your clusters.
Disabling a recommendation for all of your clusters also applies to any future clusters.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered with OpenShift Cluster Manager.
- You are logged in to Red Hat Hybrid Cloud Console.
Procedure
- Navigate to Advisor → Recommendations on Red Hat Hybrid Cloud Console.
To disable the recommendation for a single cluster:
- Click the name of the recommendation to disable. You are directed to the single recommendation page.
- Click the Options menu for that cluster, and then click Disable recommendation for cluster.
- Enter a justification note and click Save.
To disable the recommendation for all of your clusters:
- Click the name of the recommendation to disable. You are directed to the single recommendation page.
- Click Actions → Disable recommendation.
- Enter a justification note and click Save.
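The difference between the two scopes can be sketched as follows. This is a conceptual model only, with hypothetical class, method, and recommendation names; it does not represent how Insights Advisor stores this state. The point it illustrates is that a recommendation disabled for all clusters also stays hidden for clusters that are added later, while a recommendation disabled for one cluster is still reported for the others.

```python
class DisabledRecommendations:
    """Tracks which recommendations are hidden from reports (illustrative)."""

    def __init__(self):
        self.for_all_clusters = {}  # recommendation -> justification note
        self.for_one_cluster = {}   # (recommendation, cluster) -> justification note

    def disable_for_cluster(self, recommendation, cluster, note):
        self.for_one_cluster[(recommendation, cluster)] = note

    def disable_for_all(self, recommendation, note):
        self.for_all_clusters[recommendation] = note

    def is_reported(self, recommendation, cluster):
        # A recommendation disabled for all clusters is hidden everywhere,
        # including on clusters that did not exist when it was disabled.
        if recommendation in self.for_all_clusters:
            return False
        return (recommendation, cluster) not in self.for_one_cluster


state = DisabledRecommendations()
state.disable_for_cluster("rec-a", "cluster-1", "accepted risk on test cluster")
state.disable_for_all("rec-b", "not applicable to our environment")

print(state.is_reported("rec-a", "cluster-2"))            # other clusters still report rec-a
print(state.is_reported("rec-b", "cluster-added-later"))  # hidden for future clusters too
```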
2.4. Enabling a previously disabled Insights Advisor recommendation
When a recommendation is disabled for all clusters, you will no longer see the recommendation in Insights Advisor. You can change this behavior.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered with OpenShift Cluster Manager.
- You are logged in to Red Hat Hybrid Cloud Console.
Procedure
- Navigate to Advisor → Recommendations on Red Hat Hybrid Cloud Console.
- Filter the recommendations by Status → Disabled.
- Locate the recommendation to enable.
- Click the Options menu, and then click Enable recommendation.
2.5. Displaying the Insights Advisor status in the web console
Insights Advisor repeatedly analyzes your cluster, and you can display the status of the potential issues that it identifies in the OpenShift Container Platform web console. This status shows the number of issues in the different categories and, for further details, links to the reports in Red Hat Hybrid Cloud Console.
Prerequisites
- Your cluster is registered with OpenShift Cluster Manager.
- Remote health reporting is enabled, which is the default.
- You are logged in to the OpenShift Container Platform web console.
Procedure
- Navigate to Home → Overview in the OpenShift Container Platform web console.
- Click Insights on the Status card.
The pop-up window lists potential issues grouped by risk. Click the individual categories or View all recommendations in Insights Advisor to display more details.
2.6. Using update-risk assessment to identify and mitigate cluster-update risks
The Red Hat Insights Advisor service assesses the risk of cluster-update failure. Update-risk assessment uses machine learning developed in collaboration with IBM Research to compare the recent state of the cluster with conditions known to cause updates to fail.
Managing updates in complex, production Kubernetes environments is a challenging task. Over 60 independently working components usually form the infrastructure of such environments. Each component has a different operational state and configuration, which can cause minor and major version updates to fail.
Update-risk assessment shows you a list of risks present in your cluster, including failing operator conditions, alerts, and other metrics. The assessment also provides links to specific information about each issue. You can use the update-risk feature to generate a checklist of issues to fix before beginning a cluster update.
Prerequisites
- The cluster is connected to Red Hat Hybrid Cloud Console, as described in Enabling remote health reporting in the OpenShift remote health monitoring documentation.
- The cluster has sent data to Red Hat within the last two hours.
- You are logged in to Red Hat Hybrid Cloud Console.
Procedure
- Navigate to https://console.redhat.com/openshift/insights/advisor/clusters.
- Click on a cluster to view cluster details.
- If update risks exist for the selected cluster, a “Resolve update risks” banner is visible.
- If no risks exist for the cluster, a banner displays the message, "No known update risks identified for this cluster."
- If the cluster has not checked in for more than two hours, the banner message says "Warning alert: Update risks are not currently available. This cluster has gone more than two hours without sending metrics. Check the cluster’s web console if you think that this is incorrect."
- Click the Update risks tab.
- If update risks are detected, view alerts or cluster operator risks for the cluster.
- Click on an alert to open the in-cluster Alert details page for that alert in the Red Hat OpenShift web console.
- Click a cluster operator to open the in-cluster ClusterOperator details page in the Red Hat OpenShift web console.
For more information about alerts, see OpenShift documentation, Getting information about alerts, silences, and alerting rules.
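The three banner states described in the procedure above follow a simple decision order: data freshness first, then the presence of risks. The sketch below restates that order in code. The function name and parameters are hypothetical, the returned strings are shortened versions of the banner text, and the logic reflects only what this section describes, not the service's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# The two-hour window stated in the prerequisites and the banner text.
STALE_AFTER = timedelta(hours=2)


def update_risk_banner(last_check_in, risk_count, now=None):
    """Return the banner that applies to a cluster (illustrative only)."""
    now = now or datetime.now(timezone.utc)
    if now - last_check_in > STALE_AFTER:
        return "Update risks are not currently available."
    if risk_count > 0:
        return "Resolve update risks"
    return "No known update risks identified for this cluster."


# Made-up timestamps, for illustration only.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(update_risk_banner(now - timedelta(minutes=30), 3, now=now))
print(update_risk_banner(now - timedelta(hours=3), 3, now=now))
```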
2.7. Using the Deployment Validation Operator in your Red Hat Insights for OpenShift workflow
The Deployment Validation Operator (DVO) validates on-premises and managed clusters against a curated collection of KubeLinter checks. These checks implement best practices for Kubernetes-native workloads, helping to ensure that applications are optimized for the operational stability of the cluster.
The Insights Operator gathers the results of the DVO checks every two hours by default and presents the data in the Insights Advisor service on Red Hat Hybrid Cloud Console. If the DVO detects issues, cluster administrators can view resolutions in the Insights Advisor service. If the DVO detects no issues, no results are visible in the Insights Advisor service.
Curated KubeLinter checks
The Deployment Validation Operator (DVO) checks a curated, limited collection of all of the available KubeLinter checks. The DVO does not execute the whole list of available KubeLinter checks. Insights Advisor service recommendations do not exist for all available KubeLinter checks.
Supported OpenShift Container Platform versions
All Red Hat-supported versions of OpenShift Container Platform, OpenShift Dedicated, and Red Hat OpenShift Service on AWS support the Deployment Validation Operator (DVO).
Managed OpenShift clusters have the Deployment Validation Operator (DVO) installed and operational by default. Administrators of on-premises clusters must install the DVO from OperatorHub and can modify the default list of checks.
2.7.1. The Deployment Validation Operator (DVO) on managed OpenShift clusters
The Deployment Validation Operator (DVO) is already installed and operational on managed OpenShift clusters. This includes clusters on OpenShift Dedicated and Red Hat OpenShift Service on AWS.
DVO Configuration
The DVO for managed clusters comes preconfigured, by default. The DVO configuration file contains the default, curated set of KubeLinter checks and is not editable.
DVO Updates
On managed clusters, the DVO updates automatically.
2.7.2. The Deployment Validation Operator (DVO) on on-premises OpenShift clusters
Administrators of on-premises clusters must install the Deployment Validation Operator (DVO) from OperatorHub in the OpenShift web console. On-premises cluster administrators can also configure the default set of KubeLinter checks.
The Insights Advisor service does not have recommendations for all of the checks that KubeLinter has available.
2.7.2.1. Installing the Deployment Validation Operator (DVO) on on-premises OpenShift clusters
You can find the DVO in OperatorHub and install it from there.
Prerequisites
- You are logged in to the Red Hat OpenShift web console as a cluster administrator.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- In the Search box, start typing "deployment-validation-operator".
- Click the DVO card when it appears.
- When the pop-up window appears, click Continue to proceed.
- The Deployment Validation Operator card displays information about capabilities, configuration, version, and GitHub source files. When you are ready to install the DVO, click Install.
- Choose the namespace or use the default.
- Click Install and the Operator installs. You can confirm the installation in the Installed Operators view. The DVO is also visible in the Pods and Deployments views for the cluster in the corresponding namespace.
2.7.2.2. Configuring the list of Deployment Validation Operator (DVO) checks on on-premises clusters
Administrators of on-premises OpenShift clusters can change the default list of DVO checks to focus on specific best practices of interest. Refer to the section, Configuring Checks, in the DVO technical documentation in GitHub.
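Because the DVO runs KubeLinter checks, a customized list of checks is typically expressed in KubeLinter's configuration format. The sketch below builds such a configuration as a Python dictionary and prints it as JSON, which is also valid YAML. The `checks` keys (`addAllBuiltIn`, `doNotAutoAddDefaults`, `include`, `exclude`) and the check names come from KubeLinter; the helper function is hypothetical, and whether the DVO accepts the configuration in exactly this shape is an assumption, so confirm the details against the Configuring Checks section of the DVO documentation before applying anything.

```python
import json


def kube_linter_checks_config(include, exclude=()):
    """Build a KubeLinter-style 'checks' configuration (illustrative helper)."""
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"checks both included and excluded: {sorted(overlap)}")
    return {
        "checks": {
            "addAllBuiltIn": False,        # do not enable every built-in check
            "doNotAutoAddDefaults": True,  # start from an empty list of checks
            "include": sorted(include),    # checks to run
            "exclude": sorted(exclude),    # checks to skip
        }
    }


# Example selection of KubeLinter built-in checks.
config = kube_linter_checks_config(
    include=["unset-memory-requirements", "unset-cpu-requirements", "run-as-non-root"],
)
print(json.dumps(config, indent=2))
```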
2.7.2.3. Updating the Deployment Validation Operator (DVO) on on-premises clusters
During the installation from OperatorHub, you can set the DVO to update automatically.
2.7.3. Viewing Deployment Validation Operator (DVO) results in the Insights Advisor service
If the DVO detects issues, the Overview page for the cluster in the OpenShift web console shows an Insights link with the number of detected issues in the Status information block. Click the link to learn more about the issue and how to resolve it in the Insights Advisor service in the Red Hat Hybrid Cloud Console.
If the latest DVO check returns no issues, you will not see any DVO issues in Insights.
Prerequisites
- You are logged in to Red Hat Hybrid Cloud Console.
Procedure
- Navigate to the Overview page for the cluster in the OpenShift web console.
- Look for the Status block, an Insights link, and the number of detected issues.
- If one or more issues exist, click the Insights link to open the Insights Advisor service in the Hybrid Cloud Console.
The link takes you to Insights → Advisor → Clusters, to the cluster information page.
- In the Recommendations tab, you can see the issues detected on the cluster. Click the arrow to view complete information about the issue, including the necessary actions to resolve it.
2.7.4. Viewing the default list of Deployment Validation Operator (DVO) checks
The Deployment Validation Operator (DVO) checks your cluster against a default list of checks. You can view the list of DVO checks in GitHub.
Administrators of managed clusters cannot modify the list of default checks, but can view the list in GitHub.