[OpenShift 3] Slow and failing OpenShift metrics

Solution In Progress - Updated -

Issue

On a growing developer cluster (around 300 namespaces and around 100 Jenkins instances running dev pipelines for projects on the cluster) with an almost default installation of metrics (Cassandra, Hawkular), we are experiencing slow metrics (graphs in the OpenShift web console can take up to a minute to load) and periodic crashes of the whole metrics stack (every few weeks).

We can also see BusyPoolException messages in the hawkular-metrics pod logs:

2020-12-03T16:07:18.059306593Z 2020-12-03 16:07:18,034 ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-1) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/192.168.87.231:9042 (com.datastax.driver.core.exceptions.BusyPoolException: [hawkular-cassandra/192.168.87.231] Pool is busy (no available connection and the queue has reached its max size 256)))
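
To gauge how often the connection pool is exhausted, the log can be scanned for the BusyPoolException text shown above. The following is a minimal sketch, not part of the original report: the function name summarize_busy_pool and the script name are hypothetical, and the pattern assumes the message wording matches the log line above exactly.

```python
import re
import sys

# Pattern taken from the wording of the BusyPoolException message in the log above.
# Assumption: other driver versions may word the message differently.
BUSY_POOL = re.compile(
    r"BusyPoolException: \[(?P<host>[^\]]+)\] Pool is busy "
    r"\(no available connection and the queue has reached its max size (?P<size>\d+)\)"
)


def summarize_busy_pool(lines):
    """Count BusyPoolException occurrences per (host, max queue size)."""
    counts = {}
    for line in lines:
        for match in BUSY_POOL.finditer(line):
            key = (match.group("host"), int(match.group("size")))
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Read log lines from standard input and print one summary line per host.
    for (host, size), n in sorted(summarize_busy_pool(sys.stdin).items()):
        print(f"{host}: {n} BusyPoolException(s), queue max size {size}")
```

Assuming the script is saved as busy_pool.py (a hypothetical name), the pod log could be piped into it, for example with `oc logs <hawkular-metrics-pod> | python busy_pool.py`, where the pod name is a placeholder for the actual hawkular-metrics pod.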

Environment

  • Red Hat OpenShift Container Platform (OCP) 3.11
