[OpenShift 3] Slow and failing OpenShift metrics

Issue

On a growing developer cluster (around 300 namespaces and around 100 Jenkins instances running dev pipelines for projects on the cluster) with an almost default installation of metrics (Cassandra, Hawkular), we are experiencing slow metrics (graphs in the OpenShift web console can take up to a minute to load) and periodic crashes of the whole metrics stack (every few weeks).

We can also see BusyPoolException messages in pod-hawkular-metrics logs:

2020-12-03T16:07:18.059306593Z 2020-12-03 16:07:18,034 ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-1) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/192.168.87.231:9042 (com.datastax.driver.core.exceptions.BusyPoolException: [hawkular-cassandra/192.168.87.231] Pool is busy (no available connection and the queue has reached its max size 256)))
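
To gauge how often the connection pool is exhausted, the occurrences of this message can be counted in a saved copy of the pod log. The following is a minimal sketch, not part of the original report: the function name summarize_busy_pool and the file name hawkular-metrics.log are hypothetical, and the regular expression assumes the message wording is exactly as in the log line above.

```python
import re

# Matches the pool-exhaustion detail as it appears in the log line above.
# Assumption: the message wording does not vary between occurrences.
BUSY_POOL = re.compile(
    r"BusyPoolException: \[(?P<host>[^\]]+)\] Pool is busy "
    r"\(no available connection and the queue has reached "
    r"its max size (?P<max_queue>\d+)\)"
)


def summarize_busy_pool(lines):
    """Count BusyPoolException messages per (host, max queue size)."""
    counts = {}
    for line in lines:
        for match in BUSY_POOL.finditer(line):
            key = (match.group("host"), int(match.group("max_queue")))
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name: a saved copy of the hawkular-metrics pod log.
    with open("hawkular-metrics.log", encoding="utf-8", errors="replace") as fh:
        for (host, max_queue), n in sorted(summarize_busy_pool(fh).items()):
            print(f"{host} (max queue {max_queue}): {n} occurrence(s)")
```

The log file can be produced, for example, by redirecting the output of oc logs for the hawkular-metrics pod to a file. A steadily growing count against a single Cassandra host would suggest that the one Cassandra node cannot keep up with the query load.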

Environment

Red Hat OpenShift Container Platform (OCP) 3.11

