Hawkular metrics restarted due to performance problems

Solution Verified - Updated -

Issue

The hawkular-metrics pod is restarted frequently due to performance problems. The container is marked as unhealthy because the liveness probe does not succeed within the configured timeout, and several errors in the container log point to performance issues when connecting to the Cassandra database.
The following errors are reported in the container log:

  • The time taken to complete the POST request exceeded the default threshold:
2018-10-01 12:15:03,032 WARN  [org.hawkular.metrics.api.jaxrs.log.time.RequestTimeLogger] (RxComputationScheduler-3) Request POST /hawkular/metrics/m/stats/query took: 12968 ms, exceeds 10000 ms threshold, tenant-id: project1-pro:86341117-ae64-11e7-9951-02010ac97bd9
  • The /status endpoint (used by the liveness and readiness probes) is failing:
2018-10-01 12:19:00,870 ERROR [org.jboss.resteasy.resteasy_jaxrs.i18n] (RxComputationScheduler-2) RESTEASY002020: Unhandled asynchronous exception, sending back 500: org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception

openshift-infra_hawkular-metrics-d55bp.txt:2018-10-01 12:52:54,132 ERROR [io.undertow.request] (default task-4) UT005023: Exception handling request to /hawkular/metrics/status: org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception
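When reviewing a large container log, it can help to pull out just these two symptoms. The following is a minimal sketch, assuming the log has been saved to a text file (for example with `oc logs`) and that the warning lines follow the RequestTimeLogger format shown above. The regular expressions and the helper names `strip_ansi`, `find_slow_requests` and `count_status_failures` are hypothetical and are not part of Hawkular Metrics. It also strips ANSI color sequences, which may appear in raw logs either as real escape characters or in caret notation (`^[[0m`).

```python
import re
from dataclasses import dataclass

# ANSI color sequences, either as real ESC bytes or in caret notation (^[[33m).
# Assumption: only SGR "...m" sequences appear in the log.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m|\^\[\[[0-9;]*m")

# Assumed format of the RequestTimeLogger threshold warning (see log excerpt).
SLOW_RE = re.compile(
    r"Request (?P<method>[A-Z]+) (?P<path>\S+) took: (?P<took>\d+) ms, "
    r"exceeds (?P<threshold>\d+) ms threshold"
)

STATUS_PATH = "/hawkular/metrics/status"


@dataclass
class SlowRequest:
    method: str
    path: str
    took_ms: int
    threshold_ms: int


def strip_ansi(line: str) -> str:
    """Remove ANSI color escape sequences from a log line."""
    return ANSI_RE.sub("", line)


def find_slow_requests(lines):
    """Yield a SlowRequest for every request that exceeded the time threshold."""
    for raw in lines:
        m = SLOW_RE.search(strip_ansi(raw))
        if m:
            yield SlowRequest(m["method"], m["path"],
                              int(m["took"]), int(m["threshold"]))


def count_status_failures(lines) -> int:
    """Count ERROR lines raised while handling the /status endpoint."""
    return sum(1 for raw in lines
               if "ERROR" in raw and STATUS_PATH in strip_ansi(raw))
```

For example, `find_slow_requests(open("hawkular-metrics.log"))` (hypothetical file name) yields one entry per slow request. Sorting the results by `took_ms` shows whether only some query endpoints are slow or whether every request to Cassandra is affected.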

This can be a complex issue, and finding the root cause may not be straightforward. The most common causes are:

  • A performance issue when fetching data from the metrics_tags_idx table in Cassandra. The related Bugzilla report describes the problem and provides a workaround.
  • High garbage collection activity caused by heap pressure in the Hawkular Metrics JVM, which can seriously degrade performance.
  • NFS used as the storage backend for the Cassandra database. NAS storage is not recommended for production workloads, and data corruption may even occur. This is documented in the OpenShift documentation.
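All three causes have the same visible effect: the /status endpoint answers slowly or returns HTTP 500, so the liveness probe fails and the pod is restarted. The sketch below is a simplified model of the Kubernetes liveness-probe rule. A probe attempt fails if it times out or returns an error, and the container is restarted after `failureThreshold` consecutive failures. The function name and the sample latencies are hypothetical, and the model ignores details such as `initialDelaySeconds` and `periodSeconds`. It only shows why a few slow /status responses in a row are enough to trigger a restart.

```python
from typing import List, Optional


def restart_points(latencies_ms: List[Optional[int]],
                   timeout_ms: int,
                   failure_threshold: int) -> List[int]:
    """Return the probe attempts (0-based) at which a restart would occur.

    latencies_ms holds the /status response time of each probe attempt;
    None stands for an error response such as HTTP 500.
    Simplified model: a probe fails if it errors or exceeds timeout_ms, and
    failure_threshold consecutive failures trigger a container restart.
    """
    restarts = []
    consecutive = 0
    for attempt, latency in enumerate(latencies_ms):
        failed = latency is None or latency > timeout_ms
        if failed:
            consecutive += 1
            if consecutive >= failure_threshold:
                restarts.append(attempt)
                consecutive = 0  # assumption: counting restarts after the restart
        else:
            consecutive = 0  # any successful probe resets the failure count
    return restarts
```

With illustrative values, `restart_points([500, 12000, 15000, None, 800], timeout_ms=10000, failure_threshold=3)` reports a restart at attempt 3. A single slow response is tolerated, but a sustained slowdown of the kind caused by Cassandra queries, GC pauses or NFS latency is not.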

Environment

  • OpenShift Container Platform
    • 3.6
