Metering pods are restarting frequently

Solution Verified - Updated -

Issue

  • The reporting-operator and Presto coordinator pods restart intermittently.
  • The following alert is received:
Pod openshift-metering/reporting-operator-xxx (reporting-operator-auth-proxy) is restarting xx times / xx minutes.
  • The reporting-operator pod logs report that a worker node may be under too much load, along with ExceededMemoryLimitException errors:
time="2020-10-06T02:08:52Z" level=error msg="creating usage report FAILED!" app=metering error="presto: query failed (200 OK):
\"io.prestosql.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes.

hive.metering.report_openshift_metering_namespace_persistentvolumeclaim_usage: presto: query failed (200 OK):
\"io.prestosql.ExceededMemoryLimitException: Query exceeded per-node user memory limit of 204.80MB [Allocated: 204.80MB, Delta: 1.61kB, Top Consumers: {PartitionedOutputOperator=224.90MB, HashBuilderOperator=204.80MB, LazyOutputBuffer=32.07MB}]\"" logID=xxx
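When triaging many such log lines, it can help to pull the configured limit and the top memory consumers out of the ExceededMemoryLimitException message. The following is a minimal sketch, assuming the message format matches the excerpt above. The function name `parse_memory_error` and the regular expressions are illustrative and not part of the metering operator or Presto.

```python
import re

# Sample message copied from the log excerpt above.
MESSAGE = (
    "io.prestosql.ExceededMemoryLimitException: Query exceeded per-node user "
    "memory limit of 204.80MB [Allocated: 204.80MB, Delta: 1.61kB, "
    "Top Consumers: {PartitionedOutputOperator=224.90MB, "
    "HashBuilderOperator=204.80MB, LazyOutputBuffer=32.07MB}]"
)

# Assumed patterns, derived only from the sample message format.
LIMIT_RE = re.compile(r"per-node user memory limit of ([\d.]+)([kMGT]?B)")
CONSUMERS_RE = re.compile(r"Top Consumers: \{([^}]*)\}")


def parse_memory_error(message):
    """Return (limit, consumers) parsed from the message, or None if no match."""
    limit = LIMIT_RE.search(message)
    if limit is None:
        return None
    consumers = {}
    block = CONSUMERS_RE.search(message)
    if block:
        # Each entry looks like "OperatorName=123.45MB".
        for item in block.group(1).split(","):
            name, _, size = item.strip().partition("=")
            consumers[name] = size
    return limit.group(1) + limit.group(2), consumers


result = parse_memory_error(MESSAGE)
print(result)
```

The sketch only reads the message text. It does not query the cluster or change any metering configuration.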

Environment

  • Red Hat OpenShift Container Platform
    • 4.5
