Metering pods are restarting frequently
Issue
- reporting-operator and presto coordinator pods are restarting intermittently.
- The following alert is being received:
Pod openshift-metering/reporting-operator-xxx (reporting-operator-auth-proxy) is restarting xx times / xx minutes.
- The reporting-operator pod is showing "too much load" and ExceededMemoryLimitException errors:
time="2020-10-06T02:08:52Z" level=error msg="creating usage report FAILED!" app=metering error="presto: query failed (200 OK):
\"io.prestosql.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes.
hive.metering.report_openshift_metering_namespace_persistentvolumeclaim_usage: presto: query failed (200 OK):
\"io.prestosql.ExceededMemoryLimitException: Query exceeded per-node user memory limit of 204.80MB [Allocated: 204.80MB, Delta: 1.61kB, Top Consumers: {PartitionedOutputOperator=224.90MB, HashBuilderOperator=204.80MB, LazyOutputBuffer=32.07MB}]\"" logID=xxx
Environment
- Red Hat OpenShift Container Platform 4.5
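When triaging the ExceededMemoryLimitException message quoted under Issue, it can help to pull out the per-node limit and the operators that consumed the most memory. The following is a minimal sketch, not part of the metering tooling: the function name parse_memory_error is hypothetical, and the message layout and the kB/MB/GB unit suffixes are assumptions based only on the log line shown in this article.

```python
import re

# Assumed message layout, taken from the log excerpt in this article:
#   "... per-node user memory limit of 204.80MB [..., Top Consumers: {Op=1.00MB, ...}]"
LIMIT_RE = re.compile(r"per-node user memory limit of ([\d.]+)(kB|MB|GB)")
CONSUMER_RE = re.compile(r"(\w+)=([\d.]+)(kB|MB|GB)")

# Assumption: binary units (1 MB = 1024 kB), used only to normalize to MB.
UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def parse_memory_error(line):
    """Return (limit_mb, {operator: mb}), or None if the line does not match."""
    limit = LIMIT_RE.search(line)
    if limit is None:
        return None
    limit_mb = float(limit.group(1)) * UNIT_TO_MB[limit.group(2)]

    consumers = {}
    start = line.find("Top Consumers: {")
    if start != -1:
        end = line.find("}", start)
        segment = line[start:end] if end != -1 else line[start:]
        # Each entry looks like "OperatorName=12.34MB".
        for name, value, unit in CONSUMER_RE.findall(segment):
            consumers[name] = float(value) * UNIT_TO_MB[unit]
    return limit_mb, consumers


if __name__ == "__main__":
    sample = (
        "io.prestosql.ExceededMemoryLimitException: Query exceeded per-node "
        "user memory limit of 204.80MB [Allocated: 204.80MB, Delta: 1.61kB, "
        "Top Consumers: {PartitionedOutputOperator=224.90MB, "
        "HashBuilderOperator=204.80MB, LazyOutputBuffer=32.07MB}]"
    )
    limit_mb, consumers = parse_memory_error(sample)
    print("limit (MB):", limit_mb)
    for name, mb in sorted(consumers.items(), key=lambda kv: -kv[1]):
        print(name, mb)
```

Feeding each line of the reporting-operator log through this function gives a quick view of which operators hit the limit, which may help when deciding how much memory the presto pods need.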