[RHOCP 4] Ingress Router pods restarting frequently, causing intermittent performance issues

Solution Verified - Updated -

Issue

  • Router pods restart very frequently in production. This causes long delays and degraded cluster usability for end users, who see intermittent performance problems when accessing the application API.

  • Before restarting, the router pods in the openshift-ingress project log the following:

2022-08-31T15:17:42.999012044Z E0831 15:17:42.998942       1 haproxy.go:442] unexpected error while reading CSV: read unix @->/var/lib/haproxy/run/haproxy.sock.19.tmp: i/o timeout
2022-08-31T15:17:49.174222897Z I0831 15:17:49.174154       1 template.go:704] router "msg"="Shutdown requested, waiting 45s for new connections to cease"  
2022-08-31T15:17:52.472351742Z E0831 15:17:52.472285       1 limiter.go:165] error reloading router: exit status 1
2022-08-31T15:17:52.472351742Z  - Checking http://localhost:80 ...
2022-08-31T15:17:52.472351742Z  - Exceeded max wait time (30) in health check - 56 retry attempt(s).
2022-08-31T15:17:53.627488236Z E0831 15:17:53.627409       1 haproxy.go:442] unexpected error while reading CSV: read unix @->/var/lib/haproxy/run/haproxy.sock.19.tmp: i/o timeout
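When a router pod has been restarted several times, it can help to count how often each of the failure signatures above appears. The following is a minimal sketch that assumes the router log has already been saved as plain text, for example with `oc logs -n openshift-ingress <router-pod> --previous`. The signature names and the `classify_router_log` helper are hypothetical and are not part of any OpenShift tooling.

```python
import re
from collections import Counter

# Hypothetical signature names; each pattern matches one of the router log lines quoted above.
SIGNATURES = {
    "stats_socket_timeout": re.compile(r"unexpected error while reading CSV: .*i/o timeout"),
    "shutdown_requested": re.compile(r"Shutdown requested, waiting \d+s"),
    "reload_failed": re.compile(r"error reloading router: exit status \d+"),
    "health_check_exceeded": re.compile(r"Exceeded max wait time \(\d+\) in health check"),
}


def classify_router_log(lines):
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, `classify_router_log(open("router.log"))` returns a `Counter`, and `.most_common()` lists the signatures by frequency. A high count of `stats_socket_timeout` next to `reload_failed` suggests the pod is failing during HAProxy reloads rather than at startup.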
  • Events in the openshift-ingress project show readiness and liveness probe failures for the router pods:
$ omg get events

LAST SEEN  TYPE     REASON      OBJECT                               MESSAGE
12m        Normal   Pulled      pod/router-default-786c4d686f-8mxct  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:141de49d17f735d45f9da4fdc375617aadc09b3136c802b05d3d299bd8fd0c62" already present on machine
12m        Normal   Created     pod/router-default-786c4d686f-8mxct  Created container router
12m        Normal   Started     pod/router-default-786c4d686f-8mxct  Started container router
12m        Warning  ProbeError  pod/router-default-786c4d686f-8mxct  Readiness probe error: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
                                                                     body:
12m        Warning  Unhealthy   pod/router-default-786c4d686f-8mxct  Readiness probe failed: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
12m        Warning  ProbeError  pod/router-default-786c4d686f-8mxct  Liveness probe error: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
                                                                     body:
12m        Warning  Unhealthy   pod/router-default-786c4d686f-8mxct  Liveness probe failed: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
12m        Normal   Killing     pod/router-default-786c4d686f-8mxct  Container router failed liveness probe, will be restarted
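On a busy cluster the event list can be long, so a per-pod tally of probe failures makes the restart pattern easier to see. The following is a minimal sketch that assumes the default five-column layout shown above (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE). The `tally_probe_failures` helper is hypothetical.

```python
from collections import Counter


def tally_probe_failures(event_lines):
    """Count failed readiness/liveness probes and liveness-triggered kills per pod.

    Assumes the default column layout: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
    """
    counts = Counter()
    for line in event_lines:
        fields = line.split(None, 4)  # split on whitespace into at most 5 columns
        if len(fields) < 5:
            continue  # blank lines and wrapped continuation lines such as "body:"
        _, _, reason, obj, message = fields
        if reason == "Unhealthy":
            # The message starts with the probe type, e.g. "Liveness probe failed: ..."
            probe = message.split(None, 1)[0].lower()
            counts[(obj, probe)] += 1
        elif reason == "Killing" and "failed liveness probe" in message:
            counts[(obj, "killed")] += 1
    return counts
```

The header line is harmless because its third token is `TYPE`, not an event reason. The sketch counts only `Unhealthy` events and ignores the paired `ProbeError` events, so each failed probe is counted once.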
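The probe messages above end in `context deadline exceeded (Client.Timeout exceeded while awaiting headers)`. This generally means the health endpoint on port 1936 accepted the request but did not send response headers within the probe timeout. The following minimal sketch reproduces that behavior locally. It uses a deliberately slow HTTP server in place of the router's health endpoint and Python's `urllib` in place of the kubelet's HTTP prober. The 1-second delay, the two timeouts, and the helper names are hypothetical, and the article does not state the router deployment's actual probe settings.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHealthzHandler(BaseHTTPRequestHandler):
    """Stand-in for an overloaded health endpoint: accepts the request, delays the headers."""

    delay_seconds = 1.0  # hypothetical delay, chosen to exceed the short timeout below

    def do_GET(self):
        time.sleep(self.delay_seconds)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the probe already gave up and closed the connection

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def probe(url, timeout_seconds):
    """Return True if the endpoint answers 200-399 within the timeout, like an HTTP GET probe."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy env vars
    try:
        with opener.open(url, timeout=timeout_seconds) as resp:
            return 200 <= resp.status < 400
    except (socket.timeout, TimeoutError, urllib.error.URLError):
        return False  # timed out waiting for headers, refused, or non-2xx/3xx status


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHealthzHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/healthz"
    try:
        patient = probe(url, timeout_seconds=3.0)    # waits out the 1 s delay
        impatient = probe(url, timeout_seconds=0.2)  # gives up before headers arrive
    finally:
        server.shutdown()
        server.server_close()
    return patient, impatient
```

In this sketch `demo()` should return `(True, False)`. The same slow endpoint passes or fails depending only on the client timeout. This is why a router that is busy, for example during repeated HAProxy reloads, can fail its liveness probe and be restarted even though the process is still running.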

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4
