[RHOCP 4] Ingress Router pods restarting frequently causing intermittent performance issues
Issue
- Router pods restart very frequently, causing long delays and cluster usage issues for end users in production. Users see intermittent performance issues when accessing the application API.
- Before the router pods are restarted, their logs in the openshift-ingress project show the following messages:
2022-08-31T15:17:42.999012044Z E0831 15:17:42.998942 1 haproxy.go:442] unexpected error while reading CSV: read unix @->/var/lib/haproxy/run/haproxy.sock.19.tmp: i/o timeout
2022-08-31T15:17:49.174222897Z I0831 15:17:49.174154 1 template.go:704] router "msg"="Shutdown requested, waiting 45s for new connections to cease"
2022-08-31T15:17:52.472351742Z E0831 15:17:52.472285 1 limiter.go:165] error reloading router: exit status 1
2022-08-31T15:17:52.472351742Z - Checking http://localhost:80 ...
2022-08-31T15:17:52.472351742Z - Exceeded max wait time (30) in health check - 56 retry attempt(s).
2022-08-31T15:17:53.627488236Z E0831 15:17:53.627409 1 haproxy.go:442] unexpected error while reading CSV: read unix @->/var/lib/haproxy/run/haproxy.sock.19.tmp: i/o timeout
Events in the openshift-ingress project show readiness and liveness probe failures for the router pods:
$ omg get events
LAST SEEN TYPE REASON OBJECT MESSAGE
12m Normal Pulled pod/router-default-786c4d686f-8mxct Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:141de49d17f735d45f9da4fdc375617aadc09b3136c802b05d3d299bd8fd0c62" already present on machine
12m Normal Created pod/router-default-786c4d686f-8mxct Created container router
12m Normal Started pod/router-default-786c4d686f-8mxct Started container router
12m Warning ProbeError pod/router-default-786c4d686f-8mxct Readiness probe error: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
body:
12m Warning Unhealthy pod/router-default-786c4d686f-8mxct Readiness probe failed: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
12m Warning ProbeError pod/router-default-786c4d686f-8mxct Liveness probe error: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
body:
12m Warning Unhealthy pod/router-default-786c4d686f-8mxct Liveness probe failed: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
12m Normal Killing pod/router-default-786c4d686f-8mxct Container router failed liveness probe, will be restarted
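To check whether another cluster shows the same symptoms, the signatures above can be tallied from saved output. The following is a minimal sketch and not part of the original article: the function names router_log_signatures and probe_failures_by_pod are hypothetical, the search patterns are copied from the log excerpt above, and the column positions assume the LAST SEEN / TYPE / REASON / OBJECT / MESSAGE layout shown in the events listing.

```shell
# Hypothetical helper: count how often each error signature from the log
# excerpt above appears in a router log read from standard input.
router_log_signatures() {
  # Buffer the input so every pattern is counted against the same text.
  log=$(cat)
  for pattern in \
    'unexpected error while reading CSV' \
    'error reloading router' \
    'Exceeded max wait time'
  do
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero-match result from being treated as a failure.
    count=$(printf '%s\n' "$log" | grep -cF -- "$pattern" || true)
    printf '%s\t%s\n' "$count" "$pattern"
  done
}

# Hypothetical helper: count "Unhealthy" warning events per pod and probe
# type from an events listing read from standard input.
probe_failures_by_pod() {
  awk '
    # Assumed columns: $1 LAST SEEN, $2 TYPE, $3 REASON, $4 OBJECT,
    # $5 first word of MESSAGE ("Readiness" or "Liveness").
    $2 == "Warning" && $3 == "Unhealthy" { failures[$4 " " $5]++ }
    END { for (key in failures) print failures[key], key }
  ' | sort -k2
}
```

For example, the events listing can be piped in directly with omg get events | probe_failures_by_pod, and a saved router log (the file name router.log is an assumption) can be checked with router_log_signatures < router.log. Matching counts only indicate the same symptoms, not the same root cause.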
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4