OpenShift 4: AWS with NLB loadbalancer connection handling issues (timeouts or rejected calls to routes)
Issue
- Connection resets, timeouts, or failed connections to routes after moving from CLB to NLB on OpenShift running on AWS infrastructure or ROSA
- OpenShift 4 cluster intermittently responds to new client requests with a challenge ACK (SYN - ACK - RST) pattern instead of the expected (SYN - SYN/ACK - ACK) pattern on new connections
- New connections to any route have an intermittent chance of failure.
- Intermittent connection resets occur on calls to OpenShift made through the router pods
- An NLB is in use, plumbed to the *.apps A-record for Ingress, and forwards to the router-default pods (or a similar configuration for an ingress shard)
There are two issues that may be present; both have the same root cause and the same solution:
- New connections to OpenShift routes via the NLB load balancer reaching a router pod may be rejected with a challenge ACK packet and fail to establish (the client must retry). A client-side probe sketch follows the trace below.
client --> NLB --> router [SYN]
router --> NLB --> client [ACK]  # challenge ACK - a SYN/ACK was expected
client --> NLB --> router [RST]  # client aborts the connection attempt
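The intermittent failure of new connections can be checked from a client outside the cluster. The following is a minimal probe sketch (not part of the original article) in Python that repeatedly opens fresh TCP connections to an affected route through the NLB and counts how many are reset or time out during the handshake; the hostname, port, attempt count, and delay are placeholder assumptions to adjust for your environment.

#!/usr/bin/env python3
# Minimal sketch: probe a route behind the NLB for intermittent
# new-connection failures. The SYN / ACK / RST pattern shows up to the
# client as a reset or a timeout while connecting.
import socket
import time

ROUTE_HOST = "myapp.apps.example.com"   # assumption: replace with an affected route
ROUTE_PORT = 443                        # assumption: port exposed by the NLB
ATTEMPTS = 100

failures = 0
for i in range(ATTEMPTS):
    try:
        # One new TCP connection per attempt; a challenge ACK on the SYN
        # surfaces here as a connection reset or a connect timeout.
        with socket.create_connection((ROUTE_HOST, ROUTE_PORT), timeout=5):
            pass
    except (ConnectionResetError, socket.timeout, OSError) as exc:
        failures += 1
        print(f"attempt {i}: connection failed: {exc}")
    time.sleep(0.5)

print(f"{failures}/{ATTEMPTS} new connections failed")

On an affected cluster a fraction of the attempts fail even though the route and router pods are healthy; retrying the same connection normally succeeds.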
- Existing connections to OpenShift routes via the NLB load balancer reaching a router pod may be reset or time out and must be retried (connection reset). A reproduction sketch follows the trace below.
client --> NLB --> router [SYN]
router --> NLB --> client [SYN/ACK]
client --> NLB --> router [GET /]
router --> NLB --> client [ACK + HTTP 200]
client --> NLB --> router [PSH/ACK]
<...> # some time later: connection is idle, no closure on either side - connection still open <...>
client --> NLB --> router [GET /exampledata]  # new request sent after 350s of idle time
router --> NLB --> client [RST]  # unexpected reset of the established session
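The idle-connection reset can be reproduced by holding a single keep-alive connection open past the idle period shown in the trace above (roughly 350 seconds) and then reusing it. The following is a minimal sketch, not from the original article, that assumes a plain-HTTP (insecure) route; the hostname, port, path, and idle duration are placeholders for your environment.

#!/usr/bin/env python3
# Minimal sketch: open one connection through the NLB, let it sit idle
# past ~350 seconds, then reuse it and watch for the reset.
import socket
import time

ROUTE_HOST = "myapp.apps.example.com"   # assumption: replace with an affected route
ROUTE_PORT = 80                         # assumption: plain-HTTP route for simplicity
IDLE_SECONDS = 400                      # longer than the ~350s idle period seen above

request = (f"GET / HTTP/1.1\r\nHost: {ROUTE_HOST}\r\n"
           "Connection: keep-alive\r\n\r\n").encode()

with socket.create_connection((ROUTE_HOST, ROUTE_PORT), timeout=10) as sock:
    sock.sendall(request)
    print("first response:", sock.recv(4096)[:80])

    # Idle with no traffic; neither side closes the connection.
    time.sleep(IDLE_SECONDS)

    try:
        # On an affected setup the next send or recv fails with a
        # connection reset instead of returning a normal HTTP response.
        sock.sendall(request)
        print("second response:", sock.recv(4096)[:80])
    except (ConnectionResetError, BrokenPipeError) as exc:
        print("connection reset after idle period:", exc)

On an affected configuration the second request fails with a connection reset; on a healthy configuration it returns a normal HTTP response on the same connection.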
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.x
- AWS Infrastructure
- Red Hat OpenShift Service on AWS (ROSA)
- NLB as the primary load balancer for the impacted traffic (the issue does not occur with a CLB)
- Traffic impacted via a Route/IngressController