TLS handshake fails due to large packets discarded for OpenShift 4 on Azure
Issue
- TLS handshakes fail even though TCP connections can be established. A traffic capture shows large packets being discarded (see "Diagnostic Steps").
- Unexpected ICMP "fragmentation needed" messages are received for direct communication between OpenShift nodes that is not VXLAN-encapsulated. The MTU they request is lower than the one set on both ends and/or required by any intermediate element.
- The routing cache shows bad entries, as described in "Diagnostic Steps".
- After some time, the OpenShift cluster becomes very slow and many operators become unhealthy (degraded state).
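
The routing cache symptom above can be checked by looking for cached route exceptions whose MTU is lower than the MTU configured on the node interfaces. The following is a minimal sketch, not an official diagnostic: it assumes the text comes from `ip route show cache` on a node, and the function name `find_low_mtu_entries`, the sample addresses, and the expected MTU of 1500 are all illustrative assumptions.

```python
import re


def find_low_mtu_entries(route_cache_text, expected_mtu):
    """Return (destination, mtu) pairs whose cached MTU is below expected_mtu."""
    findings = []
    destination = None
    for line in route_cache_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new cache entry; its first token
            # is the destination address.
            destination = line.split()[0]
        # The cached MTU, when present, appears as "mtu <number>".
        match = re.search(r"\bmtu\s+(\d+)", line)
        if match and destination is not None:
            mtu = int(match.group(1))
            if mtu < expected_mtu:
                findings.append((destination, mtu))
    return findings


# Illustrative sample only; real output depends on the node.
SAMPLE = """\
10.0.0.5 dev eth0
    cache expires 423sec mtu 1400
10.0.0.6 dev eth0
    cache
"""

if __name__ == "__main__":
    for dest, mtu in find_low_mtu_entries(SAMPLE, expected_mtu=1500):
        print(f"{dest}: cached MTU {mtu} is below the expected value")
```

In practice the text would come from running `ip route show cache` on the affected node, and `expected_mtu` would be set to the MTU actually configured on that node's interface.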
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- OpenShift SDN
- 4.6
- 4.7
- OVN-Kubernetes
- 4.8
- 4.9
- 4.10
- OpenShift SDN
- Azure