Connectivity issues after upgrading to OpenShift Container Platform 4.6.9
Issue
- After upgrading OpenShift Container Platform from 4.6.6 to 4.6.9, we experience intermittent connectivity issues in certain Pods.
- For example, the authentication-operator is reporting the following error messages:
  [..]
  status:
    conditions:
    - lastTransitionTime: '2021-01-07T09:38:22Z'
      message: >-
        OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.openshift.example.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      reason: AsExpected
      status: 'False'
      type: Degraded
    - lastTransitionTime: '2021-01-07T09:34:03Z'
      reason: AsExpected
      status: 'False'
      type: Progressing
    - lastTransitionTime: '2021-01-07T10:23:00Z'
      message: >-
        OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.openshift.example.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      reason: OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable
      status: 'False'
      type: Available
  [..]
- Other symptoms include timeouts and DNS errors such as "no such host" in multiple components:
  E0107 11:30:48.746431 1 base_controller.go:250] "OAuthRouteCheckEndpointAccessibleController" controller failed to sync "key", err: Get "https://oauth-openshift.apps.openshift.example.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  [..]
  2021-01-07T16:24:40Z auth: failed to get latest auth source data: request to OAuth issuer endpoint https://oauth-openshift.apps.openshift.example.com/oauth/token failed: Head "https://oauth-openshift.apps.openshift.example.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  2021-01-07T16:30:13Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
  2021-01-07T16:30:33Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
  2021-01-07T16:30:59Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
- In the SDN Pods, the following error message is visible, and Pods may appear stuck in the ContainerCreating state:
  Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
- The issue only appears on OpenShift Container Platform clusters that are using NetworkPolicies.
Environment
- Red Hat OpenShift Container Platform (OCP) 4.6.9