Connectivity issues after upgrading to OpenShift Container Platform 4.6.9

Solution Verified

Issue

  • After upgrading OpenShift Container Platform from 4.6.6 to 4.6.9, random connectivity issues occur in certain Pods.
  • For example, the authentication-operator is reporting the following error messages:

    [..]
    status:
      conditions:
        - lastTransitionTime: '2021-01-07T09:38:22Z'
          message: >-
            OAuthRouteCheckEndpointAccessibleControllerDegraded: Get
            "https://oauth-openshift.apps.openshift.example.com/healthz": context
            deadline exceeded (Client.Timeout exceeded while awaiting headers)
          reason: AsExpected
          status: 'False'
          type: Degraded
        - lastTransitionTime: '2021-01-07T09:34:03Z'
          reason: AsExpected
          status: 'False'
          type: Progressing
        - lastTransitionTime: '2021-01-07T10:23:00Z'
          message: >-
            OAuthRouteCheckEndpointAccessibleControllerAvailable: Get
            "https://oauth-openshift.apps.openshift.example.com/healthz": context
            deadline exceeded (Client.Timeout exceeded while awaiting headers)
          reason: OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable
          status: 'False'
          type: Available
    [..]
    
  • Other symptoms include request timeouts and DNS errors such as "no such host" in the logs of multiple components:

    E0107 11:30:48.746431       1 base_controller.go:250] "OAuthRouteCheckEndpointAccessibleController" controller failed to sync "key", err: Get "https://oauth-openshift.apps.openshift.example.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
    [..]
    2021-01-07T16:24:40Z auth: failed to get latest auth source data: request to OAuth issuer endpoint https://oauth-openshift.apps.openshift.example.com/oauth/token failed: Head "https://oauth-openshift.apps.openshift.example.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    2021-01-07T16:30:13Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
    2021-01-07T16:30:33Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
    2021-01-07T16:30:59Z Failed to dial backend: 'dial tcp: lookup kubernetes.default.svc on 10.140.0.90:53: no such host'
    
  • The SDN Pods log the following error message, and Pods may remain stuck in the ContainerCreating phase:

    Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
    
  • The issue appears only on OpenShift Container Platform clusters that use NetworkPolicies.
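
The symptoms listed above share a few recognizable log signatures. As a rough triage aid, the following Python sketch counts those signatures in collected log text. The function name classify_symptoms and the signature labels are illustrative assumptions and not part of any OpenShift tooling; the matched strings are taken from the log excerpts in this article.

```python
import re
from collections import Counter

# Signature patterns copied from the log excerpts in this article.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "timeout": re.compile(
        r"context deadline exceeded \(Client\.Timeout exceeded while awaiting headers\)"
    ),
    "dns_no_such_host": re.compile(r"lookup \S+ on \S+: no such host"),
    "ovs_invalid_ip": re.compile(r"ovs-ofctl: .*invalid IP address"),
}


def classify_symptoms(log_text):
    """Count how many log lines match each known symptom signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "2021-01-07T16:30:13Z Failed to dial backend: 'dial tcp: lookup "
        "kubernetes.default.svc on 10.140.0.90:53: no such host'",
        "Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address",
    ])
    for label, count in sorted(classify_symptoms(sample).items()):
        print(label, count)
```

A non-zero count only shows that a matching line is present in the supplied text; it does not by itself confirm that a cluster is affected by this issue.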

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.6.9
