Cluster Operators degraded after migrating from OpenShift SDN to OVN-Kubernetes in RHOCP4

Solution Verified - Updated -

Issue

  • Critical cluster operators, including etcd and authentication, become degraded after migrating from the OpenShift SDN CNI to OVN-Kubernetes.

Etcd

      message: 'ClusterMemberControllerDegraded: unhealthy members found during reconciling members

      EtcdEndpointsDegraded: EtcdEndpointsController can''t evaluate whether quorum
      is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault
      tolerant: [{Member:ID:1730761836105137485 name:"master-2.example.com"
      peerURLs:"https://10.186.22.11:2380" clientURLs:"https://10.186.22.11:2379"  Healthy:false
      ....
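
The "quorum of 2 and 2 healthy members which is not fault tolerant" wording in the etcd message above can be checked with a little arithmetic. The following is a minimal sketch of that reasoning, not the etcd operator's own code. It assumes a three-member control plane with master-2 as the one unhealthy member, and the function names are hypothetical.

```python
def quorum_size(total_members: int) -> int:
    """Smallest majority of the cluster: floor(n / 2) + 1."""
    if total_members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return total_members // 2 + 1


def is_fault_tolerant(total_members: int, healthy_members: int) -> bool:
    """True if a quorum would still remain after losing one more healthy member."""
    return healthy_members - 1 >= quorum_size(total_members)


if __name__ == "__main__":
    # Assumption: three etcd members in total, two of them healthy.
    total, healthy = 3, 2
    print(f"quorum={quorum_size(total)} fault_tolerant={is_fault_tolerant(total, healthy)}")
```

With two healthy members and a quorum of 2, the cluster still has quorum, but losing one more member would break it, which is why the operator reports the state as degraded.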

Authentication

      message: 'WellKnownAvailable: The well-known endpoint is not yet available: failed
      to GET kube-apiserver oauth endpoint https://10.186.22.11:6443/.well-known/oauth-authorization-server:
      dial tcp 10.186.22.11:6443: i/o timeout'

  • The control plane node generates repeated ErrorReconcilingNode warning events because the stale SNAT sync fails for Node Feature Discovery (NFD) Operator pods.

Events:
  Type     Reason                Age                       From          Message
  ----     ------                ----                      ----          -------
  Warning  ErrorReconcilingNode  3m27s (x3219 over 2d20h)  controlplane  error creating gateway for node master-2.example.com: failed to init shared interface gateway: failed to sync stale SNATs on node master-2.example.com: unable to fetch podIPs for pod openshift-nfd/nfd-master-aqmt
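
To see which cluster operators report the degraded state described in this section, the Degraded condition of each ClusterOperator can be collected in one pass. The sketch below assumes a logged-in `oc` client on the PATH and reads the output of `oc get clusteroperators -o json`. The helper names `degraded_operators` and `fetch_cluster_operators` are hypothetical and only illustrate the idea.

```python
import json
import subprocess


def degraded_operators(cluster_operators: dict) -> dict:
    """Map each degraded operator name to its Degraded condition message."""
    result = {}
    for item in cluster_operators.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                result[item["metadata"]["name"]] = cond.get("message", "")
    return result


def fetch_cluster_operators() -> dict:
    """Run `oc get clusteroperators -o json` and parse the resulting list."""
    completed = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

Calling `degraded_operators(fetch_cluster_operators())` on an affected cluster would be expected to list operators such as etcd and authentication along with messages like the ones quoted above.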

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4
  • Red Hat Node Feature Discovery Operator (NFD)
