Cluster Operators degraded after migrating from OpenShift SDN to OVN-Kubernetes in RHOCP 4
Issue
- Critical cluster operators degraded after migrating from OpenShift SDN CNI to OVN-Kubernetes.
Etcd
message: 'ClusterMemberControllerDegraded: unhealthy members found during reconciling members
EtcdEndpointsDegraded: EtcdEndpointsController can''t evaluate whether quorum
is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault
tolerant: [{Member:ID:1730761836105137485 name:"master-2.example.com"
peerURLs:"https://10.186.22.11:2380" clientURLs:"https://10.186.22.11:2379" Healthy:false
....
Authentication
message: 'WellKnownAvailable: The well-known endpoint is not yet available: failed
to GET kube-apiserver oauth endpoint https://10.186.22.11:6443/.well-known/oauth-authorization-server:
dial tcp 10.186.22.11:6443: i/o timeout'
- The controlplane node generates warning events related to ErrorReconcilingNode, caused by a failing SNAT sync for Node Feature Discovery (NFD) Operator pods.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrorReconcilingNode 3m27s (x3219 over 2d20h) controlplane error creating gateway for node master-2.example.com: failed to init shared interface gateway: failed to sync stale SNATs on node master-2.example.com: unable to fetch podIPs for pod openshift-nfd/nfd-master-aqmt
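When the same warning repeats thousands of times, it can help to pull the offending pod out of the event message programmatically. The following is a minimal sketch, assuming the message always ends with the "unable to fetch podIPs for pod <namespace>/<name>" phrase shown above. The function name stale_snat_pod and the regular expression are illustrative and are not part of any OpenShift tooling.

```python
import re

# Matches the tail of the ErrorReconcilingNode message quoted in the event above.
# Assumption: the pod is always reported as "<namespace>/<name>".
POD_RE = re.compile(
    r"unable to fetch podIPs for pod (?P<namespace>[^/\s]+)/(?P<name>\S+)"
)


def stale_snat_pod(message):
    """Return (namespace, pod_name) from an event message, or None if absent."""
    match = POD_RE.search(message)
    if match is None:
        return None
    return match.group("namespace"), match.group("name")


# Event message copied from the warning above.
message = (
    "error creating gateway for node master-2.example.com: "
    "failed to init shared interface gateway: "
    "failed to sync stale SNATs on node master-2.example.com: "
    "unable to fetch podIPs for pod openshift-nfd/nfd-master-aqmt"
)

print(stale_snat_pod(message))  # ('openshift-nfd', 'nfd-master-aqmt')
```

In this example, the extracted pair points at the NFD pod whose missing pod IPs block the SNAT sync on master-2.example.com.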
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Node Feature Discovery Operator (NFD)