etcd-operator unhealthy trying to connect to defunct bootstrap etcd member endpoint

Solution Verified - Updated 2024-06-14T01:19:19+00:00

Issue

The new etcd-operator in OpenShift Container Platform 4.4 continuously tries, and fails, to connect to the endpoint of the defunct bootstrap etcd member:

W0616 13:16:22.642213       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:34.671661       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.147.167:2379 0  <nil>} {https://10.0.165.221:2379 0  <nil>} {https://10.0.135.207:2379 0  <nil>} {https://10.0.2.123:2379 0  <nil>}]
W0616 13:16:34.680664       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:40.688206       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.135.207:2379 0  <nil>} {https://10.0.147.167:2379 0  <nil>} {https://10.0.165.221:2379 0  <nil>} {https://10.0.2.123:2379 0  <nil>}]
W0616 13:16:40.696429       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
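
To see at a glance which endpoint the operator keeps failing on, the warnings above can be tallied per address. The following is only a minimal sketch: the function name `count_failed_endpoints` and the regular expression are illustrative assumptions, not part of the etcd-operator or any Red Hat tooling, and the sample lines are trimmed copies of the log excerpt above.

```python
import re
from collections import Counter

# Assumption: failures appear as "failed to connect to {https://IP:PORT ...",
# as in the grpc warnings quoted in this article.
FAILED_RE = re.compile(r"failed to connect to \{(https?://[^\s}]+)")


def count_failed_endpoints(log_text):
    """Return a Counter mapping each endpoint URL to its number of failed dials."""
    return Counter(FAILED_RE.findall(log_text))


# Two warning lines from the excerpt above, cut off after the address for brevity.
sample_log = '''W0616 13:16:22.642213       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}.
W0616 13:16:34.680664       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}.
'''

for endpoint, failures in count_failed_endpoints(sample_log).items():
    print(endpoint, failures)
```

In this excerpt the only address that ever fails is 10.0.2.123:2379, which the operator nevertheless keeps including in the address list it sends to the client.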

Checking endpoint health from within one of the etcd pods also reports the bootstrap endpoint as unhealthy:

sh-4.2# etcdctl endpoint health
{"level":"warn","ts":"2020-06-16T13:32:25.637Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-f533f716-59da-432a-b4bf-ccd0fc176d90/10.0.2.123:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
https://10.0.135.207:2379 is healthy: successfully committed proposal: took = 10.59705ms
https://10.0.147.167:2379 is healthy: successfully committed proposal: took = 12.764976ms
https://10.0.165.221:2379 is healthy: successfully committed proposal: took = 13.703024ms
https://10.0.2.123:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
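
When this check is run against several clusters, it can help to pull the unhealthy endpoints out of the output automatically. The sketch below assumes the plain-text format shown above ("<endpoint> is healthy: ..." or "<endpoint> is unhealthy: ..."); the helper name `unhealthy_endpoints` is hypothetical, and other etcdctl versions or output formats may need a different pattern.

```python
import re

# Assumption: each result line starts with the endpoint URL followed by
# "is healthy:" or "is unhealthy:", as in the output above.
HEALTH_RE = re.compile(r"^(https?://\S+) is (healthy|unhealthy):")


def unhealthy_endpoints(output):
    """Return the endpoints reported as unhealthy, in the order they appear."""
    found = []
    for line in output.splitlines():
        match = HEALTH_RE.match(line.strip())
        if match and match.group(2) == "unhealthy":
            found.append(match.group(1))
    return found


# Output copied from the `etcdctl endpoint health` run above (JSON warning line omitted).
sample_output = """https://10.0.135.207:2379 is healthy: successfully committed proposal: took = 10.59705ms
https://10.0.147.167:2379 is healthy: successfully committed proposal: took = 12.764976ms
https://10.0.165.221:2379 is healthy: successfully committed proposal: took = 13.703024ms
https://10.0.2.123:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
"""

print(unhealthy_endpoints(sample_output))
```

Lines that do not match the pattern, such as the JSON warning and the final "Error: unhealthy cluster", are simply skipped.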

Environment

OpenShift Container Platform
- 4.4
- 4.5 (< 4.5.8)
