etcd-operator unhealthy trying to connect to defunct bootstrap etcd member endpoint


Issue

  • The etcd-operator introduced in 4.4 repeatedly tries, and fails, to connect to the defunct bootstrap etcd member endpoint:
W0616 13:16:22.642213       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:34.671661       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.147.167:2379 0  <nil>} {https://10.0.165.221:2379 0  <nil>} {https://10.0.135.207:2379 0  <nil>} {https://10.0.2.123:2379 0  <nil>}]
W0616 13:16:34.680664       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:40.688206       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.135.207:2379 0  <nil>} {https://10.0.147.167:2379 0  <nil>} {https://10.0.165.221:2379 0  <nil>} {https://10.0.2.123:2379 0  <nil>}]
W0616 13:16:40.696429       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
  • Checking etcd endpoint health from inside one of the etcd pods also reports the bootstrap endpoint as unhealthy:
sh-4.2# etcdctl endpoint health
{"level":"warn","ts":"2020-06-16T13:32:25.637Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-f533f716-59da-432a-b4bf-ccd0fc176d90/10.0.2.123:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
https://10.0.135.207:2379 is healthy: successfully committed proposal: took = 10.59705ms
https://10.0.147.167:2379 is healthy: successfully committed proposal: took = 12.764976ms
https://10.0.165.221:2379 is healthy: successfully committed proposal: took = 13.703024ms
https://10.0.2.123:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
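
To isolate the failing endpoint from health output like the above, a small filter can help. The following is a minimal sketch, not part of the original solution: the function name `unhealthy_endpoints` is hypothetical, and it assumes the health output keeps the `<endpoint> is unhealthy: ...` line format shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the endpoint URL of every line that
# `etcdctl endpoint health` reports as unhealthy.
# Reads the health output on stdin. Assumes each result line has the
# form "<endpoint> is unhealthy: <reason>", as in the output above.
unhealthy_endpoints() {
    awk '$2 == "is" && $3 == "unhealthy:" { print $1 }'
}

# Example usage from inside an etcd pod, assuming etcdctl is already
# configured there (stderr is merged so warnings do not clutter the terminal):
#   etcdctl endpoint health 2>&1 | unhealthy_endpoints
```

With the output shown above, the filter would print only the bootstrap address, `https://10.0.2.123:2379`, which can then be compared against the members the cluster is expected to have.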

Environment

  • OpenShift Container Platform
    • 4.4
    • 4.5 (< 4.5.8)
