etcd-operator unhealthy trying to connect to defunct bootstrap etcd member endpoint
Issue
- On a new 4.4 cluster, etcd-operator continuously tries and fails to connect to the defunct bootstrap etcd member endpoint:
W0616 13:16:22.642213 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:34.671661 1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.147.167:2379 0 <nil>} {https://10.0.165.221:2379 0 <nil>} {https://10.0.135.207:2379 0 <nil>} {https://10.0.2.123:2379 0 <nil>}]
W0616 13:16:34.680664 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
I0616 13:16:40.688206 1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://10.0.135.207:2379 0 <nil>} {https://10.0.147.167:2379 0 <nil>} {https://10.0.165.221:2379 0 <nil>} {https://10.0.2.123:2379 0 <nil>}]
W0616 13:16:40.696429 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://10.0.2.123:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.0.2.123:2379: operation was canceled". Reconnecting...
- Checking the etcd endpoints from within one of the etcd pods also shows the following error:
sh-4.2# etcdctl endpoint health
{"level":"warn","ts":"2020-06-16T13:32:25.637Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-f533f716-59da-432a-b4bf-ccd0fc176d90/10.0.2.123:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
https://10.0.135.207:2379 is healthy: successfully committed proposal: took = 10.59705ms
https://10.0.147.167:2379 is healthy: successfully committed proposal: took = 12.764976ms
https://10.0.165.221:2379 is healthy: successfully committed proposal: took = 13.703024ms
https://10.0.2.123:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
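To confirm that the endpoint etcd-operator keeps dialing is the same one `etcdctl` reports as unhealthy, the two outputs above can be cross-checked. The following is a minimal sketch in Python, not part of any Red Hat or etcd tooling. The function names and regular expressions are hypothetical, and they assume the exact line formats shown above (`<url> is healthy:` / `<url> is unhealthy:` from `etcdctl endpoint health`, and `{<url> 0 <nil>}` entries in the `ccResolverWrapper` log line).

```python
import re
from typing import List

# Hypothetical pattern for `etcdctl endpoint health` result lines, e.g.
#   https://10.0.2.123:2379 is unhealthy: failed to commit proposal: ...
HEALTH_RE = re.compile(r"^(https?://\S+) is (healthy|unhealthy):")

# Hypothetical pattern for address entries in an etcd-operator
# "ccResolverWrapper: sending new addresses to cc" log line, e.g.
#   {https://10.0.2.123:2379 0 <nil>}
RESOLVER_ADDR_RE = re.compile(r"\{(https?://\S+) \d+ <nil>\}")


def unhealthy_endpoints(health_output: str) -> List[str]:
    """Return endpoints that the captured etcdctl output reports as unhealthy."""
    result = []
    for line in health_output.splitlines():
        m = HEALTH_RE.match(line.strip())
        # Lines such as the JSON warning or "Error: unhealthy cluster" do not match.
        if m and m.group(2) == "unhealthy":
            result.append(m.group(1))
    return result


def resolver_endpoints(log_line: str) -> List[str]:
    """Return the addresses handed to the operator's gRPC resolver, in order."""
    return RESOLVER_ADDR_RE.findall(log_line)


def stale_endpoints(health_output: str, log_line: str) -> List[str]:
    """Return unhealthy endpoints that the operator is still being told to dial."""
    dialed = set(resolver_endpoints(log_line))
    return [ep for ep in unhealthy_endpoints(health_output) if ep in dialed]


if __name__ == "__main__":
    health = (
        "https://10.0.135.207:2379 is healthy: successfully committed proposal: took = 10.59705ms\n"
        "https://10.0.2.123:2379 is unhealthy: failed to commit proposal: context deadline exceeded\n"
        "Error: unhealthy cluster\n"
    )
    log = (
        "I0616 13:16:34.671661 1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: "
        "[{https://10.0.147.167:2379 0 <nil>} {https://10.0.2.123:2379 0 <nil>}]"
    )
    print(stale_endpoints(health, log))
```

Applied to the sample lines from this issue, the helper reports only `https://10.0.2.123:2379`, the defunct bootstrap member endpoint.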
Environment
- OpenShift Container Platform
- 4.4
- 4.5 (< 4.5.8)