Strimzi Cluster Operator failed to connect to ZooKeeper

Solution Verified

Issue

Even on a freshly installed Kafka cluster built from the Red Hat-shipped examples, the strimzi-cluster-operator cannot connect to the ZooKeeper pods on port 2181. The problem does not surface until the zookeeper spec is updated, which triggers a reconciliation and a rolling restart.

[root@test cluster-operator]# oc edit Kafka my-cluster -n cg

Change:

  zookeeper:
    livenessProbe:
      initialDelaySeconds: 120
      timeoutSeconds: 5
    readinessProbe:
      initialDelaySeconds: 120
      timeoutSeconds: 5
    replicas: 3
    storage:
      type: ephemeral

To:

    livenessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 5
    readinessProbe:
      initialDelaySeconds: 60
      timeoutSeconds: 5

[root@test cluster-operator]# oc edit Kafka my-cluster -n cg
kafka.kafka.strimzi.io/my-cluster edited
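
The same probe change can also be expressed as a patch body instead of an interactive edit. The following is a minimal sketch, not the documented procedure: the function name probe_patch is hypothetical, and it assumes the zookeeper block sits directly under spec in the Kafka resource, as the indentation in the excerpt above suggests.

```python
import json


def probe_patch(initial_delay_seconds, timeout_seconds):
    """Build a JSON merge patch that sets both ZooKeeper probes.

    Assumption: the zookeeper block lives under spec in the Kafka resource.
    """
    probe = {
        "initialDelaySeconds": initial_delay_seconds,
        "timeoutSeconds": timeout_seconds,
    }
    return {
        "spec": {
            "zookeeper": {
                # Separate copies so the two probes can be changed independently.
                "livenessProbe": dict(probe),
                "readinessProbe": dict(probe),
            }
        }
    }


if __name__ == "__main__":
    # Values taken from the edit shown above: 60 s initial delay, 5 s timeout.
    print(json.dumps(probe_patch(60, 5)))
```

The printed JSON could then be passed to a command such as oc patch kafka my-cluster -n cg --type merge -p '<json>'. Like the interactive edit, this would be expected to trigger the same reconciliation.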

The Strimzi Cluster Operator log then shows:

2019-05-15 03:37:20 INFO  AbstractAssemblyOperator:167 - Reconciliation #9(timer) Kafka(cg/my-cluster): Assembly my-cluster should be created or updated
2019-05-15 03:37:20 INFO  AbstractAssemblyOperator:312 - Reconciliation #9(timer) Kafka(cg/my-cluster): Assembly reconciled
2019-05-15 03:38:12 INFO  AbstractAssemblyOperator:281 - Reconciliation #10(watch) Kafka(cg/my-cluster): Kafka my-cluster in namespace cg was MODIFIED
2019-05-15 03:38:12 INFO  AbstractAssemblyOperator:167 - Reconciliation #10(watch) Kafka(cg/my-cluster): Assembly my-cluster should be created or updated
2019-05-15 03:38:12 INFO  ZookeeperLeaderFinder:90 - Trusting certificate ca.crt from Secret my-cluster-cluster-ca-cert
2019-05-15 03:38:23 WARN  ZookeeperLeaderFinder:253 - ZK my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.cg.svc.cluster.local:2181: failed to connect to zookeeper:
2019-05-15 03:38:23 INFO  ZookeeperLeaderFinder:192 - No leader found for cluster my-cluster in namespace cg; backing off for 0ms (cumulative 0ms)
2019-05-15 03:38:33 WARN  ZookeeperLeaderFinder:253 - ZK my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.cg.svc.cluster.local:2181: failed to connect to zookeeper:
2019-05-15 03:38:33 INFO  ZookeeperLeaderFinder:192 - No leader found for cluster my-cluster in namespace cg; backing off for 5000ms (cumulative 5000ms)
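
When the operator log is long, it can help to count the failed ZooKeeper connections and read off the latest cumulative backoff. The sketch below is illustrative only: the regular expressions are modeled on the ZookeeperLeaderFinder lines shown above and may not match other operator versions, and the function name summarize is hypothetical.

```python
import re

# Patterns modeled on the log lines in this article (an assumption, not a spec).
CONNECT_FAIL = re.compile(
    r"WARN\s+ZookeeperLeaderFinder:\d+ - ZK (?P<host>[^\s:]+):(?P<port>\d+): "
    r"failed to connect"
)
BACKOFF = re.compile(r"backing off for (?P<ms>\d+)ms \(cumulative (?P<total>\d+)ms\)")


def summarize(lines):
    """Return (failures per (host, port), last cumulative backoff in ms)."""
    failures = {}
    cumulative_ms = 0
    for line in lines:
        match = CONNECT_FAIL.search(line)
        if match:
            key = (match.group("host"), int(match.group("port")))
            failures[key] = failures.get(key, 0) + 1
            continue
        match = BACKOFF.search(line)
        if match:
            cumulative_ms = int(match.group("total"))
    return failures, cumulative_ms


SAMPLE = [
    "2019-05-15 03:38:23 WARN  ZookeeperLeaderFinder:253 - ZK my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.cg.svc.cluster.local:2181: failed to connect to zookeeper:",
    "2019-05-15 03:38:23 INFO  ZookeeperLeaderFinder:192 - No leader found for cluster my-cluster in namespace cg; backing off for 0ms (cumulative 0ms)",
    "2019-05-15 03:38:33 WARN  ZookeeperLeaderFinder:253 - ZK my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.cg.svc.cluster.local:2181: failed to connect to zookeeper:",
    "2019-05-15 03:38:33 INFO  ZookeeperLeaderFinder:192 - No leader found for cluster my-cluster in namespace cg; backing off for 5000ms (cumulative 5000ms)",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The sample lines are copied from the excerpt above. In practice the input would be the operator pod's log read from a file or standard input.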

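Testing with nc from inside the operator pod shows that the ZooKeeper services cannot be reached on port 2181 either, while the Kafka services on ports 9092 and 9093 accept connections: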

[root@ndccsi-sesosm01 cluster-operator]# oc rsh strimzi-cluster-operator-6bccf4d586-n9xx8
sh-4.2$ nc -v my-cluster-zookeeper-0.my-cluster-zookeeper-client.cg.svc.cluster.local 2181
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
sh-4.2$ nc -v my-cluster-zookeeper-client 2181
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
sh-4.2$ nc -v my-cluster-zookeeper-nodes 2181
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection to XX.XX.XX.XX failed: Connection timed out.
Ncat: Trying next address...
Ncat: Connection to XX.XX.XX.XX failed: Connection timed out.
Ncat: Trying next address...
Ncat: Connection timed out.
sh-4.2$ nc -v my-cluster-kafka-bootstrap 9092
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to XX.XX.XX.XX:9092.
^C
sh-4.2$ nc -v my-cluster-kafka-bootstrap 9093
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to XX.XX.XX.XX:9093.
^C
sh-4.2$ nc -v my-cluster-kafka-brokers 9092
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to XX.XX.XX.XX:9092.
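
The nc checks above can be repeated in one pass with a small script. This is a sketch under stated assumptions: it needs a pod or debug container in the same namespace that has Python 3 available (the operator image may not), the service names are the ones used in the nc session, and the function names can_connect and check_all are hypothetical.

```python
import socket

# Service names and ports taken from the nc session above.
TARGETS = [
    ("my-cluster-zookeeper-client", 2181),
    ("my-cluster-zookeeper-nodes", 2181),
    ("my-cluster-kafka-bootstrap", 9092),
    ("my-cluster-kafka-bootstrap", 9093),
    ("my-cluster-kafka-brokers", 9092),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


def check_all(targets, timeout=5.0):
    """Map each (host, port) pair to True (reachable) or False (unreachable)."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Calling check_all(TARGETS) and printing the result gives one reachable/unreachable flag per service. A pattern where only the 2181 entries are False would match the symptom described here. Note that a plain TCP connect only tests reachability of the port, as nc does, and says nothing about TLS or ZooKeeper protocol health.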

Environment

  • Red Hat AMQ Streams 1.1.0
  • OpenShift Container Platform 3.11
