AMQ Streams multi-version upgrade fails with the Kafka broker unable to connect to Zookeeper
Issue
- Performing a multi-version upgrade from AMQ Streams 1.4.0 to AMQ Streams 1.7.0 results in the Kafka broker being unable to connect to Zookeeper
- The Cluster Operator is unable to roll the Kafka broker and the upgrade cannot complete
- The following error will be reported in the Cluster Operator pod log:
2021-09-29 16:34:45 INFO KafkaAssemblyOperator:996 - Kafka is upgrading from 2.4.0 to 2.7.0
2021-09-29 16:34:45 INFO KafkaAssemblyOperator:926 - Kafka upgrade from 2.4.0 to 2.7.0 requires Zookeeper upgrade from 3.5.6 to 3.5.8
2021-09-29 16:34:46 WARN VersionUsageUtils:60 - The client is using resource type 'poddisruptionbudgets' with unstable version 'v1beta1'
2021-09-29 16:34:47 INFO PodOperator:65 - Rolling update of amq-streams/my-kafka-zookeeper: Rolling pod my-kafka-zookeeper-0
2021-09-29 16:36:00 INFO KafkaRoller:504 - Reconciliation #3(watch) Kafka(amq-streams/my-streams-kafka): Pod 0 needs to be restarted. Reason: [Pod has old generation]
2021-09-29 16:36:41 WARN KafkaAvailability:86 - Error determining whether it is safe to restart pod 0
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConfigs, deadlineMs=1632933401647, tries=3, nextAllowedTryMs=1632933401748) timed out at 1632933401648 after 3 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled describeConfigs request with correlation id 11 due to node 0 being disconnected
2021-09-29 16:36:41 INFO KafkaRoller:296 - Reconciliation #3(watch) Kafka(amq-streams/my-streams-kafka): Could not roll pod 0 due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$ForceableProblem: An error while trying to determine rollability, retrying after at least 250ms
- The following error will be reported in the Kafka broker pod log:
2021-09-29 16:46:34,528 INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2021-09-29 16:46:34,528 INFO Socket connection established, initiating session, client: /127.0.0.1:43550, server: localhost/127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2021-09-29 16:46:34,534 INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2021-09-29 16:46:34,534 INFO Socket connection established, initiating session, client: /127.0.0.1:43558, server: localhost/127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
2021-09-29 16:46:34,537 WARN Session 0x1002efdb8550001 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [main-SendThread(localhost:2181)]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:75)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
- The following error will be reported in the Zookeeper pod log:
2021-09-29 16:46:34,544 WARN Exception caught (org.apache.zookeeper.server.NettyServerCnxnFactory) [nioEventLoopGroup-7-1]
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Client requested protocol TLSv1 is not enabled or supported in server context
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.net.ssl.SSLHandshakeException: Client requested protocol TLSv1 is not enabled or supported in server context
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:336)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:292)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:283)
at java.base/sun.security.ssl.ClientHello$ClientHelloConsumer.negotiateProtocol(ClientHello.java:883)
at java.base/sun.security.ssl.ClientHello$ClientHelloConsumer.onClientHello(ClientHello.java:835)
at java.base/sun.security.ssl.ClientHello$ClientHelloConsumer.consume(ClientHello.java:813)
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1074)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1061)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1008)
at io.netty.handler.ssl.SslHandler.runAllDelegatedTasks(SslHandler.java:1558)
at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1572)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1456)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1283)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1330)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
... 17 more
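The logs from the Kafka broker and Zookeeper confirm a TLS version mismatch: the client connecting to Zookeeper requests TLSv1, and the Zookeeper pod, once rolled and upgraded, does not have TLSv1 enabled in its server context, so it rejects the handshake and resets the connection.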
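When checking whether an environment is hitting the same problem, it can help to search a saved copy of the Zookeeper pod log for the handshake signature shown above. The following is a minimal sketch, not part of the product: the function name `find_rejected_protocols` and the `sample` list are hypothetical, and the pattern is assumed to match only the exact message wording that appears in the log excerpt in this article.

```python
import re

# Message wording copied from the Zookeeper pod log excerpt in this article.
# Assumption: other Zookeeper or JDK versions may word the message differently.
HANDSHAKE_PATTERN = re.compile(
    r"SSLHandshakeException: Client requested protocol (\S+) "
    r"is not enabled or supported in server context"
)


def find_rejected_protocols(log_lines):
    """Return the set of TLS protocol names the server refused to negotiate."""
    rejected = set()
    for line in log_lines:
        match = HANDSHAKE_PATTERN.search(line)
        if match:
            rejected.add(match.group(1))
    return rejected


# Two lines taken from the Zookeeper pod log above, used as illustrative input.
sample = [
    "2021-09-29 16:46:34,544 WARN Exception caught "
    "(org.apache.zookeeper.server.NettyServerCnxnFactory) [nioEventLoopGroup-7-1]",
    "io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: "
    "Client requested protocol TLSv1 is not enabled or supported in server context",
]

print(find_rejected_protocols(sample))
```

A non-empty result that contains TLSv1 suggests the same mismatch described in this article. An empty result only means this particular message was not found in the lines supplied.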
Environment
- AMQ Streams 1.4.0
- AMQ Streams 1.7.0
- OpenShift 4.7