Appendix B. Frequently Asked Questions

B.1. Cluster Operator

B.1.1. Why do I need cluster admin privileges to install AMQ Streams?

To install AMQ Streams, you must have the ability to create Custom Resource Definitions (CRDs). CRDs instruct OpenShift about resources that are specific to AMQ Streams, such as Kafka, KafkaConnect, and so on. Because CRDs are a cluster-scoped resource rather than being scoped to a particular OpenShift namespace, they typically require cluster admin privileges to install.

In addition, you must also have the ability to create ClusterRoles and ClusterRoleBindings. Like CRDs, these are cluster-scoped resources that typically require cluster admin privileges.

The cluster administrator can inspect all the resources being installed (in the /install/ directory) to assure themselves that the ClusterRoles do not grant unnecessary privileges. For more information about why the Cluster Operator installation resources grant the ability to create ClusterRoleBindings see the following question.

After installation, the Cluster Operator will run as a regular Deployment; any non-admin user with privileges to access the Deployment can configure it.

By default, normal users will not have the privileges necessary to manipulate the custom resources, such as Kafka, KafkaConnect and so on, which the Cluster Operator deals with. These privileges can be granted using normal RBAC resources by the cluster administrator. See this procedure for more details of how to do this.

B.1.2. Why does the Cluster Operator require the ability to create ClusterRoleBindings? Is that not a security risk?

OpenShift has built-in privilege escalation prevention. That means that the Cluster Operator cannot grant privileges it does not have itself. Which in turn means that the Cluster Operator needs to have the privileges necessary for all the components it orchestrates.

In the context of this question there are two places where the Cluster Operator needs to create bindings to ClusterRoleBindings to ServiceAccounts:

  1. The Topic Operator and User Operator need to be able to manipulate KafkaTopics and KafkaUsers, respectively. The Cluster Operator therefore needs to be able to grant them this access, which it does by creating a Role and RoleBinding. For this reason the Cluster Operator itself needs to be able to create Roles and RoleBindings in the namespace that those operators will run in. However, because of the privilege escalation prevention, the Cluster Operator cannot grant privileges it does not have itself (in particular it cannot grant such privileges in namespace it cannot access).
  2. When using rack-aware partition assignment, AMQ Streams needs to be able to discover the failure domain (for example, the Availability Zone in AWS) of the node on which a broker pod is assigned. To do this the broker pod needs to be able to get information about the Node it is running on. A Node is a cluster-scoped resource, so access to it can only be granted via a ClusterRoleBinding (not a namespace-scoped RoleBinding). Therefore the Cluster Operator needs to be able to create ClusterRoleBindings. But again, because of privilege escalation prevention, the Cluster Operator cannot grant privileges it does not have itself (so it cannot, for example, create a ClusterRoleBinding to a ClusterRole to grant privileges that the Cluster Operator does not not already have).

B.1.3. Why can standard OpenShift users not create the custom resource (Kafka, KafkaTopic, and so on)?

Because, when they installed AMQ Streams, the OpenShift cluster administrator did not grant the necessary privileges to standard users.

See this FAQ answer for more details.

B.1.4. Log contains warnings about failing to acquire lock

For each cluster, the Cluster Operator always executes only one operation at a time. The Cluster Operator uses locks to make sure that there are never two parallel operations running for the same cluster. In case an operation requires more time to complete, other operations will wait until it is completed and the lock is released.

INFO
Examples of cluster operations are cluster creation, rolling update, scale down or scale up and so on.

If the wait for the lock takes too long, the operation times out and the following warning message will be printed to the log:

2018-03-04 17:09:24 WARNING AbstractClusterOperations:290 - Failed to acquire lock for kafka cluster lock::kafka::myproject::my-cluster

Depending on the exact configuration of STRIMZI_FULL_RECONCILIATION_INTERVAL_MS and STRIMZI_OPERATION_TIMEOUT_MS, this warning message may appear regularly without indicating any problems. The operations which time out will be picked up by the next periodic reconciliation. It will try to acquire the lock again and execute.

Should this message appear periodically even in situations when there should be no other operations running for a given cluster, it might indicate that due to some error the lock was not properly released. In such cases it is recommended to restart the cluster operator.

B.1.5. Hostname verification fails when connecting to NodePorts using TLS

Currently, off-cluster access using NodePorts with TLS encryption enabled does not support TLS hostname verification. As a result, the clients that verify the hostname will fail to connect. For example, the Java client will fail with the following exception:

Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 168.72.15.231 found
    at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
    at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1501)
    ... 17 more

To connect, you must disable hostname verification. In the Java client, you can do this by setting the configuration option ssl.endpoint.identification.algorithm to an empty string.

When configuring the client using a properties file, you can do it this way:

ssl.endpoint.identification.algorithm=

When configuring the client directly in Java, set the configuration option to an empty string:

props.put("ssl.endpoint.identification.algorithm", "");