Hawkular-cassandra pod unready while scheduled on Infra node

Solution Verified

Environment

  • OpenShift Container Platform 3.11

Issue

The hawkular-cassandra pod is unready and shows a status of Running, Error, or CrashLoopBackOff. This can happen in a single cluster, or simultaneously in several OpenShift clusters, as seen below:

[root@server1 ~]# oc get pod
NAME                            READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-cwg6p      0/1       CrashLoopBackOff   33         2h
hawkular-metrics-k78t7          0/1       Running            865        5d
hawkular-metrics-schema-tw6kh   0/1       Completed          0          11d
heapster-tbkt9                  0/1       Running            867        5d

[root@server2 ~]# oc get pod
NAME                            READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-82gpn      0/1       CrashLoopBackOff   928        12d
hawkular-metrics-cqgr5          0/1       Running            526        12d
hawkular-metrics-schema-7kw4z   0/1       Completed          0          12d
heapster-lpsfm                  0/1       Running            523        12d

[root@server3 ~]# oc get pod
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-gtjzr      0/1       Error       927        14d
hawkular-metrics-schema-5zjs6   0/1       Completed   0          14d
hawkular-metrics-sprtv          0/1       Running     532        14d
heapster-vhh6g                  0/1       Running     523        14d
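
To pick out the affected pods from listings like the ones above, a small filter over the `oc get pod` output can help. This is only a sketch: the function name `unready_pods` is hypothetical, and it assumes the default column layout shown above (NAME, READY, STATUS as the first three columns).

```shell
# Print the names of pods whose READY column reports fewer ready containers
# than expected. Pods in the Completed state (such as the schema job) are
# skipped, because 0/1 is normal for them.
unready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, ready, "/")
         if (ready[1] + 0 < ready[2] + 0) print $1
       }'
}

# Usage (requires a logged-in oc session):
#   oc get pod | unready_pods
```

Because only the first three columns are read, the filter should also work with `oc get pod -o wide`, which additionally shows the node each pod is scheduled on.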

The hawkular-cassandra pod logs may show permission errors:

sed: cannot rename /opt/apache-cassandra/conf/sed6b3JJH: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedIBVSPF: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedQyRWxG: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedtayTwJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedWAzVmK: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedun6iVJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sed0lEnoI: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sed4LogeJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedfEYpVI: Operation not permitted

The hawkular-metrics pod logs may show a schema error:

2019-06-20 02:04:55,466 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2019-06-20 02:05:05,470 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist

Resolution

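  • Edit the replication controller so that the pod is scheduled on a different node, specifically a compute (app) node:
  # oc edit rc -o yaml hawkular-cassandra-1

In the pod template (spec.template.spec), set the nodeSelector to target a compute node, and remove any Infra node selector that already exists:

nodeSelector:
  beta.kubernetes.io/arch: amd64
  node-role.kubernetes.io/compute: "true"

  • Delete the existing pod so that the replication controller recreates it with the new nodeSelector. The nodeSelector of a pod that already exists cannot be changed with oc edit. Substitute your own pod name:
  # oc delete pod hawkular-cassandra-1-cwg6p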
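
As a non-interactive alternative to `oc edit`, the same change could be applied with `oc patch`. This is a minimal sketch under some assumptions: the current project is the one where metrics is deployed, and the existing Infra selector key is `node-role.kubernetes.io/infra` (check the replication controller for the actual key). Setting a key to `null` in the patch removes it.

```shell
#!/bin/sh
# Replication controller name taken from the article.
RC=hawkular-cassandra-1

# Strategic merge patch for the pod template: add the compute selector and
# drop the Infra selector. The infra key name is an assumption; adjust it to
# match whatever selector your replication controller currently carries.
PATCH='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/compute":"true","node-role.kubernetes.io/infra":null}}}}}'

if command -v oc >/dev/null 2>&1; then
  oc patch rc "$RC" -p "$PATCH"
  # A replication controller only applies its template to newly created pods,
  # so delete the existing pod(s) and let them be recreated.
  for pod in $(oc get pods -o name | grep "/$RC-"); do
    oc delete "$pod"
  done
else
  echo "oc not found; would run: oc patch rc $RC -p '$PATCH'"
fi
```

Afterwards, `oc get pod -o wide` shows which node the recreated pod was scheduled on.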

Root Cause

An incorrect cluster configuration or an out-of-date metrics image version can cause this issue. For more information, see this metrics image KCS.

This issue stems from a bug affecting metrics deployments on certain OpenShift 3.11 clusters.

