Hawkular-cassandra pod unready while scheduled on Infra node
Environment
- OpenShift Container Platform 3.11
Issue
The hawkular-cassandra pod shows a status of Running, Error, or CrashLoopBackOff and is not ready (0/1). The hawkular-metrics and heapster pods also remain unready. This can happen in a single cluster, or in several OpenShift clusters at the same time, as seen below:
[root@server1 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-cwg6p 0/1 CrashLoopBackOff 33 2h
hawkular-metrics-k78t7 0/1 Running 865 5d
hawkular-metrics-schema-tw6kh 0/1 Completed 0 11d
heapster-tbkt9 0/1 Running 867 5d
[root@server2 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-82gpn 0/1 CrashLoopBackOff 928 12d
hawkular-metrics-cqgr5 0/1 Running 526 12d
hawkular-metrics-schema-7kw4z 0/1 Completed 0 12d
heapster-lpsfm 0/1 Running 523 12d
[root@server3 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-gtjzr 0/1 Error 927 14d
hawkular-metrics-schema-5zjs6 0/1 Completed 0 14d
hawkular-metrics-sprtv 0/1 Running 532 14d
heapster-vhh6g 0/1 Running 523 14d
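To confirm that the unready Cassandra pod is actually running on an Infra node, check which node it was scheduled to and that node's role labels. The sketch below assumes the metrics stack lives in the `openshift-infra` project (the usual location for cluster metrics on OpenShift 3.11); `show_cassandra_placement` is a hypothetical helper name, and the pod name is passed in as an argument.

```shell
# Hypothetical helper: show where a hawkular-cassandra pod is scheduled
# and which node-role labels that node carries.
# Assumes the metrics pods run in the openshift-infra project.
show_cassandra_placement() {
  local pod="$1"                      # e.g. hawkular-cassandra-1-cwg6p
  local ns="${2:-openshift-infra}"    # project holding the metrics pods
  local node

  # spec.nodeName is filled in once the scheduler has placed the pod
  node="$(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')"
  echo "pod ${pod} is scheduled on node: ${node}"

  # --show-labels lists all node labels, including the node-role label
  # (for example node-role.kubernetes.io/infra=true)
  oc get node "$node" --show-labels
}
```

Usage: `show_cassandra_placement hawkular-cassandra-1-cwg6p`. If the labels include the Infra role, the pod is in the situation described by this article.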
The hawkular-cassandra pod logs may show permission errors:
sed: cannot rename /opt/apache-cassandra/conf/sed6b3JJH: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedIBVSPF: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedQyRWxG: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedtayTwJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedWAzVmK: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedun6iVJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sed0lEnoI: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sed4LogeJ: Operation not permitted
sed: cannot rename /opt/apache-cassandra/conf/sedfEYpVI: Operation not permitted
The hawkular-metrics pod logs show a schema error:
2019-06-20 02:04:55,466 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2019-06-20 02:05:05,470 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
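Both log excerpts can be collected with `oc logs`. Because the Cassandra container keeps restarting, `--previous` is used to read the log of its last terminated instance, which is where startup errors such as the `sed` failures appear. This is a minimal sketch assuming the `openshift-infra` project; `collect_metrics_logs` is a hypothetical helper name and both pod names are passed in as arguments.

```shell
# Hypothetical helper: pull the log lines quoted in this article.
# Assumes the metrics pods run in the openshift-infra project.
collect_metrics_logs() {
  local cassandra_pod="$1"            # e.g. hawkular-cassandra-1-cwg6p
  local metrics_pod="$2"              # e.g. hawkular-metrics-k78t7
  local ns="${3:-openshift-infra}"

  # --previous reads the log of the last terminated container instance;
  # "|| true" keeps a no-match grep from reporting failure
  oc logs "$cassandra_pod" -n "$ns" --previous | grep 'Operation not permitted' || true

  # The schema version check is logged by the hawkular-metrics pod
  oc logs "$metrics_pod" -n "$ns" | grep 'SchemaVersionChecker' || true
}
```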
Resolution
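- Modify the nodeSelector of the hawkular-cassandra-1 replication controller so that the pod is scheduled on a compute (app) node instead of an Infra node:
# oc edit rc -o yaml hawkular-cassandra-1
In the pod template (spec.template.spec), replace whatever Infra node selector already exists with a compute node selector:
nodeSelector:
  beta.kubernetes.io/arch: amd64
  node-role.kubernetes.io/compute: "true"
- The nodeSelector of a running pod cannot be changed, so delete the current hawkular-cassandra pod and let the replication controller recreate it from the updated template:
# oc delete pod <hawkular-cassandra-1-pod-name>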
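The same change can be made non-interactively. The sketch below is one way to do it, assuming the `openshift-infra` project and the compute node labels used in this article; `move_cassandra_to_compute` is a hypothetical helper name. It uses a JSON patch `add` operation, which replaces an existing `nodeSelector` map wholesale, so any Infra selector is dropped rather than merged. Since a running pod's nodeSelector is immutable, the pod is then deleted so the replication controller recreates it from the patched template.

```shell
# Hypothetical helper: point the hawkular-cassandra-1 replication controller
# at compute nodes, then recreate the pod so the new selector takes effect.
# Assumes the openshift-infra project and the labels shown in this article.
move_cassandra_to_compute() {
  local pod="$1"                      # current pod, e.g. hawkular-cassandra-1-cwg6p
  local ns="${2:-openshift-infra}"
  local rc="hawkular-cassandra-1"

  # JSON patch "add" on an existing member replaces its whole value,
  # so the old Infra nodeSelector key is removed, not merged.
  oc patch rc "$rc" -n "$ns" --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/nodeSelector","value":{"beta.kubernetes.io/arch":"amd64","node-role.kubernetes.io/compute":"true"}}]'

  # A pod's nodeSelector cannot be edited in place; deleting the pod
  # lets the RC create a replacement from the patched template.
  oc delete pod "$pod" -n "$ns"

  # Show the NODE column so the new placement can be checked
  oc get pod -n "$ns" -o wide
}
```

Usage: `move_cassandra_to_compute hawkular-cassandra-1-cwg6p`. A JSON patch was chosen over a strategic merge patch because the latter merges map keys, which would leave an existing Infra selector in place unless its exact key were known.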
Root Cause
An incorrect cluster configuration or an out-of-date metrics image version can cause this issue. For more information, see this metrics image KCS.
This issue is related to a bug affecting metrics deployments on certain OpenShift 3.11 clusters.
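To check whether the metrics images are out of date, list the image used by each metrics replication controller and compare the tags with the cluster's OpenShift 3.11 version. This is a minimal sketch assuming the `openshift-infra` project; `show_metrics_images` is a hypothetical helper name, and the `range`/`end` JSONPath template is standard `oc get -o jsonpath` syntax.

```shell
# Hypothetical helper: print "<rc name><TAB><image(s)>" for every replication
# controller in the metrics project (hawkular-cassandra-1, hawkular-metrics,
# heapster). Assumes the openshift-infra project.
show_metrics_images() {
  local ns="${1:-openshift-infra}"
  oc get rc -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'
}
```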
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.