Error: MetricsNotAvailableYet unable to get metrics for resource cpu: failed to get heapster service: an error on the server ("x509: certificate signed by unknown authority") has prevented the request from succeeding (get services https:heapster:)

Solution Verified - Updated -

Environment

  • OpenShift Container Platform
    • 3.11

Issue

  • The horizontal-pod-autoscaler pod is failing with the following error:

    MetricsNotAvailableYet unable to get metrics for resource cpu: failed to get heapster service: an error on the server
    ("x509: certificate signed by unknown authority") has prevented the request from succeeding (get services https:heapster:)
    
  • The heapster logs show the following error:

    Could not update tags: Put https://hawkular-metrics:443/hawkular/metrics/gauges: x509: certificate has expired or is not yet valid.
    
  • Metrics are not available on the Hawkular dashboard.

Resolution

Delete and recreate the heapster, cassandra, and hawkular secrets.

Change context to the metrics namespace (for auto-scaling to work correctly, metrics should have been deployed in 'openshift-infra'):

    # oc project openshift-infra

Back up the existing "opaque" metrics secrets:

    # oc get secret hawkular-cassandra-certs -o yaml > ~/hawkular-cassandra-certs.yaml
    # oc get secret hawkular-metrics-certs -o yaml > ~/hawkular-metrics-certs.yaml
    # oc get secret heapster-secrets -o yaml > ~/heapster-secrets.yaml
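The three backup commands above can also be wrapped in a small loop that stops if any backup fails or comes out empty. This is only a sketch: the function name `backup_metrics_secrets` and its optional `backup_dir` argument are hypothetical, and it assumes the secrets live in `openshift-infra` as described above.

```shell
# Back up each metrics secret to <backup_dir>/<name>.yaml.
# backup_dir is a hypothetical parameter; it defaults to the home directory,
# which is where the article's own commands write their files.
backup_metrics_secrets() {
    backup_dir="${1:-$HOME}"
    for name in hawkular-cassandra-certs hawkular-metrics-certs heapster-secrets; do
        out="$backup_dir/$name.yaml"
        if ! oc get secret "$name" -n openshift-infra -o yaml > "$out"; then
            echo "backup of $name failed" >&2
            return 1
        fi
        # An empty file would leave nothing to refer back to later.
        if [ ! -s "$out" ]; then
            echo "backup of $name is empty" >&2
            return 1
        fi
    done
    echo "backed up 3 secrets to $backup_dir"
}
```

Calling `backup_metrics_secrets` with no argument writes the same three files as the commands above.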

Delete these metrics secrets:

    # oc delete secret hawkular-cassandra-certs 
    # oc delete secret hawkular-metrics-certs
    # oc delete secret heapster-secrets
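As an optional safeguard, the deletion can be made conditional on the backup files being present. The sketch below assumes the backups were written as `<name>.yaml` in the home directory, as in the backup step; the function name `delete_backed_up_secrets` and its `backup_dir` argument are hypothetical.

```shell
# Delete the three metrics secrets, but only if every backup file exists
# and is non-empty. backup_dir is a hypothetical parameter (default: $HOME).
delete_backed_up_secrets() {
    backup_dir="${1:-$HOME}"
    # First pass: refuse to delete anything unless all backups are in place.
    for name in hawkular-cassandra-certs hawkular-metrics-certs heapster-secrets; do
        if [ ! -s "$backup_dir/$name.yaml" ]; then
            echo "missing backup for $name, nothing deleted" >&2
            return 1
        fi
    done
    # Second pass: delete the secrets, stopping at the first failure.
    for name in hawkular-cassandra-certs hawkular-metrics-certs heapster-secrets; do
        oc delete secret "$name" -n openshift-infra || return 1
    done
}
```

Checking all backups before the first delete avoids ending up with only some of the secrets removed.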

Re-run the metrics deployment Ansible playbook. Take care to use the options that are appropriate to your environment, including any extra options that may be needed, such as those for persistent storage. For example:

    # ansible-playbook -i <hosts_file> <path_to openshift-metrics.yml> -e openshift_metrics_install_metrics=True -e openshift_metrics_hawkular_hostname=<fqdn_for_metrics_url>

After this process, wait for the metrics pods to start:

    # watch oc get pods
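If watching the pod list by hand is inconvenient, a polling loop along the following lines may help. It is a sketch under a few assumptions: the function names, the retry count, and the delay are hypothetical, and it treats every pod in `openshift-infra` as a metrics pod, so unrelated pods in that namespace would also have to be ready.

```shell
# Return 0 only if at least one pod is listed and every pod is Running
# with all of its containers ready (READY column n/n).
all_pods_ready() {
    oc get pods -n openshift-infra --no-headers | awk '
        { split($2, r, "/"); seen = 1
          if ($3 != "Running" || r[1] != r[2]) bad = 1 }
        END { exit (seen && !bad) ? 0 : 1 }'
}

# Poll until the pods are ready or the attempts run out.
wait_for_metrics_pods() {
    attempts="${1:-60}"   # hypothetical default: 60 checks
    delay="${2:-10}"      # hypothetical default: 10 seconds between checks
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if all_pods_ready; then
            echo "metrics pods are ready"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for metrics pods" >&2
    return 1
}
```

With the defaults, `wait_for_metrics_pods` gives up after roughly ten minutes.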

Then try curling heapster (for the method below, make sure you are logged in to OpenShift as a user with a token, not system:admin):

    # curl -v -H "Authorization: Bearer $(oc whoami -t)" -X GET  $(oc whoami --show-server)/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics/cpu/usage_rate

Root Cause

The cluster certificates were likely redeployed without updating the metrics components.

Diagnostic Steps

Query the heapster service API with curl:

    # curl -v -H "Authorization: Bearer $(oc whoami -t)" -X GET  $(oc whoami --show-server)/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics/cpu/usage_rate

Then curl a generic OpenShift API path, such as:

    # curl -H "Authorization: Bearer $(oc whoami -t)" -X GET -k $(oc whoami --show-server)/api/v1/

If the heapster curl fails with an error such as "x509: certificate signed by unknown authority" but the generic API curl succeeds, then you are experiencing the issue described in this article. If both curl commands return this error, you probably have a different problem, such as a router certificate issue.
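The decision rule above can be written down as a small helper that takes the captured output of the two curl commands. This is a sketch only: the function name `classify_curl_results` and the result strings are hypothetical, and it simply looks for the x509 message quoted in this article.

```shell
X509_ERR="x509: certificate signed by unknown authority"

# classify_curl_results <heapster_output> <generic_api_output>
# Prints which case from the paragraph above the two outputs fall into.
classify_curl_results() {
    heapster_out="$1"
    api_out="$2"
    case "$heapster_out" in
        *"$X509_ERR"*) heapster_bad=1 ;;
        *) heapster_bad=0 ;;
    esac
    case "$api_out" in
        *"$X509_ERR"*) api_bad=1 ;;
        *) api_bad=0 ;;
    esac
    if [ "$heapster_bad" -eq 1 ] && [ "$api_bad" -eq 0 ]; then
        echo "metrics certificates (this article)"
    elif [ "$heapster_bad" -eq 1 ] && [ "$api_bad" -eq 1 ]; then
        echo "different problem, for example router certificates"
    else
        echo "no x509 error from heapster"
    fi
}
```

To use it, capture each curl command's combined output into a variable (for example with `2>&1` inside a command substitution) and pass the two variables as arguments.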

