OpenShift image data stored in etcd results in a very large database

Solution Verified

Issue

  • Our etcd database has grown beyond a manageable size. We have over 2,000 projects, and a snapshot is now 1.3 GB even with a pruned version of etcd. How can we keep the etcd database manageable at this scale?
  • The atomic-openshift-master-controllers service was restarting repeatedly on all masters. Further checking revealed that etcd was restarting repeatedly as well, and the health checks were failing:

    # etcdctl -C https://openshift.example.com:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
    failed to check the health of member 3a78b19a3ba02203 on https://openshift.example.com:2379: Get https://openshift.example.com:2379/health: net/http: TLS handshake timeout
    member 3a78b19a3ba02203 is unreachable: [https://openshift.example.com:2379] are all unreachable
    

Environment

  • Red Hat OpenShift Enterprise 3.1, 3.2, 3.3
