OpenShift image data stored in etcd results in a very large database

Solution Verified - Updated -

Issue

  • Our etcd database has grown beyond a manageable size. We have over 2,000 projects, and the snapshot is now 1.3 GB even after pruning. How can we keep the etcd database manageable at this scale?
  • The atomic-openshift-master-controllers service was restarting repeatedly on all masters. Further investigation showed that etcd was restarting repeatedly as well, and its health checks were failing:

    # etcdctl -C https://openshift.example.com:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
    failed to check the health of member 3a78b19a3ba02203 on https://openshift.example.com:2379: Get https://openshift.example.com:2379/health: net/http: TLS handshake timeout
    member 3a78b19a3ba02203 is unreachable: [https://openshift.example.com:2379] are all unreachable
    

Environment

  • Red Hat OpenShift Enterprise 3.1, 3.2, 3.3
