OpenShift image data stored in etcd results in a very large database

Solution Verified

Issue

  • Our etcd database has grown beyond a manageable size. We have over 2,000 projects, and the snapshot is now 1.3 GB even after pruning. How can we keep the etcd database manageable at this scale?
  • The atomic-openshift-master-controllers service was restarting repeatedly on all masters. Further investigation showed that etcd was restarting repeatedly as well, and its health checks were failing:

    # etcdctl -C https://openshift.example.com:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
    failed to check the health of member 3a78b19a3ba02203 on https://openshift.example.com:2379: Get https://openshift.example.com:2379/health: net/http: TLS handshake timeout
    member 3a78b19a3ba02203 is unreachable: [https://openshift.example.com:2379] are all unreachable
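
When a cluster has several members, it can help to reduce the health check output to a single number. The following is a minimal sketch, not part of the original solution: the function name count_unhealthy_members is hypothetical, and it assumes the etcd v2 etcdctl output format shown above, where each member is reported on a line of the form "member <id> is healthy|unhealthy|unreachable".

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl cluster-health` output on stdin and
# prints how many members are reported as unreachable or unhealthy.
# Assumes the etcd v2 line format "member <id> is <state>: ...".
count_unhealthy_members() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status at 0.
    grep -cE '^member [0-9a-f]+ is (unreachable|unhealthy)' || true
}

# Example usage, with the same certificate options as the command above:
#   etcdctl -C https://openshift.example.com:2379 \
#       --ca-file=/etc/origin/master/master.etcd-ca.crt \
#       --cert-file=/etc/origin/master/master.etcd-client.crt \
#       --key-file=/etc/origin/master/master.etcd-client.key \
#       cluster-health 2>&1 | count_unhealthy_members
```

A result greater than 0 means at least one member failed its health check. Lines such as "failed to check the health of member ..." are deliberately not counted, because the matching "member ... is unreachable" line already covers that member.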
    

Environment

  • Red Hat OpenShift Enterprise 3.1, 3.2, 3.3
