Can you recover an OpenShift cluster with no backups?

Help, I hope someone can help me out. While trying to resolve issues with my cluster, I stupidly cordoned all my nodes and started draining them, only to realize too late that I had drained the pods the cluster itself relies on. My cluster is now completely down, but I can still SSH into the nodes. However, I can't uncordon the nodes because the API pod and all the other dependencies are inaccessible.

I have three master nodes, and the output below is from the one that hosted those pods.

kubectl cluster-info
E1129 18:41:30.739205 54338 memcache.go:255] couldn't get resource list for apps.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:30.840906 54338 memcache.go:255] couldn't get resource list for authorization.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:30.997903 54338 memcache.go:255] couldn't get resource list for build.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.033621 54338 memcache.go:255] couldn't get resource list for image.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.052441 54338 memcache.go:255] couldn't get resource list for project.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.185436 54338 memcache.go:255] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.190519 54338 memcache.go:255] couldn't get resource list for route.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.190737 54338 memcache.go:255] couldn't get resource list for oauth.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.191610 54338 memcache.go:255] couldn't get resource list for quota.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.209679 54338 memcache.go:255] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.210326 54338 memcache.go:255] couldn't get resource list for user.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:31.211279 54338 memcache.go:255] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1129 18:41:32.040690 54338 memcache.go:106] couldn't get resource list for user.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:32.467943 54338 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1129 18:41:32.783474 54338 memcache.go:106] couldn't get resource list for project.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:33.077875 54338 memcache.go:106] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:33.445434 54338 memcache.go:106] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:33.885461 54338 memcache.go:106] couldn't get resource list for quota.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:34.117484 54338 memcache.go:106] couldn't get resource list for route.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:34.499829 54338 memcache.go:106] couldn't get resource list for build.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:34.723472 54338 memcache.go:106] couldn't get resource list for apps.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:35.030821 54338 memcache.go:106] couldn't get resource list for authorization.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:35.441867 54338 memcache.go:106] couldn't get resource list for image.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:35.853868 54338 memcache.go:106] couldn't get resource list for oauth.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:36.204840 54338 memcache.go:106] couldn't get resource list for user.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:36.580446 54338 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1129 18:41:36.822071 54338 memcache.go:106] couldn't get resource list for quota.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:36.983997 54338 memcache.go:106] couldn't get resource list for build.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:37.111390 54338 memcache.go:106] couldn't get resource list for image.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:37.264624 54338 memcache.go:106] couldn't get resource list for oauth.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:37.664670 54338 memcache.go:106] couldn't get resource list for route.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:38.353764 54338 memcache.go:106] couldn't get resource list for project.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:38.519618 54338 memcache.go:106] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:38.658929 54338 memcache.go:106] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:38.768183 54338 memcache.go:106] couldn't get resource list for apps.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:38.928919 54338 memcache.go:106] couldn't get resource list for authorization.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:39.134149 54338 memcache.go:106] couldn't get resource list for oauth.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:39.363737 54338 memcache.go:106] couldn't get resource list for project.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:39.604269 54338 memcache.go:106] couldn't get resource list for quota.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:39.847388 54338 memcache.go:106] couldn't get resource list for route.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:40.017475 54338 memcache.go:106] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:40.197562 54338 memcache.go:106] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:40.466635 54338 memcache.go:106] couldn't get resource list for user.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:40.772169 54338 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1129 18:41:41.065146 54338 memcache.go:106] couldn't get resource list for apps.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:41.462559 54338 memcache.go:106] couldn't get resource list for authorization.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:41.805974 54338 memcache.go:106] couldn't get resource list for build.openshift.io/v1: the server is currently unable to handle the request
E1129 18:41:42.148497 54338 memcache.go:106] couldn't get resource list for image.openshift.io/v1: the server is currently unable to handle the request

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Error from server (Forbidden): services is forbidden: User "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper" cannot list resource "services" in API group "" in the namespace "kube-system"

kubectl get pods --all-namespaces
E1129 18:21:41.437918 33168 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1129 18:21:41.438576 33168 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1129 18:21:41.440172 33168 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1129 18:21:41.441725 33168 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1129 18:21:41.443166 33168 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Please help. Any suggestions at all would be helpful.

Responses